Hey,
I've installed a fresh MediaWiki 1.16.2 with PostgreSQL 8.4.7.
Now I'm having trouble with the search function.
For example, the Main Page contains the German word "Therapieempfehlungen". If I search for "Therapieempfehlungen", I get the Main Page as a result. But if I search for "Therapie", MediaWiki cannot find the word on the Main Page. More examples: "Therapiepfade", "Chemotherapie". None of these words is found when I search for "Therapie".
Is this a problem with PostgreSQL's full-text search engine, or is there a mistake in my configuration?
Thanks
Postgres' full text search uses word stemming rather than exact or substring matching. The first thing to do is make sure you are using 'german' as the text search configuration, so Postgres knows how to stem German words. The second problem is that even then "Therapie" won't match: although the letters appear inside "Therapieempfehlungen", it is not a separate word there, and the German stemmer reduces the whole compound to the single lexeme "therapieempfehl". Matching against every possible substring of every word would be extremely impractical and expensive, so full text systems rely on stemming and other tricks instead. Here's what it looks like under the hood:
# select to_tsquery('german', 'Therapieempfehlungen');
    to_tsquery
-------------------
 'therapieempfehl'

# select to_tsquery('german', 'Chemotherapie');
   to_tsquery
----------------
 'chemotherapi'

# select to_tsquery('english', 'Therapieempfehlungen');
       to_tsquery
------------------------
 'therapieempfehlungen'

# select to_tsquery('english', 'puppeteer');
 to_tsquery
------------
 'puppet'
(1 row)
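To see the failed match directly, a few psql queries along these lines should work. They use only standard PostgreSQL text search functions (to_tsvector, to_tsquery, ts_debug and the @@ match operator) with the words from this thread; the t/f results follow from the stems shown above. The prefix query (":*", available since PostgreSQL 8.4) is included only to illustrate what PostgreSQL itself can match; this sketch does not assume that MediaWiki's search box generates such queries.

```sql
-- A tsquery lexeme must equal a tsvector lexeme, so the stem of "Therapie"
-- ('therapi') does not match the single lexeme 'therapieempfehl'.
SELECT to_tsvector('german', 'Therapieempfehlungen')
       @@ to_tsquery('german', 'Therapie');     -- f

-- ts_debug lists each token, the dictionary that handled it, and its lexemes.
SELECT token, dictionary, lexemes
FROM ts_debug('german', 'Therapieempfehlungen');

-- Prefix matching finds lexemes that START with the stem...
SELECT to_tsvector('german', 'Therapieempfehlungen')
       @@ to_tsquery('german', 'Therapie:*');   -- t

-- ...but still not lexemes that merely contain it in the middle.
SELECT to_tsvector('german', 'Chemotherapie')
       @@ to_tsquery('german', 'Therapie:*');   -- f
```

This is why "Chemotherapie" can never be found by searching for "Therapie" without some form of compound-word splitting at indexing time.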
To make sure your text search uses German, change the default text search configuration to 'german' for the MediaWiki database user, which is usually mwuser. This can be done like so:
ALTER USER mwuser SET default_text_search_config = 'german';
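A minimal way to check that the setting took effect, assuming the MediaWiki role is mwuser as above. Per-role settings made with ALTER USER ... SET are applied when a new session starts, so reconnect as that user first. Also note that tsvector values already stored in the database are not recomputed when the default changes, so text indexed before the change presumably still carries the old lexemes until it is re-indexed.

```sql
-- Run in a NEW session connected as mwuser.
SHOW default_text_search_config;   -- should now report the german configuration

-- The one-argument forms of to_tsvector/to_tsquery use
-- default_text_search_config, so this should yield the German stem
-- 'therapieempfehl' rather than 'therapieempfehlungen'.
SELECT to_tsvector('Therapieempfehlungen');
```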
Hey, thanks for your answer.
I think the problem is that my Hunspell dictionary does not handle compound words correctly. I've written to pgsql-general@postgresql.org and we are working on it there.
I'll let you know when I've got it working.
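For reference, this is roughly how a Hunspell dictionary is wired into a PostgreSQL text search configuration, following the ispell-template pattern from the PostgreSQL documentation. It is a sketch only: the dictionary name german_hunspell, the file names, and the configuration name public.german_compound are hypothetical, and the Hunspell files are assumed to be already installed in the tsearch_data directory in the format PostgreSQL expects. The PostgreSQL documentation notes that only basic Hunspell compound-word operations are implemented, which may be relevant to the problem described here.

```sql
-- Hypothetical file names: $SHAREDIR/tsearch_data/german_hunspell.dict
-- and $SHAREDIR/tsearch_data/german_hunspell.affix are assumed to exist.
CREATE TEXT SEARCH DICTIONARY german_hunspell (
    TEMPLATE  = ispell,
    DictFile  = german_hunspell,
    AffFile   = german_hunspell,
    StopWords = german
);

-- Copy the built-in German configuration, then consult the Hunspell
-- dictionary first and fall back to the Snowball stemmer (german_stem)
-- for words Hunspell does not recognize.
CREATE TEXT SEARCH CONFIGURATION public.german_compound (COPY = pg_catalog.german);
ALTER TEXT SEARCH CONFIGURATION public.german_compound
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH german_hunspell, german_stem;

-- If compound splitting works, these should return separate lexemes
-- for the parts of the compound instead of a single stem.
SELECT ts_lexize('german_hunspell', 'Therapieempfehlungen');
SELECT to_tsvector('public.german_compound', 'Therapieempfehlungen');
```

If the output looks right, the new configuration could then be made the default for mwuser with the same ALTER USER ... SET default_text_search_config statement used earlier in the thread.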
Jens
2011/2/7 Greg Sabino Mullane greg@endpoint.com:
-- Greg Sabino Mullane greg@endpoint.com End Point Corporation PGP Key: 0x14964AC8
mediawiki-l@lists.wikimedia.org