Hi guys!
I've installed the MWSearch and Lucene Search extensions, but the search engine doesn't handle Russian morphology: it doesn't recognize word forms. How can I turn the morphological analyzer on? How is it done on Russian Wikipedia?
Cheers, ----- Yury Katkov, WikiVote
I hate to say this after all you went through setting up Lucene Search, but it is end-of-life and not receiving any real support. We're in the process of replacing it with the combination of CirrusSearch (https://www.mediawiki.org/wiki/Extension:CirrusSearch) and Elasticsearch (http://www.elasticsearch.org/), which work pretty much the same way the MWSearch/Lucene Search combination does. CirrusSearch has to be smarter than MWSearch because Elasticsearch doesn't have any MediaWiki knowledge, but because it links into MediaWiki it can do things like expand templates. I like it, but I'm biased.
That aside, it looks like Lucene Search is supposed to read InitializeSettings, which is kind of a WMF-specific thing. You might be able to trick it into doing that by putting a file called InitializeSettings.php in the conf directory with the contents:
'wgLanguageCode' => array( 'your $wgDBname' => 'ru', ),
CirrusSearch, if you care to try it, reads the language code from wgLanguageCode.
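For reference, the MediaWiki side of that setting normally lives in LocalSettings.php. A minimal sketch, assuming a wiki whose content language is Russian (the file location is the usual MediaWiki default, not something stated in this thread):

```php
<?php
// LocalSettings.php (fragment)
// Content language of the wiki; per the note above, this is the value
// CirrusSearch reads to decide which language it is dealing with.
$wgLanguageCode = 'ru';
```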
Nik
On Thu, Jan 30, 2014 at 3:39 PM, Yury Katkov katkov.juriy@gmail.com wrote:
Hi guys!
I've installed the MWSearch and Lucene Search extensions, but the search engine doesn't handle Russian morphology: it doesn't recognize word forms. How can I turn the morphological analyzer on? How is it done on Russian Wikipedia?
Cheers,
Yury Katkov, WikiVote
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi! I'll definitely try Cirrus, but it would still be interesting to see Lucene working. Besides, every new extension by WMF typically requires a very recent MediaWiki version, which can be a burden for third parties.
I tried adding InitializeSettings.php and ran ./build and ./lsearchd again. Still no good: when I search for the word "банк", I expect Lucene to also find "банков", "банки", "банке", etc., and I can see that these word forms are present in the files LuceneSearch.jar/uzip://org/apache/lucene/analysis/ru/stemsUnicode.txt and words.Unicode.txt.
Still, when I search for "банк", I only get "банк", with the following log:
18409 [pool-2-thread-1] INFO org.wikimedia.lsearch.search.SearchEngine - Using FilterWrapper wrap: {} []
18414 [pool-2-thread-1] INFO org.wikimedia.lsearch.search.SearchEngine - search wikivote: query=[банк] parsed=[custom(+contents:банк^0.2 relevance ([((P contents:"банк") (P sections:"банк"^0.25))^2.0], (P alttitle:"банк"~20^2.5) (P related:"банк"^12.0)) (P alttitle:"банк"~20))] hit=[0] in 7ms using IndexSearcherMul:1391088160991
18439 [pool-2-thread-1] INFO org.wikimedia.lsearch.spell.Suggest - wikivote for original=[банк] suggest: [банк] using=[] in 18 ms
24262 [pool-2-thread-2] INFO org.wikimedia.lsearch.frontend.HttpHandler - query:/search/wikivote/%D0%B1%D0%B0%D0%BD%D0%BA?namespaces=0%2C1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C10%2C11%2C12%2C13%2C14%2C15%2C90%2C91%2C92%2C93%2C102%2C103%2C106%2C107%2C108%2C109%2C170%2C171&offset=0&limit=20&version=2.1&iwlimit=10&searchall=1 what:search dbname:wikivote term:банк
24263 [pool-2-thread-2] INFO org.wikimedia.lsearch.search.SearchEngine - Using FilterWrapper wrap: {} []
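To make the expected behaviour concrete: matching "банков", "банки" and "банке" on a query for "банк" amounts to reducing every form to a common stem before comparing. The following is only a toy sketch of that idea in PHP. It is not Lucene's Russian analyzer, and the function name toyStem and the short suffix list are made up for illustration.

```php
<?php
// Toy suffix stripper, for illustration only. A real morphological
// analyzer handles far more endings and exceptions than this.
function toyStem(string $word): string {
    // Strip one common noun ending, but only when at least three
    // characters precede it. The /u flag makes the pattern UTF-8 aware.
    $pattern = '/(?<=.{3})(ами|ов|ом|ах|ам|а|у|е|и)$/u';
    return preg_replace($pattern, '', $word) ?? $word;
}

// Every form below reduces to the same stem, so an index keyed on
// stems would return all of them for the query "банк".
foreach (['банк', 'банков', 'банки', 'банке'] as $form) {
    echo $form, ' -> ', toyStem($form), PHP_EOL;
}
```

With stemming applied at both index time and query time, all four forms would be stored and looked up under the same key, which is the behaviour missing here.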
----- Yury Katkov, WikiVote
On Fri, Jan 31, 2014 at 1:02 AM, Nikolas Everett neverett@wikimedia.org wrote: