On Thu, Nov 7, 2013 at 11:37 AM, Jim Hu <jimhu@tamu.edu> wrote:
Hi Nik,
As I was reading the docs for MWSearch, I considered whether I should switch to CirrusSearch, so it may not be a difficult sell. I'd even volunteer to try to update the documentation if you're willing to help walk me through it.
But to show how clueless I am... I'm not sure how to check the other end, since I'm not clear on what it's trying to do. Here's my undoubtedly deeply flawed understanding of what happens (this reflects that I'm a biologist by training and badly self-taught on wikis and Linux/Unix/OS X).
I'm assuming that the problem is in this first step of the update script:
java -cp LuceneSearch.jar org.wikimedia.lsearch.oai.IncrementalUpdater -l $@ \
It's listing a bunch of update items (the ... in my first post). I am guessing that it pulls info on revisions from the MySQL database and converts them to some format that gets sent to the indexer, which I assume is part of Apache Lucene. From the error, it's failing to pass that through some socket to the indexer. But I don't know how to see a log for activity on that socket.
You have the right idea, but by "the other side" I mean a log on the indexer. It is some other Java process, probably running on the Hexamer host that I saw in the indexer logs. It should have something in its logs. Hopefully.
My similarly uninformed reading about CirrusSearch is that it uses Elasticsearch, which in turn uses Lucene. So if the problem is between the IncrementalUpdater and Lucene, I might have similar issues with CirrusSearch. But if CirrusSearch gives more informative errors, that would help!! And maybe I should switch anyway, as it sounds like support for MWSearch will go away at some point.
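Before digging through the indexer's logs, one quick way to tell whether anything is listening on the other end of that socket is a plain TCP connection test. The Java sketch below assumes you know the indexer's host and port from your own LuceneSearch configuration; the class name `IndexerPing` is hypothetical, and a successful connect only shows that some process accepts connections there, not that it understands what the updater sends.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP connection to the indexer can be opened.
public class IndexerPing {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true; // the handshake completed, so something is listening
        } catch (IOException e) {
            // Connection refused, timed out, or host name could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java IndexerPing <indexer-host> <port>");
            System.exit(2);
        }
        String host = args[0];              // e.g. the host named in the updater's error
        int port = Integer.parseInt(args[1]); // take this from your own lsearch config
        boolean ok = isReachable(host, port, 3000);
        System.out.println(host + ":" + port + (ok ? " is accepting connections" : " is NOT reachable"));
    }
}
```

If the check fails, the indexer process is probably not running (or is bound elsewhere); if it succeeds, the indexer's own log is the next place to look.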
Lucene is a library that can be embedded in Java applications to provide full-text searching capabilities (and geospatial search and a few other things). Anyway, LuceneSearch is a MediaWiki-specific application that provides Lucene's full-text search capabilities in a way that the MWSearch extension understands.
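To make "full-text searching capabilities" a little more concrete: the core data structure behind a library like Lucene is an inverted index, a map from each term to the documents that contain it. The toy Java sketch below illustrates only that idea; `TinyInvertedIndex` is a hypothetical class, not Lucene's API, and real Lucene adds text analysis, relevance scoring, and on-disk index segments on top of it.

```java
import java.util.*;

// Toy inverted index: term -> sorted set of document ids. Not Lucene's API.
public class TinyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Lowercase and split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Records that every term in text occurs in document docId. */
    public void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns ids of documents containing ALL query terms (a simple AND query). */
    public SortedSet<Integer> search(String query) {
        List<String> terms = tokenize(query);
        SortedSet<Integer> result = new TreeSet<>();
        boolean first = true;
        for (String term : terms) {
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (first) {
                result.addAll(docs);
                first = false;
            } else {
                result.retainAll(docs); // intersect with this term's postings
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Lucene is a search library");
        index.add(2, "Elasticsearch is built on Lucene");
        System.out.println(index.search("lucene"));         // [1, 2]
        System.out.println(index.search("lucene library")); // [1]
    }
}
```

LuceneSearch and Elasticsearch both wrap this kind of index behind a network service, which is why each needs a separate process that the wiki talks to.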
Elasticsearch serves the same purpose for CirrusSearch as LuceneSearch serves for MWSearch. We like Elasticsearch because it is general purpose and sees a ton more development than LuceneSearch.
As far as support goes: we haven't done much with LuceneSearch/MWSearch in a while. I work on CirrusSearch every day, as does Chad, who seems to have replied while I was sending this email. Elasticsearch itself has had 44 people submit code to it in the past month. It's a healthier ecosystem, but it might be a pain to switch. CirrusSearch requires a very recent version of MediaWiki, for example.
Nik