Thanks for the replies, Nik and Chad. Sounds like I should switch.
Is MediaWiki 1.21.2 recent enough for CirrusSearch? I'm going to try this on a development server.
On Nov 7, 2013, at 10:50 AM, Nikolas Everett wrote:
On Thu, Nov 7, 2013 at 11:37 AM, Jim Hu <jimhu(a)tamu.edu> wrote:
Hi Nik,
As I was reading the docs for MWSearch, I considered whether I should
switch to CirrusSearch, so it may not be a difficult sell. I'd even
volunteer to try to update the documentation if you're willing to help walk
me through it.
But to show how clueless I am... I'm not sure how to check the other end,
since I'm not clear on what it's trying to do. Here's my undoubtedly deeply
flawed understanding of what happens (this reflects that I'm a biologist by
training and badly self-taught on wikis and Linux/Unix/OS X).
I'm assuming that the problem is in this first step of the update script:
java -cp LuceneSearch.jar org.wikimedia.lsearch.oai.IncrementalUpdater -l $@ \
It's listing a bunch of update items (the ... in my first post). I am
guessing that it pulls info on revisions from the MySQL database and
converts them to some format that gets sent to the indexer, which I assume
is part of Apache Lucene. From the error, it's failing to pass that
through some socket to the indexer. But I don't know how to see a log for
activity on that socket.
You have the right idea, but by "the other side" I mean a log on the
indexer. It is some other Java process, probably running on the Hexamer
host that I saw in the indexer logs. It should have something in its
logs. Hopefully.
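Before digging through the indexer's logs, it can help to confirm that something is listening on the indexer's socket at all. The following is a minimal sketch in Python, not part of LuceneSearch or MWSearch. The function name can_connect and the host and port values are hypothetical placeholders; substitute the indexer host and port from your own configuration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection;
        # the "with" block closes the connection again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical placeholders: replace with your indexer's host and port.
    indexer_host = "localhost"
    indexer_port = 12345
    if can_connect(indexer_host, indexer_port):
        print("Something is accepting connections on that port.")
    else:
        print("Nothing reachable there; the indexer may be down or firewalled.")
```

A failed connection only shows that the port is unreachable from the machine running the check. A successful one does not prove the indexer is healthy, so the indexer's own logs are still the place to look.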
My similarly uninformed reading about CirrusSearch is that it uses
Elasticsearch, which in turn uses Lucene. So if the problem is between the
IncrementalUpdater and Lucene, I might have similar issues with
CirrusSearch. But if CirrusSearch gives more informative errors, that
would help! And maybe I should switch anyway, as it sounds like support
for MWSearch will go away at some point.
Lucene is a library that can be embedded in Java applications to provide
full text searching capabilities (and geospatial search and a few other
things). Anyway, LuceneSearch is a MediaWiki-specific application that
provides Lucene's full text search capabilities in a way that the MWSearch
extension understands.
Elasticsearch serves the same purpose for CirrusSearch as LuceneSearch
serves for MWSearch. We like Elasticsearch because it is general purpose
and sees a ton more development than LuceneSearch.
As far as support goes - we haven't done much with LuceneSearch/MWSearch in
a while. I work on CirrusSearch every day, as does Chad, who seems to have
replied while I'm sending this email. Elasticsearch itself has had 44
people submit code to it in the past month. It's a healthier ecosystem,
but it might be a pain to switch. CirrusSearch requires a very recent
version of MediaWiki, for example.
Nik
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054