I'm trying to set up a more regular search index update for the Wikimedia sites. To summarize how it's supposed to work:
A process on maurus (the search build master) runs through the list of all wikis, dumping their text and piping it to the search index builder program.
As each wiki completes, the newly built index is moved from the build directory into the complete directory.
The Lucene search servers currently restart themselves hourly as a precaution against memory leaks; before each restart they will now also run an rsync to copy over any newly completed indexes from the master. [There may be some refinements to make on this, such as keeping 'live' and 'update' copies and swapping them out during the restart.]
This should keep search index updates happening within a day or two, rather than the extremely long and irregular schedule of before.
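For illustration only, here is a minimal Python sketch of the build-then-publish flow described above. Everything in it is an assumption: the directory layout, the function names, the `segments` file, and the rsync paths are hypothetical, and `build_index` is a stand-in for the real dump-and-index pipeline, not the actual scripts on maurus.

```python
import shutil
import tempfile
from pathlib import Path


def build_index(wiki: str, out_dir: Path) -> None:
    """Stand-in for dumping a wiki's text and piping it to the index builder."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "segments").write_text(f"index for {wiki}\n")


def publish(wiki: str, build_root: Path, complete_root: Path) -> Path:
    """Move a finished index from the build directory to the complete directory."""
    src = build_root / wiki
    dst = complete_root / wiki
    complete_root.mkdir(parents=True, exist_ok=True)
    if dst.exists():
        shutil.rmtree(dst)  # drop the previously completed index
    shutil.move(str(src), str(dst))
    return dst


def rebuild_all(wikis, build_root: Path, complete_root: Path) -> list:
    """Build each wiki in turn and publish it as soon as it completes."""
    published = []
    for wiki in wikis:
        build_index(wiki, build_root / wiki)
        publish(wiki, build_root, complete_root)
        published.append(wiki)
    return published


def rsync_command(master: str, complete_root: str, local_root: str) -> list:
    """Command a search server might run before restarting (hypothetical paths)."""
    return ["rsync", "-a", f"{master}:{complete_root}/", f"{local_root}/"]


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    print(rebuild_all(["dewiki", "enwiki"], root / "build", root / "complete"))
    print(" ".join(rsync_command("maurus", "/search/complete", "/search/indexes")))
```

Because an index only appears in the complete directory once its build has finished, the rsync on the search servers should never pick up a half-built index.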
The build process has been running on maurus since last night; it's about 1/3 of the way through enwiki and doesn't appear to have spewed any errors. I'll check in on it again this evening, and if things look OK I'll set up the synchronization process.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
> Currently the build process is running on maurus since last night; it's currently about 1/3 through enwiki and doesn't appear to have spewed any errors. I'll check in on it again this evening and if things look ok I'll set up the synchronization process.
I had to downgrade IKVM (the Java-to-.NET adapter used for reading the XML dump into the search indexer) because some problems seem to have slipped into Classpath's XML parser, and wikis containing 4-byte UTF-8 characters were failing.
For reference: IKVM 0.22 works, 0.30 doesn't. The cutoff is probably somewhere in between...
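As an aside on what "4-byte UTF-8 characters" means: these are the code points above U+FFFF, outside the Basic Multilingual Plane. The Python sketch below is only an illustration of how one could check whether a piece of text contains any. The function names are made up for this example, and it is not how the failure was actually diagnosed.

```python
def has_four_byte_utf8(text: str) -> bool:
    """True if any character has a code point above U+FFFF (four bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def four_byte_chars(text: str) -> list:
    """Return the characters in text that need four bytes in UTF-8."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    sample = "Gothic letter ahsa: \U00010330"
    print(has_four_byte_utf8(sample))
    print([hex(ord(ch)) for ch in four_byte_chars(sample)])
```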
Started the test build again, I'll look in on it in the morning...
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
> Started the test build again, I'll look in on it in the morning...
Not seeing any errors or exceptions in the log this time, so I've set up the sync job. With luck ;) German Wikipedia should be updated this evening, and English Wikipedia sometime in the next couple of days.
-- brion vibber (brion @ pobox.com)
On 9/2/06, Brion Vibber <brion@pobox.com> wrote:
> This should keep search index updates happening within a day or two, rather than the extremely long and irregular schedule of before.
I'm very pleased to hear this.
Steve
wikitech-l@lists.wikimedia.org