Hi all,
We're having terrible trouble with the CirrusSearch maintenance script for initialising the Elasticsearch indexes: forceSearchIndex.php --skipLinks --indexOnSkip...
It's happening with MW 1.31 through 1.33. We're using the Redis job queue and a single instance of Elasticsearch on the same host (these are low-traffic wikis), on Debian 10.2 with PHP 7.3.
No matter what parameters we use (--queue or not, different --maxJobs, --fromId/--toId, --batchSize, etc.), we always find that hundreds of Elasticsearch docs are not created.
Nothing about the articles themselves is preventing it: if we run the maintenance script on a single missing one afterwards, it gets created with no problem. Also, the set of missing docs differs substantially each time the problem happens.
Please, if anyone has heard of this kind of thing and could point us in the right direction here, that would be awesome!
Thanks a lot, Aran
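For anyone trying to narrow down a similar problem, here is a minimal sketch of how the missing documents could be listed and grouped into ID ranges for targeted re-runs with --fromId/--toId. It assumes you have already exported two lists of integers by whatever means you prefer: the page IDs you expect to be indexed and the document IDs actually present in the index. The function names are hypothetical, and the ranges are inclusive on both ends, so check how the maintenance script treats its bounds before reusing them. The overlap helper is only there to quantify how much the missing set changes between two runs.

```python
from typing import Iterable, List, Tuple


def missing_ids(expected: Iterable[int], indexed: Iterable[int]) -> List[int]:
    """Return the sorted page IDs that are expected but absent from the index."""
    return sorted(set(expected) - set(indexed))


def to_ranges(ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse IDs into inclusive (first, last) runs of consecutive values."""
    ranges: List[Tuple[int, int]] = []
    for page_id in sorted(set(ids)):
        if ranges and page_id == ranges[-1][1] + 1:
            # Extend the current run by one ID.
            ranges[-1] = (ranges[-1][0], page_id)
        else:
            # Start a new run.
            ranges.append((page_id, page_id))
    return ranges


def overlap(run_a: Iterable[int], run_b: Iterable[int]) -> float:
    """Jaccard overlap of two missing-ID sets; 1.0 if both are empty."""
    a, b = set(run_a), set(run_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    expected = range(1, 11)        # hypothetical page IDs 1..10
    indexed = [1, 2, 3, 6, 7, 10]  # hypothetical IDs found in the index
    gaps = missing_ids(expected, indexed)
    print(gaps)             # [4, 5, 8, 9]
    print(to_ranges(gaps))  # [(4, 5), (8, 9)]
```

A low overlap between the missing sets of two otherwise identical runs would support the observation that the failures are not tied to specific articles.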
On Sat, Nov 16, 2019 at 6:58 PM Aran via Wikitech-l <wikitech-l@lists.wikimedia.org> wrote:
Hi all, [...] Please, if anyone has heard of this kind of thing and could point us in the right direction here, that would be awesome!
Hi, no, I've never encountered such a random scenario. If inspecting the various logs (MediaWiki and Elasticsearch) does not provide any clues, I would suggest adding debug log messages to the DataSender::sendData method (includes/DataSender.php). This is the last method called from MediaWiki before reaching Elasticsearch. If you find something interesting, or something you think is broken, please file a task at http://phabricator.wikimedia.org/ under the tag CirrusSearch.
David.
On Wed, Nov 27, 2019 at 7:05 PM David Causse <dcausse@wikimedia.org> wrote:
[...]
I forgot to mention that we host office hours every first Wednesday of the month; this might be a good opportunity to discuss this.
Details for our next meeting:
Date: Wednesday, Dec 6th, 2019
Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
David copied my typo on the date. Just to be clear, our office hours will be this Wednesday, which is the *4th*.
I forgot to mention that we host office hours every first Wednesday of the month; this might be a good opportunity to discuss this.
Details for our next meeting:
Date: Wednesday, *Dec 4th,* 2019
Time: 16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Apologies to David and Aran.
On Thu, Nov 28, 2019 at 3:47 AM David Causse <dcausse@wikimedia.org> wrote:
[...]