Hi Aidan,


I wanted to follow up on this thread with what I found after digging deeper into the subject, even though I'm not entirely sure it applies to your case.

To summarize, in our case, we were facing two distinct problems:

  1. Elasticsearch had an incomplete index, which we noticed by, for example, comparing the number of results from a SPARQL query with the number of results a search would give us, and
  2. Wikibase-specific CirrusSearch keywords (such as haswbstatement:) were not returning any results.
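
The comparison in point 1 can be sketched roughly as follows. This is a minimal sketch, not our actual script: the WDQS endpoint, the api.php URL, the item namespace (120) and the availability of curl and jq are all assumptions about a typical Wikibase Docker setup, and the SPARQL query has to bind its count to ?n.

```shell
#!/bin/sh
# ASSUMPTIONS: the URLs and the namespace are placeholders for a default
# Wikibase Docker setup; override them via the environment.
WDQS_URL="${WDQS_URL:-http://localhost:9999/bigdata/namespace/wdq/sparql}"
API_URL="${API_URL:-http://localhost/w/api.php}"
ITEM_NS="${ITEM_NS:-120}"

# Print the number that a SPARQL COUNT query (passed as $1) binds to ?n.
sparql_count() {
    curl -sG "$WDQS_URL" \
        -H 'Accept: application/sparql-results+json' \
        --data-urlencode "query=$1" |
        jq -r '.results.bindings[0].n.value'
}

# Print the total hits CirrusSearch reports for a search string (passed as $1).
search_count() {
    curl -sG "$API_URL" \
        --data-urlencode 'action=query' \
        --data-urlencode 'list=search' \
        --data-urlencode 'srinfo=totalhits' \
        --data-urlencode 'srlimit=1' \
        --data-urlencode "srnamespace=$ITEM_NS" \
        --data-urlencode "srsearch=$1" \
        --data-urlencode 'format=json' |
        jq -r '.query.searchinfo.totalhits'
}

# Report whether the two counts agree.
compare_counts() {
    if [ "$1" -eq "$2" ]; then
        echo "consistent: $1 results on both sides"
    else
        echo "mismatch: SPARQL=$1 search=$2"
    fi
}
```

Usage would then look something like `compare_counts "$(sparql_count "$QUERY")" "$(search_count "$TERM")"`, where QUERY and TERM are a SPARQL query and a search string that should describe the same set of items.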

The cause of the first issue seems to have been a version incompatibility, a misconfiguration of the index shards, or possibly both.

In any case, we had upgraded from MW 1.35 to 1.39 (wmde.13) without also updating the Elasticsearch version in our docker-compose file.
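
A quick way to see which Elasticsearch version is actually running is to ask the node itself. The sketch below assumes that the compose service is named "elasticsearch" and that curl is available inside that container. Which version a given CirrusSearch release expects is something to look up in its documentation.

```shell
#!/bin/sh
# Extract the version number from the JSON that Elasticsearch returns on "/".
# Reads the JSON from stdin; uses sed so that no JSON tool is required.
parse_es_version() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# ASSUMPTION: service name "elasticsearch", curl present in the container.
running_es_version() {
    docker compose exec -T elasticsearch curl -s 'http://localhost:9200/' |
        parse_es_version
}
```

Calling `running_es_version` prints the version string, which can then be compared with the image tag in the docker-compose file.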

Once we were running the correct version of Elasticsearch, we had to drop the current index (using "--startOver") and then rebuild it. We did this with the following three commands:

    docker compose exec mediawiki php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver
    docker compose exec mediawiki php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse
    docker compose exec mediawiki php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip

Since then, the index has been consistent with the items in WDQS.
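
To keep an eye on a rebuild, the document counts per index can be read from Elasticsearch's _cat/indices API. Again, this is only a sketch, assuming a compose service named "elasticsearch" with curl inside. Note that the total covers all indexed pages, not just items, so it is a rough sanity check rather than an exact comparison with WDQS.

```shell
#!/bin/sh
# List all indices with their document counts, one "name count" pair per line.
# ASSUMPTION: service name "elasticsearch", curl present in the container.
list_indices() {
    docker compose exec -T elasticsearch \
        curl -s 'http://localhost:9200/_cat/indices?h=index,docs.count'
}

# Sum the docs.count column (second field) of a listing read from stdin.
sum_docs() {
    awk '{ total += $2 } END { print total + 0 }'
}
```

Running `list_indices | sum_docs` before and after the ForceSearchIndex.php runs should show the total growing as pages are indexed.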


The second issue was due to missing configuration. I had to dump the entire index and the queries, then walk through the source code of the WikibaseCirrusSearch extension and the Wikimedia Foundation's GitHub repositories to figure out which configuration parameters need to be set to make this work.

In a nutshell, to make haswbstatement: work, $wgWBRepoSettings['searchIndexProperties'] has to be configured. We set it together with 'searchIndexTypes':

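    // Property IDs whose statements should be indexed (these IDs are from our wiki).
    $wgWBRepoSettings['searchIndexProperties'] = [ 'P1', 'P19', 'P23' ];
    // Data types whose statements should be indexed.
    $wgWBRepoSettings['searchIndexTypes'] = [
        'string', 'external-id', 'url', 'wikibase-item', 'wikibase-property',
        'wikibase-lexeme', 'wikibase-form', 'wikibase-sense',
    ];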

These are set for Wikidata but, as far as I've seen, not documented on the help pages for WikibaseCirrusSearch.
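
One way to check whether statements actually end up in the index is to count the documents that have a statement field at all. The sketch below rests on several assumptions: that WikibaseCirrusSearch stores indexed statements in a field called statement_keywords (check the name against your own index dump), that the compose service is named "elasticsearch" with curl inside, and that the index was rebuilt after the settings were changed.

```shell
#!/bin/sh
# Build an Elasticsearch "exists" query for the field name passed as $1.
exists_query() {
    printf '{"query":{"exists":{"field":"%s"}}}' "$1"
}

# Ask Elasticsearch how many documents have the given field.
# Prints the raw JSON response of the _count API.
# ASSUMPTION: service name "elasticsearch", curl present in the container.
count_with_field() {
    docker compose exec -T elasticsearch \
        curl -s -H 'Content-Type: application/json' \
        -d "$(exists_query "$1")" \
        'http://localhost:9200/_count'
}
```

If `count_with_field statement_keywords` reports a count of 0 while items with statements for the configured properties exist, the configuration or the reindex is probably the place to look.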

To make wbstatementquantity: work, further configuration is required, most notably a property named "quantity" that is used as a qualifier on a statement.

Like I said, I fear this might not entirely fit your case, but I still wanted to share what I found.


Regards,
David


On 05/09/2023 16:29, David Raison wrote:

Hi Aidan,


Sorry to disappoint you: I don't have any advice for you, but I wanted to chime in nonetheless because we've seen similar behavior on a private project.

I posted this a while back: https://www.mediawiki.org/wiki/Topic:Xmowlgla1gs0aue8

Since then, some people have reported that it's not only the prefixes that don't work: several items are simply not indexed at all.

Unfortunately, I haven't had time to dig deeper into this yet, but I will let you know once I have, and I will be monitoring this thread closely for further clues.


Regards,
David


On 05/09/2023 16:24, Aidan Hogan wrote:
Hi all,

Running a Docker instance of Wikibase (wikibase:1.36.3-wmde.4), I've noticed that CirrusSearch has stopped indexing new items. Everything else seems to be fine: the Elasticsearch container is up and running and searches over legacy content work as expected, but searches over new content do not yield results. Previously there was no problem.

As mentioned, I've not managed to find anything out of place, so if there were any quick hints on where to look (in what container, log, etc.) to debug indexing new items in CirrusSearch/ElasticSearch, I would greatly appreciate it.

Best,
Aidan

--

TenTwentyFour S.à r.l.
www.tentwentyfour.lu
T: +352 20 211 1024
F: +352 20 211 1023
1 place de l'Hôtel de Ville
4138 Esch-sur-Alzette


_______________________________________________
Wikibase Community User Group mailing list -- wikibaseug@lists.wikimedia.org
To unsubscribe send an email to wikibaseug-leave@lists.wikimedia.org