Hi Aidan,
I just wanted to follow up on this mail with the results of my digging
deeper into the subject, even if I'm not entirely sure it will fit your
case.
To summarize, in our case, we were facing two distinct problems:
1. First, Elasticsearch had an incomplete index, which we found by e.g.
comparing the number of results from a SPARQL query to the number of
results a search would give us, and
2. Second, Wikibase-specific CirrusSearch keywords (such as
haswbstatement:) were not bringing up any results.
The reason for the first issue seems to have been either a version
incompatibility or a misconfiguration of index shards, or possibly both.
In any case, we had upgraded from MW 1.35 to 1.39 (wmde.13) without also
updating the version of Elasticsearch in our docker-compose file.
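A quick way to spot a version mismatch like the one described above is to ask the running node what it reports. This is only a minimal sketch: ES_URL is an assumption (it presumes port 9200 is published on localhost, which your docker-compose file may not do), and es_version_from_json is just a helper name I made up for the example.

```shell
#!/bin/sh
# Sketch: print the version reported by a running Elasticsearch node.
# ES_URL is an assumption; adjust it to wherever your node is reachable.
ES_URL="${ES_URL:-http://localhost:9200}"

# Pull the "number" field out of the JSON that Elasticsearch serves at "/".
es_version_from_json() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Query the node; an unreachable node simply yields an empty version.
version=$(curl -s --max-time 5 "$ES_URL" | es_version_from_json)
echo "Elasticsearch version: ${version:-unknown}"
```

If the port is not published, the same request can be made from inside the Elasticsearch container instead, provided curl is available there.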
Once we were running the correct version of Elasticsearch, we had to
drop the current index (using "--startOver") and then rebuild it. We
did this using the following three commands:
   docker compose exec mediawiki php \
     extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver
   docker compose exec mediawiki php \
     extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse
   docker compose exec mediawiki php \
     extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip
Since then, the index has been consistent with the items in WDQS.
The second issue was due to missing configuration. I had to dump the
entire index and the queries, and walk through the source code of the
WikibaseCirrusSearch extension and the Wikimedia Foundation's GitHub
repositories to figure out which configuration parameters need to be set
to make this work.
In a nutshell, to make haswbstatement work,
$wgWBRepoSettings['searchIndexProperties'] has to be configured:
$wgWBRepoSettings['searchIndexProperties'] = [ 'P1', 'P19', 'P23' ];
$wgWBRepoSettings['searchIndexTypes'] = [
     'string', 'external-id', 'url', 'wikibase-item', 'wikibase-property',
     'wikibase-lexeme', 'wikibase-form', 'wikibase-sense',
];
These are set for Wikidata, but are not even documented (as far as I've
seen) on the help pages of the WikibaseCirrusSearch extension.
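To check whether the keyword returns anything after a configuration change, one option is to ask the MediaWiki search API for the hit count. Again, this is only a sketch under assumptions: API_URL is a placeholder for your wiki's api.php, P1 is simply the first property from the example above, and totalhits_from_json is a helper name invented for illustration.

```shell
#!/bin/sh
# Sketch: count search hits for a haswbstatement: query via the MediaWiki API.
# API_URL is an assumption; point it at your own wiki's api.php.
API_URL="${API_URL:-http://localhost/w/api.php}"

# Extract query.searchinfo.totalhits from the API's JSON response on stdin.
totalhits_from_json() {
    sed -n 's/.*"totalhits"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# srnamespace=* searches all namespaces, so item pages are included.
hits=$(curl -s --max-time 5 -G "$API_URL" \
    --data-urlencode 'action=query' \
    --data-urlencode 'list=search' \
    --data-urlencode 'srsearch=haswbstatement:P1' \
    --data-urlencode 'srnamespace=*' \
    --data-urlencode 'srlimit=1' \
    --data-urlencode 'format=json' | totalhits_from_json)
echo "haswbstatement:P1 hits: ${hits:-unknown}"
```

A count of 0 for a property that you know is used on items suggests that the statements have not made it into the index.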
To make wbstatementquantity: work, further configuration is required,
most notably a property named "quantity" that is used as a qualifier on
a statement.
Like I said, I fear this might not entirely fit your case, but I still
wanted to share what I found.
Regards,
David
On 05/09/2023 16:29, David Raison wrote:
Hi Aidan,
Sorry to disappoint you, I don't have any advice for you, but I wanted
to chime in nonetheless because we've seen similar behavior on a
private project.
I posted this a while back:
https://www.mediawiki.org/wiki/Topic:Xmowlgla1gs0aue8
And since then some people have reported that it's not only the
prefixes that don't work, but several items are simply not indexed at all.
Unfortunately I haven't had any time yet to dig deeper into this, but
I will let you know once I have, and I will be closely monitoring this
thread for further clues.
Regards,
David
On 05/09/2023 16:24, Aidan Hogan wrote:
Hi all,
Running a Docker instance of Wikibase (wikibase:1.36.3-wmde.4), I've
noticed that CirrusSearch has stopped indexing new items. Everything
seems to be fine: the ElasticSearch container is up and running,
searches over legacy content work as expected, but searches over new
content do not yield results. Previously there was no problem.
As mentioned, I've not managed to find anything out of place, so if
there were any quick hints on where to look (in what container, log,
etc.) to debug indexing new items in CirrusSearch/ElasticSearch, I
would greatly appreciate it.
Best,
Aidan
--
*TenTwentyFour S.à r.l.*
www.tentwentyfour.lu <https://www.tentwentyfour.lu>
*T*: +352 20 211 1024
*F*: +352 20 211 1023
1 place de l'Hôtel de Ville
4138 Esch-sur-Alzette