Good day everyone,


We've run into a problem on a self-hosted, docker-compose based installation: deleting items, especially in bulk, does not always propagate to the blazegraph (wdqs) index.


I would have preferred to understand this better before posting to the mailing list, but the wdqs service really is a black box to me. Besides, whatever the underlying cause turns out to be, it won't change the situation we are currently in.

In a nutshell, what we observe is that, at times, deleted items are not removed from the blazegraph index. I believe this might be related to mass deletion ("nuking") of items.

The main issue with that, besides queries simply returning wrong results, is that some tools, e.g. wikibaseintegrator, first run a SPARQL query and then use its results in an API request to manipulate data.

With items remaining in the blazegraph index, but no longer existing on mediawiki, this of course results in situations such as

{'name': 'wikibase-validator-no-such-entity', 'parameters': ['[[Item:Q342|Q342]]']…

Here, the SPARQL endpoint returned this item, but on the mediawiki instance, it has long been deleted, resulting in an API error.
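
To illustrate the mismatch, here is a minimal sketch of how a client could check the IDs returned by SPARQL against the API before writing anything. It relies on wbgetentities marking non-existent entities with a "missing" key; the API URL and the helper names are placeholders of mine, not something taken from wikibaseintegrator.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder URL, adjust to the actual mediawiki instance.
API_URL = "http://localhost/w/api.php"


def fetch_entities(ids, api_url=API_URL):
    """Ask wbgetentities about a batch of IDs (the API caps the batch size)."""
    params = urllib.parse.urlencode({
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "info",
        "format": "json",
    })
    with urllib.request.urlopen(f"{api_url}?{params}") as resp:
        return json.load(resp)


def missing_ids(response):
    """Return the IDs that the API reports as missing, e.g. because they were deleted."""
    entities = response.get("entities", {})
    return sorted(eid for eid, data in entities.items() if "missing" in data)
```

Calling missing_ids(fetch_entities(["Q342"])) on our instance should, if I read the API correctly, flag exactly the items that blazegraph still returns but mediawiki no longer has.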


I unfortunately do not have logs of the requests the wdqs-updater service made to the mediawiki service back when this mass deletion happened, so I cannot tell whether all necessary requests were made. I have, however, traced the steps involved in deleting a single item and seen it being successfully removed from the index:

  1. User: POST title=Item:Q3&action=delete
  2. wdqs-updater -> mediawiki: api.php?action=query&list=recentchanges
  3. mediawiki -> wdqs-updater: { type:log, title:Item:Q3, revid:0, …}
  4. wdqs-updater -> mediawiki: Special:EntityData/Q3.ttl
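
For anyone who wants to reproduce steps 2 and 3 by hand, this is roughly how the deletion log entries can be pulled from recentchanges. It is only a sketch: the API URL is a placeholder, and the parameters are the ones I picked for inspection, not necessarily what the updater itself sends.

```python
import urllib.parse

# Assumption: placeholder URL, adjust to the actual mediawiki instance.
API_URL = "http://localhost/w/api.php"


def recentchanges_url(api_url=API_URL, limit=50):
    """Build a recentchanges query restricted to log entries."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "log",
        "rcprop": "title|ids|timestamp|loginfo",
        "rclimit": str(limit),
        "format": "json",
    }
    return f"{api_url}?{urllib.parse.urlencode(params)}"


def deleted_titles(response):
    """Extract the titles of pages whose log entry records a deletion."""
    changes = response.get("query", {}).get("recentchanges", [])
    return [
        change["title"]
        for change in changes
        if change.get("type") == "log"
        and change.get("logtype") == "delete"
        and change.get("logaction") == "delete"
    ]
```

Comparing the output of deleted_titles() with what the SPARQL endpoint still returns would at least show which deletions never made it into the index.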

I don't have full logs for the other case, where the index does not get updated correctly, but I do have, for instance, a log line like

18:31:27.750 [main] INFO  o.w.q.r.t.change.RecentChangesPoller - Got 1 changes, from Q22@20220630163125|6522 to Q22@20220630163125|6522

Yet when querying the SPARQL endpoint at 18:40, the Q22 item was still returned.

The logs from the wdqs container either do not contain any information relevant to this problem or are so verbose that I'm drowning in a sea of messages, unable to understand what is going on or even whether any of it is relevant to my problem.


My questions are these:

  1. Is a problem like this known?
  2. Is there any way to manually go into the blazegraph index and delete that one record that no longer exists in mediawiki or to make blazegraph purge it by somehow replaying the recentchanges entry, or
  3. Will we need to drop the wdqs volume and recreate the entire index from scratch with a sufficiently large $wgRCMaxAge value?
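
To make question 2 more concrete, what I have in mind is something along these lines: an untested sketch that builds a SPARQL UPDATE for one stale item and the form body to POST to the blazegraph endpoint. Both the endpoint and the entity prefix are assumptions about a docker-compose setup, and the update only removes triples that have the entity as their subject, so statement nodes and the like would presumably be left behind.

```python
import re
import urllib.parse

# Assumptions: endpoint and entity prefix as I would expect them in a
# docker-compose setup; both need to be adjusted to the real installation.
SPARQL_ENDPOINT = "http://wdqs.svc:9999/bigdata/namespace/wdq/sparql"
ENTITY_PREFIX = "http://localhost/entity/"


def delete_entity_update(entity_id, prefix=ENTITY_PREFIX):
    """Build a SPARQL UPDATE removing all triples with the entity as subject."""
    if not re.fullmatch(r"[QP][1-9]\d*", entity_id):
        raise ValueError(f"not an item or property ID: {entity_id!r}")
    return f"DELETE WHERE {{ <{prefix}{entity_id}> ?p ?o }}"


def update_request_body(entity_id, prefix=ENTITY_PREFIX):
    """Form-encode the update so it can be POSTed as the 'update' parameter."""
    query = delete_entity_update(entity_id, prefix)
    return urllib.parse.urlencode({"update": query}).encode("ascii")
```

The body returned by update_request_body("Q22") would then be POSTed to SPARQL_ENDPOINT, if that is at all a sane thing to do to the index.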

Thank you,
David Raison

--

TenTwentyFour S.à r.l.
www.tentwentyfour.lu
T: +352 20 211 1024
F: +352 20 211 1023
1 place de l'Hôtel de Ville
4138 Esch-sur-Alzette