Hi,
Is there any guidance on how to poll the recent changes feed of a MediaWiki instance (a Wikibase one in particular) to keep up with its stream of edits? Specifically, how can this be done responsibly (without hammering the server), and how can the consumer be sure that it sees every change?
EditGroups (https://tools.wmflabs.org/editgroups/) currently uses the WMF Event Stream for this. That works well, but it has two downsides: it is not available for non-WMF wikis, and it lacks server-side filtering. So I have been looking into implementing recent changes polling in EditGroups, so that it can be run on other wikis.
So far it looks like my RC polling strategy misses some edits that the WMF Event Stream includes, so I need to improve this. RC polling is implemented in the WDQS updater here:
https://github.com/wikimedia/wikidata-query-rdf/blob/master/tools/src/main/j...
Is this the best implementation to look at?
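To make the question concrete, here is a minimal sketch of one possible polling approach. It is neither what EditGroups does today nor what the WDQS updater does. It assumes that late-arriving edits can be caught by re-reading a short overlap window on every poll and deduplicating on rcid. The API parameters (list=recentchanges, rcdir, rcstart, rcprop, rclimit, maxlag, continue) are standard MediaWiki ones, but the RCPoller class, the fetch callable and the 30-second overlap are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used by the MediaWiki API


def build_params(start: datetime, limit: int = 500) -> Dict[str, str]:
    """Query parameters for one list=recentchanges request, oldest first."""
    return {
        "action": "query",
        "list": "recentchanges",
        "rcdir": "newer",
        "rcstart": start.strftime(TS_FORMAT),
        "rcprop": "ids|timestamp|title|user|comment",
        "rclimit": str(limit),
        "maxlag": "5",  # ask the server to refuse the request when replicas lag
        "format": "json",
    }


class RCPoller:
    """Hypothetical poller: overlap window plus deduplication on rcid."""

    def __init__(self, fetch: Callable[[Dict[str, str]], dict],
                 start: datetime,
                 overlap: timedelta = timedelta(seconds=30)):
        self.fetch = fetch      # assumed: takes params, returns the decoded JSON
        self.cursor = start     # newest timestamp seen so far (timezone-aware UTC)
        self.overlap = overlap  # assumed bound on how late a change can show up
        self.seen: Dict[int, datetime] = {}  # rcid -> timestamp, for dedup

    def poll_once(self) -> List[dict]:
        """Fetch everything since (cursor - overlap); return only unseen changes."""
        params = build_params(self.cursor - self.overlap)
        fresh: List[dict] = []
        while True:
            data = self.fetch(params)
            for rc in data.get("query", {}).get("recentchanges", []):
                ts = datetime.strptime(rc["timestamp"], TS_FORMAT).replace(
                    tzinfo=timezone.utc)
                if rc["rcid"] not in self.seen:
                    self.seen[rc["rcid"]] = ts
                    fresh.append(rc)
                if ts > self.cursor:
                    self.cursor = ts
            if "continue" not in data:
                break
            # Follow API continuation until the batch is exhausted.
            params = {**params, **data["continue"]}
        # Forget ids that can no longer reappear inside the overlap window.
        horizon = self.cursor - 2 * self.overlap
        self.seen = {i: t for i, t in self.seen.items() if t >= horizon}
        return fresh
```

The fetch callable could be a thin wrapper around an HTTP GET against the wiki's api.php that sends a descriptive User-Agent, and the caller would sleep between poll_once() calls and back off whenever the server reports a maxlag error. Whether a fixed overlap is enough to guarantee that no edit is missed is exactly what I am unsure about.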
And actually - is this really worth doing? Perhaps I should instead require that the target Wikibase runs the EventLogging extension (https://www.mediawiki.org/wiki/Extension:EventLogging) which exposes the edit stream in a Kafka instance, and then implement a Kafka topic consumer in EditGroups. It does add requirements on the Wikibase instance, but if RC polling is brittle, it would be wrong to promise that EditGroups can be run off a stock MediaWiki instance anyway.
(Note that I still think EditGroups is not a long-term solution. We need a MediaWiki extension to replace it: https://phabricator.wikimedia.org/T203557. I am just looking into this to help our OpenRefine GSoC intern Lu Liu who will be working on Wikibase support in OpenRefine this summer.)
Cheers,
Antonin
wikidata-tech@lists.wikimedia.org