Hi Bryan,
On Tue, 8 Oct 2024 at 17:59, Bryan Davis bd808@wikimedia.org wrote:
> Do you run a tool that needs data from Dumps to do its job? I would love to hear some stories about how this data helps folks advance the work of the movement.
Socksfinder¹ uses the stub-meta-history dump to build an index of edits. That index is then used to find how often multiple accounts edit the same articles, which helps identify sockpuppets.
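To give a rough idea of what such an index looks like, here is a toy sketch in Python. It is not socksfinder's actual code: the (page, user) pairs stand in for what would be extracted from the dump, and the names build_index and shared_pages are made up for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_index(edits):
    """Map each user to the set of pages they have edited."""
    index = defaultdict(set)
    for page, user in edits:
        index[user].add(page)
    return index


def shared_pages(index, users):
    """For every pair of the given users, return the pages both have edited."""
    return {
        (a, b): index.get(a, set()) & index.get(b, set())
        for a, b in combinations(sorted(users), 2)
    }


# Hypothetical sample data; a real run would read these pairs from the dump.
edits = [
    ("Paris", "Alice"), ("Paris", "Bob"), ("Lyon", "Alice"),
    ("Lyon", "Bob"), ("Nice", "Carol"), ("Paris", "Carol"),
]
index = build_index(edits)
overlap = shared_pages(index, ["Alice", "Bob", "Carol"])
for pair, pages in overlap.items():
    print(pair, len(pages), sorted(pages))
```

The index only has to be built once per dump. After that, comparing any set of accounts is just a few set intersections, which is the main reason an offline index is convenient here.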
Arkbot² uses the pages-articles dump to build various lists of articles that share certain properties, to help maintenance projects on the French Wikipedia (list of pages not linked to something…). Some of these lists used to be provided as special pages by MediaWiki and were disabled because of performance concerns; others are too specific to be part of MediaWiki.
In both cases, I'm not 100% certain dumps are the best approach (I've been thinking about using SQL queries on the replicas and some of the available APIs), but they work well enough™, and no other approach was so obviously better (if better at all) that I felt an urgent need to rewrite my tools.
Also in the past I've used Wikidata dumps to explore the limits of some RDF tools, found the limits faster than I thought, and moved on to other hobbies :-)
Best regards,
¹ https://socksfinder.toolforge.org/ (https://github.com/Arkanosis/socksfinder)
² https://fr.wikipedia.org/wiki/Utilisateur:Arkbot (https://github.com/Arkanosis/arkbot-rs)