Dumps consumers:
Forwarding this note from the xmldatadumps-l@lists.wikimedia.org list about an upcoming change to the number of files generated for some wikis (for example dewiki's stub-articles dumps).
If you use dumps regularly in your tool, please subscribe to the xmldatadumps-l@lists.wikimedia.org list [0] for future notices like this. The list is generally low volume and high signal, with announcements of process changes as well as operational issues that may slow dump generation.
[0]: https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
Bryan
---------- Forwarded message ----------
From: Ariel Glenn WMF <ariel@wikimedia.org>
Date: Thu, May 31, 2018 at 5:36 AM
Subject: [Xmldatadumps-l] change to output file numbering of big wikis
To: Wikipedia Xmldatadumps-l <Xmldatadumps-l@lists.wikimedia.org>, Wikimedia developers <wikitech-l@lists.wikimedia.org>
TL;DR: Scripts that rely on XML files numbered 1 through 4 should be updated to check for 1 through 6.
Explanation:
A number of wikis have stubs and page content files generated 4 parts at a time, with the appropriate number added to the filename. I'm going to be increasing that this month to 6.
The reason for the increase is that near the end of the run there are usually just a few big wikis taking their time to complete. If they run with 6 processes at once, they'll finish up a bit sooner.
If you have scripts that rely on the number 4, just increase it to 6 and you're done.
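One way to avoid hard-coding the part count at all is to discover the numbered files from a listing. The following is only a sketch: the helper name find_parts and the filename pattern (names such as dewiki-20180601-stub-articles3.xml.gz) are assumptions for illustration, so adjust the pattern to the files your script actually reads.

```python
import re

# Assumed filename shape: "<wiki>-<date>-stub-articles<N>.xml.gz".
# The pattern is illustrative; adapt it to the dump files you consume.
PART_RE = re.compile(r"^(?P<prefix>.+-stub-articles)(?P<part>\d+)\.xml\.gz$")


def find_parts(filenames, pattern=PART_RE):
    """Return the numbered part files, sorted by part number.

    Hypothetical helper: it accepts however many parts are present,
    so a change from 4 to 6 parts needs no code change.
    """
    parts = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            # Sort numerically so that part 10 would follow part 9.
            parts.append((int(match.group("part")), name))
    return [name for _, name in sorted(parts)]
```

Given a directory listing (for example from os.listdir), find_parts(listing) returns only the numbered part files in order, whether there are 4, 6, or some other number of them.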
This will go into effect for the June 1 run and all runs afterwards.
Thanks!
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l