Sorry for cross-posting, but I didn't see this on the Wikidata mailing lists.
---------- Forwarded message ----------
From: Ariel Glenn WMF ariel@wikimedia.org
Date: Mon, Mar 5, 2018 at 12:10 PM
Subject: [Wikitech-l] changes coming to large dumps
To: Wikimedia developers wikitech-l@lists.wikimedia.org, Wikipedia Xmldatadumps-l Xmldatadumps-l@lists.wikimedia.org
Please forward wherever you think appropriate.
For some time we have provided multiple numbered pages-articles bz2 files for large wikis, as well as a single file with all of the contents combined into one. Producing the combined file now takes so long for Wikidata that it is no longer sustainable. For wikis where the files to recombine are "too large", we will skip this recombine step. This means that downloader scripts relying on the combined file will need to check for its existence and, if it is not there, fall back to downloading the multiple numbered files.
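For anyone adapting a downloader, the check-then-fall-back logic described above could look roughly like the following Python sketch. It is only an illustration, not part of the dumps tooling: the function names `url_exists` and `choose_downloads` are hypothetical, and the caller is assumed to already know the URL of the combined file and the URLs of the numbered files for the run in question.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def url_exists(url: str, timeout: float = 30.0) -> bool:
    """Return True if a HEAD request for url succeeds, False on HTTP 404.

    Hypothetical helper; other HTTP or network errors are re-raised so
    that a temporary outage is not mistaken for a missing file.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def choose_downloads(
    combined_url: str,
    part_urls: Iterable[str],
    exists: Callable[[str], bool] = url_exists,
) -> List[str]:
    """Prefer the single combined file; otherwise use the numbered files.

    Both combined_url and part_urls are supplied by the caller; this
    sketch does not assume any particular dump file naming scheme.
    """
    if exists(combined_url):
        return [combined_url]
    parts = list(part_urls)
    if not parts:
        raise ValueError("combined file is missing and no numbered files were given")
    return parts
```

Passing the existence check in as a parameter keeps the decision logic testable without network access; a script would normally call `choose_downloads(combined_url, part_urls)` and download whatever list comes back.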
I expect to get this done and deployed by the March 20th dumps run. You can follow along here: https://phabricator.wikimedia.org/T179059
Thanks!
Ariel
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
wikidata-tech@lists.wikimedia.org