Dear all,
The Wikimedia Foundation datasets collection on the Internet Archive [1] has now surpassed 1 million items (including about 50,000 full database dumps)! This marks a major milestone in our efforts to archive Wikimedia's vast amount of data and ensures that the vital content contributed by volunteers across the movement is preserved. None of this would have been possible without the help of many people, including Nemo, Ariel and Emijrp (thanks!).
We started archiving towards the end of 2011 and reached the half-million-item milestone in June 2015. [2] We have since moved beyond archiving just the main database dumps to saving research-worthy data such as the pageviews data, and we are even attempting to keep a copy of Wikimedia Commons. Today, we are making the items on the Internet Archive more accessible to researchers by building an interface for searching old dumps.
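In the meantime, the collection can already be explored programmatically. Below is a minimal sketch using the internetarchive Python library (installable with "pip install internetarchive"); the collection identifier comes from [1], while the commented-out item identifier and file pattern are purely illustrative, not real item names:

    import internetarchive as ia

    # List every item in the Wikimedia downloads collection [1].
    for result in ia.search_items('collection:wikimediadownloads'):
        print(result['identifier'])

    # Download selected files from a single item (the identifier and
    # glob pattern here are hypothetical examples):
    # ia.download('some-dump-item', glob_pattern='*pages-articles*')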
Despite this milestone, we are in constant need of more help. If you are a researcher, a programmer, or simply someone with a computer, we need your help with many tasks! Have a look at the WikiTeam project [3] or Emijrp's Wikipedia Archive page [4] for more information. If you regularly work with the Wikimedia database dumps, please share your input on the Dumps-Rewrite project [5] and the API interface task [6].
As before, here's to the next million!
[1]: https://archive.org/details/wikimediadownloads
[2]: https://groups.google.com/forum/#!msg/wikiteam-discuss/Vj3oonpYphg/h9HE6r3v2...
[3]: https://github.com/WikiTeam/wikiteam
[4]: https://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
[5]: https://phabricator.wikimedia.org/tag/dumps-rewrite/
[6]: https://phabricator.wikimedia.org/T147177