Dear all,
The Wikimedia Foundation datasets collection on the Internet Archive [1] has now surpassed 1 million items (including about 50,000 full database dumps)! This marks a major milestone in our efforts to archive Wikimedia's vast amount of data and ensures that the vital content contributed by volunteers across the movement is preserved. None of this would have been possible without the help of many people, including Nemo, Ariel and Emijrp (thanks!).
We started archiving towards the end of 2011 and reached the half-million-item milestone in June 2015. [2] We have since moved beyond archiving just the main database dumps to saving research-worthy data such as the pageviews data, and we are even attempting to keep a copy of Wikimedia Commons. Today, we are making the items on the Internet Archive more accessible to researchers by building an interface for searching old dumps.
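In the meantime, the collection can already be explored programmatically. Below is a minimal sketch using the internetarchive Python library (installable with "pip install internetarchive"); the collection identifier comes from [1], while the commented-out item identifier and file pattern are purely illustrative, not real item names:

    import internetarchive as ia

    # List every item in the Wikimedia downloads collection [1].
    for result in ia.search_items('collection:wikimediadownloads'):
        print(result['identifier'])

    # Download selected files from a single item (the identifier and
    # glob pattern here are hypothetical examples):
    # ia.download('some-dump-item', glob_pattern='*pages-articles*')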
Despite this milestone, we are in constant need of more help. If you are a researcher, a programmer, or simply someone with a computer, we need your help with many tasks! Have a look at the WikiTeam project [3] or Emijrp's Wikipedia Archive page [4] for more information. If you regularly work with the Wikimedia database dumps, please share your input on the Dumps-Rewrite project [5] and the API interface task [6].
As before, here's to the next million!
[1]: https://archive.org/details/wikimediadownloads
[2]: https://groups.google.com/forum/#!msg/wikiteam-discuss/Vj3oonpYphg/h9HE6r3v2...
[3]: https://github.com/WikiTeam/wikiteam
[4]: https://en.wikipedia.org/wiki/User:Emijrp/Wikipedia_Archive
[5]: https://phabricator.wikimedia.org/tag/dumps-rewrite/
[6]: https://phabricator.wikimedia.org/T147177