Hi Analytics People,
The Wikimedia Analytics Team is pleased to announce the release of the most
complete dataset we have to date for analyzing content and contributor
metadata: Mediawiki History [1] [2].
Data is in TSV format and is usually released monthly, around the 3rd of
the month. Every new release contains the full history of metadata.
The dataset contains an enhanced [3] and historified [4] version of user,
page and revision metadata, and serves as the basis for the Wikistats API on
edits, users and pages [5] [6].
We hope you will have as much fun playing with the data as we have had
building it, and we're eager to hear from you [7], whether about issues,
ideas or usage of the data.
Analytically yours,
--
Joseph Allemandou (joal) (he / him)
Sr Data Engineer
Wikimedia Foundation
[1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
[2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_hist...
[3] Many pre-computed fields are present in the dataset, from edit counts
by user and page to reverts and reverted information, as well as time
between events.
[4] Historical usernames and page titles (as well as user groups and
blocks), as accurate as possible, are available in addition to current
values, and are provided in a denormalized way on every event of the dataset.
[5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
[6] https://wikimedia.org/api/rest_v1/
[7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20His...