I want to echo what Nate said. We've been using this for more than a year within the Wikimedia Foundation, and it has made analyses of editing behavior much, much easier and faster, not to mention a lot less annoying.

This is the product of years of expert work by the Analytics team, and they deserve plenty of congratulations for it 😊

On Mon, 10 Feb 2020 at 10:42, Nate E TeBlunthuis <nathante@uw.edu> wrote:
Thank you so much Joal! I've been happily using this data for some time and I'm optimistic that it can make doing thorough analyses of Wikimedia projects much more accessible to the community, students, and researchers.

-- Nate

From: Wiki-research-l <wiki-research-l-bounces@lists.wikimedia.org> on behalf of Joseph Allemandou <jallemandou@wikimedia.org>
Sent: Monday, February 10, 2020 8:27 AM
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. <analytics@lists.wikimedia.org>; Research into Wikimedia content and communities <wiki-research-l@lists.wikimedia.org>; Product Analytics <product-analytics@wikimedia.org>
Subject: [Wiki-research-l] Announcement - Mediawiki History Dumps
 
Hi Analytics People,

The Wikimedia Analytics Team is pleased to announce the release of the most
complete dataset we have to date for analyzing content and contributor
metadata: Mediawiki History [1] [2].

Data is in TSV format and is usually released around the 3rd of each
month. Every new release contains the full history of metadata.

The dataset contains an enhanced [3] and historified [4] version of user,
page and revision metadata and serves as the basis for the Wikistats API on
edits, users and pages [5] [6].

We hope you will have as much fun playing with the data as we had building
it, and we're eager to hear from you [7], whether about issues, ideas or
your usage of the data.

Analytically yours,

--
Joseph Allemandou (joal) (he / him)
Sr Data Engineer
Wikimedia Foundation

[1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
[2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
[3] Many pre-computed fields are present in the dataset, from edit counts
by user and page to revert and reverted information, as well as time
between events.
[4] Historical usernames and page titles (as well as user groups and
blocks), as accurate as possible, are available in addition to current
values, and are provided in a denormalized way on every event of the dataset.
[5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
[6] https://wikimedia.org/api/rest_v1/
[7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l