Hi Giovanni,
Thank you for your message :)
You are correct that there is no information on page-to-page links as of today, nor, for instance, on whether past revisions were redirects. We share your view that such information is extremely valuable, and we intend to extract it at some point. It has not been done yet because this information is only available by parsing the wikitext of every revision, which is not only resource-intensive but also technically complicated (templates, version changes, etc.).
You can be sure we will send another announcement when we release that data :)
Best,
On Tue, Feb 11, 2020 at 10:30 PM Giovanni Luca Ciampaglia glc3@mail.usf.edu wrote:
Hi Joseph,
Thanks a lot for creating and sharing such a valuable resource. I went through the schema and from what I understand there is no information about page-to-page links, correct? Are there any resources that would provide such historical data?
Best,
*Giovanni Luca Ciampaglia* ∙ glciampaglia.com Assistant Professor Computer Science and Engineering https://www.usf.edu/engineering/cse/ ∙ University of South Florida https://www.usf.edu/
*Due to Florida’s broad open records law, email to or from university employees is public record, available to the public and the media upon request.*
On Mon, Feb 10, 2020 at 11:28 AM Joseph Allemandou <jallemandou@wikimedia.org> wrote:
Hi Analytics People,
The Wikimedia Analytics Team is pleased to announce the release of the most complete dataset we have to date to analyze content and contributor metadata: Mediawiki History [1] [2].
Data is in TSV format, usually released monthly around the 3rd of the month, and every new release contains the full history of metadata.
The dataset contains an enhanced [3] and historified [4] version of user, page and revision metadata and serves as the basis for the Wikistats API on edits, users and pages [5] [6].
We hope you will have as much fun playing with the data as we have building it, and we're eager to hear from you [7], whether for issues, ideas or usage of the data.
Analytically yours,
-- Joseph Allemandou (joal) (he / him) Sr Data Engineer Wikimedia Foundation
[1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
[2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_hist...
[3] Many pre-computed fields are present in the dataset, from edit counts by user and page to revert and reverted information, as well as time between events.
[4] Historical usernames and page titles (as well as user groups and blocks), as accurate as possible, are available in addition to current values, and are provided in a denormalized way to every event of the dataset.
[5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
[6] https://wikimedia.org/api/rest_v1/
[7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20His...
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l