Hi Giovanni,
The pagelinks table is great for temporal snapshots: you know about links
between pages at the time of the query. Parsing the wikitext is needed to
provide a historical view of the links :)
Cheers
Joseph
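A rough illustration of the point above: the pagelinks table only reflects the links as they are now, while a historical view has to be rebuilt from the wikitext of each revision. The Python sketch below assumes you already have the raw wikitext of two revisions as strings. The names extract_links and link_changes are hypothetical helpers, not part of any Wikimedia tool. The regular expression only sees links written literally as [[...]], so links produced by templates are missed and namespace links such as categories or files are not filtered out.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
# Assumption: only literal wikilinks are of interest; template-generated
# links cannot be found without expanding the templates.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_links(wikitext):
    """Return the set of link targets written literally in the wikitext."""
    targets = set()
    for match in WIKILINK_RE.finditer(wikitext):
        # Underscores and spaces are equivalent in page titles.
        target = match.group(1).strip().replace("_", " ")
        if target:
            # On most Wikimedia wikis the first letter of a title is
            # case-insensitive, so normalise it to upper case.
            targets.add(target[0].upper() + target[1:])
    return targets


def link_changes(old_text, new_text):
    """Return (added, removed) link targets between two revisions."""
    old_links = extract_links(old_text)
    new_links = extract_links(new_text)
    return new_links - old_links, old_links - new_links


rev1 = "See [[Rome]] and [[italy|Italy]]."
rev2 = "See [[Rome#History|Rome]] and [[Florence]]."
added, removed = link_changes(rev1, rev2)
```

Applied to every consecutive pair of revisions of a page, this would give an approximate timeline of literal links being added and removed. The template and version-change issues mentioned in the thread are exactly what this sketch leaves out.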
On Tue, Feb 18, 2020 at 12:22 AM Giovanni Luca Ciampaglia <glc3(a)mail.usf.edu>
wrote:
Thank you Joseph; great to hear there is interest in building such a
dataset. You say that the link information would need to be parsed from
wikitext, which is complicated; would the pagelinks table help as an
alternative source of data?
*Giovanni Luca Ciampaglia* ∙ glciampaglia.com
Assistant Professor
Computer Science and Engineering <https://www.usf.edu/engineering/cse/> ∙
University of South Florida <https://www.usf.edu/>
*Due to Florida’s broad open records law, email to or from university
employees is public record, available to the public and the media upon
request.*
On Thu, Feb 13, 2020 at 9:27 AM Joseph Allemandou <jallemandou(a)wikimedia.org>
wrote:
Hi Giovanni,
Thank you for your message :)
You are correct that there is no information on page-to-page links as of
today, nor, for instance, on historical values of whether revisions were
redirects.
We agree that such information is extremely valuable, and we hope to be
able to extract it at some point.
It has not been done yet because those pieces of information are only
available by parsing the wikitext of every revision, which is not only
resource-intensive but also technically complicated (templates, version
changes, etc.).
You can be sure we will send another announcement when we release that
data :)
Best,
On Tue, Feb 11, 2020 at 10:30 PM Giovanni Luca Ciampaglia <glc3(a)mail.usf.edu>
wrote:
Hi Joseph,
Thanks a lot for creating and sharing such a valuable resource. I went
through the schema and from what I understand there is no information about
page-to-page links, correct? Are there any resources that would provide
such historical data?
Best,
On Mon, Feb 10, 2020 at 11:28 AM Joseph Allemandou <jallemandou(a)wikimedia.org>
wrote:
Hi Analytics People,

The Wikimedia Analytics Team is pleased to announce the release of the most
complete dataset we have to date to analyze content and contributor
metadata: Mediawiki History [1] [2].

Data is in TSV format, usually released monthly around the 3rd of the
month, and every new release contains the full history of metadata.

The dataset contains an enhanced [3] and historified [4] version of user,
page and revision metadata and serves as a base for the Wikistats API on
edits, users and pages [5] [6].

We hope you will have as much fun playing with the data as we have had
building it, and we're eager to hear from you [7], whether for issues,
ideas or usage of the data.
Analytically yours,
--
Joseph Allemandou (joal) (he / him)
Sr Data Engineer
Wikimedia Foundation
[1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
[2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_his…
[3] Many pre-computed fields are present in the dataset, from edit-counts
by user and page to reverts and reverted information, as well as time
between events.
[4] As accurate as possible historical usernames and page-titles (as well
as user-groups and blocks) are available in addition to current values,
https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20Hi…
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l