Hi Giovanni,
The pagelinks table is great for point-in-time snapshots: it tells you which links exist between pages at the time of the query. Parsing the wikitext of past revisions is needed to provide a historical view of the links :)
Cheers
Joseph
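
To illustrate the distinction above, here is a minimal Python sketch. It assumes the wikitext of each revision is already at hand as (timestamp, wikitext) pairs. The names extract_links and link_history and the sample revisions are hypothetical. The regular expression only sees links written literally in the wikitext, does not filter namespaces such as File: or Category:, and misses links produced by templates, which is part of why a real parser is needed.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section]];
# only the target page title is captured.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_links(wikitext):
    """Return the set of link targets written literally in the wikitext."""
    targets = set()
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip().replace("_", " ")
        if title:
            # Assumption: first letter is case-insensitive, as on most wikis.
            targets.add(title[0].upper() + title[1:])
    return targets


def link_history(revisions):
    """Yield (timestamp, added, removed) per revision, in chronological order."""
    previous = set()
    for timestamp, wikitext in sorted(revisions):
        current = extract_links(wikitext)
        yield timestamp, current - previous, previous - current
        previous = current


# Hypothetical revisions of a single page: (ISO timestamp, wikitext).
revisions = [
    ("2019-01-01T00:00:00Z", "See [[Rome]] and [[Italy|the country]]."),
    ("2019-06-01T00:00:00Z", "See [[Rome#History]] and {{Infobox city}}."),
]
history = list(link_history(revisions))
```

A pagelinks query would only give the equivalent of the last extract_links result, while link_history shows when each link appeared or disappeared. Any link that {{Infobox city}} expands to is invisible to this sketch.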

On Tue, Feb 18, 2020 at 12:22 AM Giovanni Luca Ciampaglia <glc3@mail.usf.edu> wrote:
Thank you Joseph; great to hear there is interest in building such a
dataset. You say that the link information would need to be parsed from
wikitext, which is complicated; would the pagelinks table help as an
alternative source of data?

*Giovanni Luca Ciampaglia* ∙ glciampaglia.com
Assistant Professor
Computer Science and Engineering <https://www.usf.edu/engineering/cse/>
University of South Florida <https://www.usf.edu/>

*Due to Florida’s broad open records law, email to or from university
employees is public record, available to the public and the media upon
request.*


On Thu, Feb 13, 2020 at 9:27 AM Joseph Allemandou <jallemandou@wikimedia.org>
wrote:

> Hi Giovanni,
> Thank you for your message :)
> You are correct that there is no information on page-to-page links as of
> today, nor, for instance, on historical values of whether revisions were
> redirects.
> We agree that such information is extremely valuable, and we plan to
> extract it at some point.
> The reason it has not been done yet is that those pieces of information
> are only available through parsing the wikitext of every revision, which
> is not only resource intensive but also technically complicated
> (templates, version changes, etc.).
> You can be sure we will send another announcement when we release that
> data :)
> Best,
>
> On Tue, Feb 11, 2020 at 10:30 PM Giovanni Luca Ciampaglia
> <glc3@mail.usf.edu> wrote:
>
> > Hi Joseph,
> >
> > Thanks a lot for creating and sharing such a valuable resource. I went
> > through the schema and from what I understand there is no information
> > about page-to-page links, correct? Are there any resources that would
> > provide such historical data?
> >
> > Best,
> >
> > *Giovanni Luca Ciampaglia* ∙ glciampaglia.com
> > Assistant Professor
> > Computer Science and Engineering <https://www.usf.edu/engineering/cse/>
> > University of South Florida <https://www.usf.edu/>
> >
> >
> > On Mon, Feb 10, 2020 at 11:28 AM Joseph Allemandou
> > <jallemandou@wikimedia.org> wrote:
> >
> > > Hi Analytics People,
> > >
> > > The Wikimedia Analytics Team is pleased to announce the release of the
> > > most complete dataset we have to date for analyzing content and
> > > contributor metadata: Mediawiki History [1] [2].
> > >
> > > Data is in TSV format, usually released monthly around the 3rd of the
> > > month, and every new release contains the full history of metadata.
> > >
> > > The dataset contains an enhanced [3] and historified [4] version of
> > > user, page and revision metadata and serves as a base for the Wikistats
> > > API on edits, users and pages [5] [6].
> > >
> > > We hope you will have as much fun playing with the data as we have
> > > building it, and we're eager to hear from you [7], whether for issues,
> > > ideas or usage of the data.
> > >
> > > Analytically yours,
> > >
> > > --
> > > Joseph Allemandou (joal) (he / him)
> > > Sr Data Engineer
> > > Wikimedia Foundation
> > >
> > > [1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
> > > [2] https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps
> > > [3] Many pre-computed fields are present in the dataset, from edit
> > > counts by user and page to revert and reverted information, as well as
> > > time between events.
> > > [4] Historical usernames and page titles (as well as user groups and
> > > blocks), as accurate as possible, are available in addition to current
> > > values, and are provided in a denormalized way on every event of the
> > > dataset.
> > > [5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
> > > [6] https://wikimedia.org/api/rest_v1/
> > > [7] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20History%20Dumps&projectPHIDs=Analytics-Wikistats,Analytics
> > > _______________________________________________
> > > Wiki-research-l mailing list
> > > Wiki-research-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > >
> >
>
>
> --
> Joseph Allemandou (joal) (he / him)
> Sr Data Engineer
> Wikimedia Foundation
>


--
Joseph Allemandou (joal) (he / him)
Sr Data Engineer
Wikimedia Foundation