Three questions:
1) Assume a page P that uses a template T.
P was modified at times T2 and T4. T was modified at T1 and T3.
Will P be available as of T2 and T4 only, or also as of T3 (at which point its rendering will differ from that at T2 or T4)?
2) What about changes to Wikidata, Commons, or UI message strings?
3) Possibly interesting to look into TimeMachine, Memento, and related work:
https://www.mediawiki.org/wiki/Extension:TimeMachine
https://www.mediawiki.org/wiki/Extension:Memento
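To make question 1 concrete, here is a minimal sketch of the timeline it describes. It is a toy model, not how the dataset or MediaWiki resolves template revisions: the integers 1 to 4 stand in for T1 to T4, and the function names `latest_at` and `rendered_states` are hypothetical. It assumes the rendered HTML of P can change whenever either P or T is edited.

```python
from bisect import bisect_right


def latest_at(revision_times, t):
    """Return the most recent revision time <= t, or None if there is none."""
    i = bisect_right(revision_times, t)
    return revision_times[i - 1] if i else None


def rendered_states(page_revs, template_revs):
    """List (time, page_rev, template_rev) for each instant the rendering can change."""
    page_revs = sorted(page_revs)
    template_revs = sorted(template_revs)
    states = []
    for t in sorted(set(page_revs) | set(template_revs)):
        p = latest_at(page_revs, t)
        if p is None:
            continue  # the page does not exist yet at time t
        states.append((t, p, latest_at(template_revs, t)))
    return states


# Toy timeline from question 1: P edited at T2 and T4, T edited at T1 and T3.
states = rendered_states(page_revs=[2, 4], template_revs=[1, 3])
for time, page_rev, template_rev in states:
    print(f"as of T{time}: P@T{page_rev} rendered with T@T{template_rev}")
```

Under these assumptions there are three distinct renderings (as of T2, T3, and T4), so a history that only records P's own revisions would miss the one at T3.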
On Fri, Sep 11, 2020 at 2:59 PM Tiziano Piccardi tiziano.piccardi@epfl.ch wrote:
Thanks Federico and WSC for the interest!
I want to clarify that we used only public data released in the XML dump. As WSC said, deleted content is not always permanently removed from the database, but it is available only to users with privileged access. Our goal is not only to release the dataset, but also to give anyone the possibility to (1) reproduce the results, and (2) generate the HTML history in other languages without any special access requirements.
Tiziano
On Fri, Sep 11, 2020 at 9:47 PM WereSpielChequers <werespielchequers@gmail.com> wrote:
I wouldn't use the phrase "Wikipedia’s deliberate policy of permanently deleting the entire history of deleted pages". Quite a few "deleted" pages do actually get restored, and depending on the deletion process it can be quite easy to get much deleted content back, especially if someone volunteers to reference an unreferenced page, or a budding footballer actually gets to play at professional or international level, or indeed a political candidate is elected. Almost all "deleted" content still exists and could be restored by a volunteer admin in the right circumstances. However, Wikipedia's deletion processes are more than a little complex. Many articles have incomplete histories because admins have revision-deleted particular revisions that include copyright violations and/or some really libellous stuff. Some of the really nasty stuff gets "oversighted": those revisions are not even visible to administrators.
There is also the issue that some of the earliest material is not available. Stats on admin actions only go back to December 2004, and while there is some content from before then, I am not sure if all the stuff deleted before then is available.
Regards
WSC
On Fri, 11 Sep 2020 at 10:22, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Robert West, 11/09/20 11:29:
local instances of MediaWiki, enhanced with the capacity of correct historical macro expansion.
Interesting. I see this doesn't include deleted templates. Have you considered using historical dumps?
«We emphasize that the limitation of deleted pages, templates, and modules is not introduced by our parsing process. Rather, it is inherited from Wikipedia’s deliberate policy of permanently deleting the entire history of deleted pages.»
A relevant task is https://phabricator.wikimedia.org/T2851
See also the various discussions about Memento, like https://phabricator.wikimedia.org/T164654
Federico
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l