Dear Rob,
Thanks for your response.
I would like to start by pointing out that the 2013 Phabricator ticket
pertained to a significantly different version of the Memento MediaWiki
extension. It’s 2016 by now, and the current version of the extension was
developed from scratch with a grant from the Andrew W. Mellon Foundation and
with input from MediaWiki developers, both on and off the wikitech-l list.
As a matter of fact, there are currently two Memento
extensions: one that only adds HTTP headers used by the Memento protocol
and relies on an external TimeGate, another one that implements all aspects
of the Memento protocol. In our work with the W3C, the latter was used,
bringing all aspects of Memento to the W3C wiki.
Regarding the issues you raised:
(1) Memento supports caching. The protocol uses two registered HTTP
response headers: Link and Memento-Datetime. Just like all HTTP headers,
they are cacheable, both for topic pages (which use Link headers) and oldid
pages (which use both Link and Memento-Datetime headers). TimeGate
responses use the 302 HTTP status code, have no body, and are not
cacheable by default [1]. The Memento protocol works with caches, not
against them. As a matter of fact, prior to being published as RFC7089, the
specification was thoroughly assessed on behalf of the IETF by Mark
Nottingham, an expert on web caching.
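To make the header discussion concrete, here is a minimal Python sketch of the two response headers. It is an illustration only, not the extension's actual code: the function name, the example.org URIs, and the choice of which link relations appear on which page type are assumptions.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(timegate_uri, timemap_uri, original_uri=None, revision_time=None):
    """Build Memento-related response headers for a wiki page (illustrative only).

    For an oldid page, pass original_uri and revision_time (an aware datetime).
    For a topic page, leave both as None so only the Link header is produced.
    """
    links = [
        f'<{timegate_uri}>; rel="timegate"',
        f'<{timemap_uri}>; rel="timemap"; type="application/link-format"',
    ]
    if original_uri is not None:
        links.insert(0, f'<{original_uri}>; rel="original"')
    headers = {"Link": ", ".join(links)}
    if revision_time is not None:
        # Memento-Datetime uses the RFC 1123 date format, expressed in GMT.
        headers["Memento-Datetime"] = format_datetime(
            revision_time.astimezone(timezone.utc), usegmt=True
        )
    return headers


# Hypothetical URIs for an oldid page saved on 12 September 2016.
oldid_headers = memento_headers(
    "http://example.org/timegate/Page",
    "http://example.org/timemap/Page",
    original_uri="http://example.org/wiki/Page",
    revision_time=datetime(2016, 9, 12, 18, 58, tzinfo=timezone.utc),
)
topic_headers = memento_headers(
    "http://example.org/timegate/Page",
    "http://example.org/timemap/Page",
)
```

Both values are ordinary header strings, so a cache can store them along with the rest of the response.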
(2) Supporting the Memento protocol is scalable. This is demonstrated by
its adoption by 16 web archives around the world, including the
massive Internet Archive that has exposed Memento TimeGate and TimeMap
end-points for many years. We have never heard any concerns regarding
scalability from any of these web archives. Quite to the contrary, many web
archives use OpenWayback software to replay pages and an effort is ongoing
to build the APIs for the new OpenWayback version around the Memento
protocol [2]. In addition, we have specifically assessed the performance
impact of the MediaWiki extensions for Memento and found it to be
negligible [3].
(3) Memento supports various obvious use cases related to “web time travel”
that involve Wikipedia resources. It’s really hard to assess how big a
user group would be interested in these because we are facing a chicken/egg
situation. Clearly, the Wikipedia editors that were involved in the
Wikipedia RFC about Memento thought that supporting the protocol would be
valuable. Anyhow, I list some use cases and want to emphasize that they can
be supported both for machine clients and for browsers. Browsers currently
do not natively support Memento but extensions are available and it is also
possible to build Memento functionality in web pages using JavaScript.
3.1 Memento TimeMaps are an RFC-specified (read “standard”) approach to
expose a version history for Wikipedia topic pages. The recent W3C Data on
the Web Best Practices [4] recommends the use of Memento TimeMaps for
exposing resource version history. The same document also recommends using
Memento TimeGates for access to temporal resource versions.
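As a rough illustration of how a machine client could consume such a version history, the Python sketch below reads a TimeMap in the link format and lists its mementos from oldest to newest. The sample TimeMap and its example.org URIs are made up, and the regular expressions are a simplification that assumes well-formed input rather than a complete link-format parser.

```python
import re
from email.utils import parsedate_to_datetime

# One link-value: <URI> followed by ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_timemap(text):
    """Return (datetime, uri) pairs for every memento in a TimeMap, oldest first."""
    mementos = []
    for uri, raw_params in _LINK_RE.findall(text):
        params = {
            name.lower(): quoted or bare
            for name, quoted, bare in _PARAM_RE.findall(raw_params)
        }
        # rel may hold several values, e.g. "first memento".
        rels = params.get("rel", "").split()
        if "memento" in rels and "datetime" in params:
            mementos.append((parsedate_to_datetime(params["datetime"]), uri))
    return sorted(mementos)


# Made-up TimeMap for a page with two revisions.
SAMPLE_TIMEMAP = """<http://example.org/wiki/Page>; rel="original",
<http://example.org/timegate/Page>; rel="timegate",
<http://example.org/w/index.php?oldid=2>; rel="memento"; datetime="Wed, 21 Jun 2000 04:41:56 GMT",
<http://example.org/w/index.php?oldid=1>; rel="first memento"; datetime="Tue, 20 Jun 2000 18:02:59 GMT"
"""

history = parse_timemap(SAMPLE_TIMEMAP)
```

Here `history` holds the two revision URIs in chronological order, regardless of the order in which the TimeMap lists them.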
3.2 Memento can be used for intra-site time travel, as desired for
Wikipedia by Vernor Vinge [5]. This is important for a variety of reasons.
Historical researchers can easily determine what the state of
(inter-linked) Wikipedia topic pages was at a specific date. Memento can
also be used to avoid spoilers for current TV shows and sports, something
that has been desired by users and discussed at Wikipedia [6, 7].
3.3 Memento can be used for inter-site time travel that involves Wikipedia
pages. A user can set a sticky date in the past and visit pages around that
date across web archives and version control systems that support Memento.
For example, we recently conducted a study of almost 400,000 academic
papers containing URI references and found that more than 3,300 research
papers published between 2003 and 2012 reference Wikipedia articles, a number
that grows each year. The content of these Wikipedia articles has most
likely changed since the time the research paper was published. Using
Memento, a user can revisit the state of the Wikipedia page as it was at
the time the referencing paper was published by using the paper’s
publication date as the sticky date. As per the above, the user can then
also keep navigating subject to that date, visiting both version pages in
Wikipedia (for internal links) and archived pages in web archives (for
external links). This way time travel is seamless, allowing a user to stay
fixed to a datetime regardless of which web site they visit. Note that
Wikipedia itself uses Memento in this manner to link to archived content in
web archives in case of broken external links, using a Memento library that
we developed for them [8].
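To sketch what "using the publication date as the sticky date" means on the wire: the client sends the date to a TimeGate in an Accept-Datetime request header. The Python snippet below only prepares such a request and does not send it. The TimeGate URI and the publication date are placeholders, and the helper name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def timegate_request(timegate_uri, sticky_date):
    """Prepare (but do not send) a datetime-negotiation request to a TimeGate."""
    # Accept-Datetime uses the RFC 1123 date format, expressed in GMT.
    when = format_datetime(sticky_date.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_uri, headers={"Accept-Datetime": when})


# Placeholder TimeGate URI; the date stands in for a paper's publication date.
req = timegate_request(
    "http://example.org/timegate/Page",
    datetime(2010, 5, 1, tzinfo=timezone.utc),
)
```

The same header value can be reused for every link the user follows, which is what keeps the navigation pinned to one date across sites.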
3.4 There is a growing interest in the use of historic web content, with
efforts such as Archives Unleashed [9] promoting studies in a variety of
fields, including sociology and history. With Memento, a researcher can
gather content from a variety of sources as they existed around the same
datetime. This data collection process, conducted in a machine-driven or
browser-based manner, may include collecting old versions of Wikipedia
resources. Without Memento support at Wikipedia, these pages will routinely
be harvested from web archives that unfortunately have a very sparse
collection of Wikipedia snapshots. Hence, the collected pages will be
temporally imprecise. If Wikipedia supported Memento, these researchers
would be able to collect the exact version page that was active at the
desired datetime.
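The selection step behind "the exact version page that was active at the desired datetime" can be sketched in a few lines of Python. This is an assumption about how a TimeGate for a version-controlled resource might pick a revision, not a description of the extension's code. The revision ids and timestamps are invented, and returning None for dates before the first revision is a simplification.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def active_revision(revisions, when):
    """Return the id of the revision that was current at `when`.

    `revisions` is a list of (timestamp, revision_id) pairs sorted by timestamp.
    Returns None if `when` is earlier than the first revision.
    """
    timestamps = [ts for ts, _ in revisions]
    # Index of the first revision saved strictly after `when`.
    index = bisect_right(timestamps, when)
    if index == 0:
        return None
    return revisions[index - 1][1]


def utc(year, month, day):
    """Shorthand for an aware UTC datetime at midnight."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Invented revision history for one page.
REVISIONS = [(utc(2016, 1, 1), 100), (utc(2016, 6, 1), 200), (utc(2016, 9, 1), 300)]
```

A request for 15 July 2016 would resolve to revision 200, because that revision was saved on 1 June and not replaced until 1 September. A web archive with sparse snapshots cannot offer that precision.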
As indicated, Memento has been widely adopted by web archives around the
world and provides a standardized approach to access historic web content.
We developed the Memento extensions for MediaWiki (as well as generic
TimeGate software [10]) as a means to bring the same time travel power to
version control systems. We have successfully reached out to the W3C and,
as a result, both their wiki and their specifications now provide an
illustration of Memento’s cross-web time travel power. We have
unsuccessfully reached out to Wikipedia in the past but are trying our luck
again now. We believe that adoption by Wikipedia would be a game changer
when it comes to native support for Memento in browsers and hope that the
Wikipedia community will be willing to help make that happen.
We are very interested in hearing what the next steps would be and how we
could be of help.
Greetings
Shawn Jones
[1] https://tools.ietf.org/html/rfc7231#page-48
[2] https://github.com/iipc/openwayback/issues/305
[3] http://arxiv.org/abs/1406.3876
[4] https://www.w3.org/TR/dwbp/
[5] https://phabricator.wikimedia.org/T7877
[6] https://en.wikipedia.org/wiki/Wikipedia_talk:Spoiler
[7] https://en.wikipedia.org/wiki/Wikipedia:Spoiler
[8] https://github.com/mementoweb/py-memento-client
[9] http://archivesunleashed.com/
[10] https://github.com/mementoweb/timegate
On Mon, Sep 12, 2016 at 12:58 PM, Rob Lanphier <robla(a)wikimedia.org> wrote:
On Mon, Sep 12, 2016 at 10:57 AM, Shawn Jones <jones.shawn.m(a)gmail.com> wrote:
Considering the consensus from the RFC was to start a pilot of Memento on
English Wikipedia, how do we start that process again?
Hi Shawn,
Thanks for your previous email, with all of the links.
Several of us investigated this in 2013 and responded when we declined it
in Bugzilla:
<https://phabricator.wikimedia.org/T36778#384480>
As I recall, older versions of this never fully addressed the caching
and infrastructure implications of using HTTP headers. Am I
remembering that correctly? Is there something different about
current versions?
Rob
--
Shawn M. Jones
Graduate Research Assistant
Los Alamos National Laboratory
Email: jones.shawn.m(a)gmail.com
Twitter: @shawnmjones
Research Groups:
http://ws-dl.blogspot.com
http://www.lanl.gov/library/about/research-prototyping.php