hi Daniel, all,
Jakob Voss informed me that there had been a posting regarding memento
on this list. Since we are very keen on native Memento support for the
mediawiki platform, I felt like responding by giving some perspective
on Memento in general, and on some questions that were raised
regarding its implementation for mediawiki, specifically:
1. As with previous projects we have engaged in (OpenURL, OAI-PMH, OAI-
ORE, SRU/W), Memento is not merely an academic exercise about which we
want to publish papers. Given our status as researchers, we do need
to publish the occasional paper but the real goal of the project is to
make datetime content negotiation for the Web really happen. Doing so
will require a lot more work, at various levels, including:
a. formal specification (we are thinking an Internet Draft => RFC path),
b. promotion,
c. real life implementations,
d. further research.
Currently, we are focusing on (b) and (c) to try and overcome a
chicken and egg situation: we think we propose a nice framework to
integrate archival content seamlessly in regular web navigation, but
in order to demonstrate it we need some adoption. But in order to get
some adoption we need to be able to demonstrate the framework, which
we can't do without adoption. Etc. ;-)
It is in this context that our contact with Jakob Voss, and Daniel's
mail on this list, is really exciting. The ability to demonstrate
Memento at work for the mediawiki platform, including the Wikipedia
deployments, would be absolutely fantastic for the Memento cause. That
is why we have immediately engaged in further development of our
initial prototype Memento plug-in for mediawiki, to take into account
remarks made by Jakob, and to make it more robust. The ongoing work is
at
http://www.mediawiki.org/wiki/Extension:Memento .
2. Let me describe the actual status and challenges faced in the
Memento plug-in work:
2.1. The plug-in detects a client's X-Accept-Datetime header, and
returns the mediawiki page that was active at the datetime specified
in the header. Same for images, actually. This effectively allows
navigating (as in clicking links) a mediawiki collection as it existed
in the past: as long as a client issues an X-Accept-Datetime header,
matching history pages/images will be retrieved.
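To make the selection rule in 2.1 concrete, here is a minimal sketch of "the page that was active at the datetime specified": pick the latest revision whose timestamp is not after the requested datetime. This is an illustration only, not the plug-in's actual code; the function name, the (timestamp, rev_id) tuples and the sample history are all made up for the example.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def select_revision(revisions, requested):
    """Return the (timestamp, rev_id) pair that was current at `requested`.

    `revisions` must be sorted by timestamp, oldest first. Returns None
    when the page did not exist yet at the requested datetime.
    Hypothetical helper, not part of the Memento extension.
    """
    timestamps = [ts for ts, _ in revisions]
    # Index of the first revision strictly newer than `requested`.
    idx = bisect_right(timestamps, requested)
    return revisions[idx - 1] if idx else None


# Invented revision history for one page.
history = [
    (datetime(2009, 1, 10, 12, 0, 0, tzinfo=timezone.utc), 101),
    (datetime(2009, 6, 2, 8, 30, 0, tzinfo=timezone.utc), 102),
    (datetime(2009, 11, 20, 17, 45, 0, tzinfo=timezone.utc), 103),
]

active = select_revision(history, datetime(2009, 7, 1, tzinfo=timezone.utc))
```

With this history, a request for 1 July 2009 resolves to revision 102, and a request for any datetime before 10 January 2009 resolves to nothing.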
2.2. We are looking into addressing this issue raised by Jakob (and
Daniel): Display history pages with the template that was active at
the time the history page acted as the current one. We definitely
think this would be cool, but we don't think it can be achieved by our
plug-in because templates are included at the server side, i.e. they
are not URI-addressed XSL that are rendered at the client-side. Hence,
one can't do datetime content negotiation on them - they are outside
of the memento realm and rather in the realm of the CMS. So, we are
looking at the mediawiki code to see whether a history page, when
rendered, could itself retrieve the appropriate (old) template from
the database. If we are successful, we will share that code also at
http://www.mediawiki.org/wiki/Extension:Memento
once available. It will obviously be up to the mediawiki community
whether they are willing to adopt the proposed change to the codebase.
2.3. We have looked into another issue raised by Jakob: Display
deleted pages as they existed at the datetime expressed in
X-Accept-Datetime. We have actually implemented this. There are 2 caveats:
- as is the case with mediawiki in general, deleted pages are only
accessible by those with appropriate permissions;
- as is the case with mediawiki in general, deleted pages show up in
Edit mode.
This code will soon be included at
http://www.mediawiki.org/wiki/Extension:Memento .
2.4. We do not feel that all pages should necessarily be subject to
datetime content negotiation, in the same way that not all URIs are
subject to content negotiation in other dimensions. We feel that the
Special Pages fall under this category, as they do not have History.
2.5. We have ideas regarding how to address the issue raised by
Daniel: the timestamp isn't a unique identifier, multiple revisions
*might* have the same timestamp. From the perspective of Memento, a
datetime is obviously the only "globally" recognizable value that can
be used for negotiation. If cases occur where multiple versions of a
page exist for the same second, the thing to do according to RFC 2295
would be to return a "300 Multiple Choices", listing the URIs (and
metadata) of those versions in an Alternates header. The client then
has to take it from there.
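As a rough sketch of the idea in 2.5 (not something the plug-in does today): when exactly one revision matches the requested second, serve it; when several match, answer 300 and list them in an Alternates header. The variant-list syntax below follows the general shape used in RFC 2295, but it is simplified, and the function name, the example.org URIs and the fixed quality value of 1.0 are assumptions made for illustration.

```python
def negotiate(candidates):
    """Decide the response for the revisions matching the requested second.

    `candidates` is a non-empty list of (uri, media_type) pairs.
    Returns (status_code, headers). Hypothetical helper for illustration.
    """
    if not candidates:
        raise ValueError("at least one candidate revision is required")
    if len(candidates) == 1:
        uri, _ = candidates[0]
        return 200, {"Content-Location": uri}
    # Several revisions share the timestamp: let the client choose.
    alternates = ", ".join(
        '{"%s" 1.0 {type %s}}' % (uri, media_type)
        for uri, media_type in candidates
    )
    return 300, {"Alternates": alternates}


# Two invented revisions saved within the same second.
same_second = [
    ("http://wiki.example.org/index.php?title=Foo&oldid=201", "text/html"),
    ("http://wiki.example.org/index.php?title=Foo&oldid=202", "text/html"),
]
status, headers = negotiate(same_second)
```

Here `status` is 300 and the Alternates header lists both oldid URIs; with a single candidate the same function returns 200.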
2.6. The caching issue is a general problem arising from introducing
Memento in a web that does not (yet) do Memento: when in datetime
content negotiation mode all caches between client and server (both
included) need to be bypassed. As described in our paper, we currently
address this problem by adding the following client headers:
Cache-Control: no-cache => to force cache revalidation, and
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce
validation failure
We very much understand this is not elegant but it tends to work ;-) .
This is an area for further research. As the paper states: "Ideally,
a solution should leverage existing caching practice but extend it in
such a way that caches are only bypassed in DT-conneg when essential,
but still used whenever possible (e.g., to deliver Mementos)."
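For illustration, the client-side workaround described in 2.6 could be assembled as below. This is a sketch under assumptions: the function name is invented, and formatting the X-Accept-Datetime value as an HTTP-date is assumed here rather than taken from a specification. The two cache-related headers are the ones listed above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# The Unix epoch, used so that If-Modified-Since never validates a cached copy.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def dt_conneg_headers(requested):
    """Build request headers for datetime content negotiation.

    `requested` must be a timezone-aware UTC datetime.
    Hypothetical helper; the header value format is an assumption.
    """
    return {
        "X-Accept-Datetime": format_datetime(requested, usegmt=True),
        "Cache-Control": "no-cache",
        "If-Modified-Since": format_datetime(EPOCH, usegmt=True),
    }


request_headers = dt_conneg_headers(
    datetime(2009, 11, 1, 12, 0, 0, tzinfo=timezone.utc)
)
```

The resulting dictionary can be passed to any HTTP client library as the request headers; nothing in the sketch touches the network.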
I hope this helps. Please let us know what we can do to increase the
chances of adoption of the Memento solution for the mediawiki
platform. I hope it is clear that we _really_ would like to see this
happen!
Cheers
Herbert Van de Sompel
==
Hi all
The Memento Project <http://www.mementoweb.org/> (including the Los Alamos
National Laboratory (!) featuring Herbert Van de Sompel of OpenURL fame) is
proposing a new HTTP header, X-Accept-Datetime, to fetch old versions of a web
resource. They already wrote a MediaWiki extension for this
<http://www.mediawiki.org/wiki/Extension:Memento> - which would of course be
particularly interesting for use on Wikipedia.
Do you think we could have this for Wikimedia projects? I think that would be
very nice indeed. I recall that ways to look at last week's main page have been
discussed before, and I see several issues:
* the timestamp isn't a unique identifier, multiple revisions *might* have the
same timestamp. We need a tiebreak (rev_id would be the obvious choice).
* templates and images also need to be "time warped". It seems like the
extension does not address this at the moment. For flagged revisions we do have
such a mechanism, right? Could that be used here?
* Squids would need to know about the new header, and bypass the cache when
it's used.
so, what do you think? what does it take? Can we point them to the missing bits?
-- daniel
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267