hi Daniel, all,
Jakob Voss informed me that there had been a posting regarding memento on this list. Since we are very keen on native Memento support for the mediawiki platform, I felt like responding by giving some perspective on Memento in general, and on some questions that were raised regarding its implementation for mediawiki, specifically:
1. As with previous projects we have engaged in (OpenURL, OAI-PMH, OAI-ORE, SRU/W), Memento is not merely an academic exercise about which we want to publish papers. Given our status as researchers, we do need to publish the occasional paper, but the real goal of the project is to make datetime content negotiation for the Web really happen. Doing so will require a lot more work, at various levels, including:

a. formal specification (we are thinking an Internet Draft => RFC path),
b. promotion,
c. real life implementations,
d. further research.
Currently, we are focusing on (b) and (c) to try and overcome a chicken-and-egg situation: we think we propose a nice framework to integrate archival content seamlessly into regular web navigation, but in order to demonstrate it we need some adoption, and in order to get adoption we need to be able to demonstrate the framework. Etc. ;-)
It is in this context that our contact with Jakob Voss, and Daniel's mail on this list, is really exciting. The ability to demonstrate Memento at work for the mediawiki platform, including the Wikipedia deployments, would be absolutely fantastic for the Memento cause. That is why we have immediately engaged in further development of our initial prototype Memento plug-in for mediawiki, to take into account remarks made by Jakob, and to make it more robust. The ongoing work is at http://www.mediawiki.org/wiki/Extension:Memento .
2. Let me describe the actual status and challenges faced in the Memento plug-in work:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually. This effectively allows navigating (as in clicking links) a mediawiki collection as it existed in the past: as long as a client issues an X-Accept-Datetime header, matching history pages/images will be retrieved.
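For illustration, here is a minimal client-side sketch of such a request (Python, using the requests library; the URL and datetime are made up, and this is not the plug-in itself, just the negotiation pattern described above):

  import requests

  # Ask for the version of the page that was active at the given datetime.
  headers = {"X-Accept-Datetime": "Thu, 01 Jan 2009 12:00:00 GMT"}
  resp = requests.get("http://example.org/wiki/Clock", headers=headers)

  # A Memento-aware server 302-redirects to the matching history page;
  # requests follows the redirect, so resp.url holds the oldid URI.
  print(resp.status_code, resp.url)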
2.2. We are looking into addressing this issue raised by Jakob (and Daniel): Display history pages with the template that was active at the time the history page acted as the current one. We definitely think this would be cool, but we don't think it can be achieved by our plug-in because templates are included at the server side, i.e. they are not URI-addressed XSL that is rendered at the client side. Hence, one can't do datetime content negotiation on them - they are outside of the memento realm and rather in the realm of the CMS. So, we are looking at the mediawiki code to see whether a history page, when rendered, could itself retrieve the appropriate (old) template from the database. If we are successful, we will share that code also at http://www.mediawiki.org/wiki/Extension:Memento once available. It will obviously be up to the mediawiki community whether they are willing to adopt the proposed change to the codebase.
2.3. We have looked into another issue raised by Jakob: Display deleted pages as they existed at the datetime expressed in X-Accept-Datetime. We have actually implemented this. There are 2 caveats:
- as is the case with mediawiki in general, deleted pages are only accessible by those with appropriate permissions;
- as is the case with mediawiki in general, deleted pages show up in Edit mode.
This code will soon be included at http://www.mediawiki.org/wiki/Extension:Memento .
2.4. We do not feel that all pages should necessarily be subject to datetime content negotiation, in the same way that not all URIs are subject to content negotiation in other dimensions. We feel that the Special Pages fall under this category, as they do not have History.
2.5. We have ideas regarding how to address the issue raised by Daniel: the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. From the perspective of Memento, a datetime is obviously the only "globally" recognizable value that can be used for negotiation. If cases occur where multiple versions of a page exist for the same second, the thing to do according to RFC 2295 would be to return a "300 Multiple Choices", listing the URIs (and metadata) of those versions in an Alternates header. The client then has to take it from there.
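As a rough client-side sketch of handling such a response (Python; the Alternates parsing is an assumption loosely based on the RFC 2295 syntax, not something the plug-in does today):

  import re
  import requests

  resp = requests.get(
      "http://example.org/wiki/Clock",
      headers={"X-Accept-Datetime": "Thu, 01 Jan 2009 12:00:00 GMT"},
      allow_redirects=False,
  )
  if resp.status_code == 300:
      # RFC 2295 lists variants as {"<uri>" quality {attr ...}} entries;
      # pull out the quoted URIs and let the client pick one.
      uris = re.findall(r'\{"([^"]+)"', resp.headers.get("Alternates", ""))
      print("candidate revisions:", uris)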
2.6. The caching issue is a general problem arising from introducing Memento in a web that does not (yet) do Memento: when in datetime content negotiation mode all caches between client and server (both included) need to be bypassed. As described in our paper, we currently address this problem by adding the following client headers:
Cache-Control: no-cache => to force cache revalidation, and
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce validation failure
We very much understand this is not elegant but it tends to work ;-) . This is an area for further research. As the paper states: "Ideally, a solution should leverage existing caching practice but extend it in such a way that caches are only bypassed in DT-conneg when essential, but still used whenever possible (e.g., to deliver Mementos)."
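In client terms, the workaround amounts to something like this (a sketch; same hypothetical URL as above):

  import requests

  headers = {
      "X-Accept-Datetime": "Thu, 01 Jan 2009 12:00:00 GMT",
      # Force revalidation all the way back to the origin ...
      "Cache-Control": "no-cache",
      # ... and make any If-Modified-Since check fail, so no cache
      # answers from storage.
      "If-Modified-Since": "Thu, 01 Jan 1970 00:00:00 GMT",
  }
  resp = requests.get("http://example.org/wiki/Clock", headers=headers)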
I hope this helps. Please let us know what we can do to increase the chances of adoption of the Memento solution for the mediawiki platform. I hope it is clear that we _really_ would like to see this happen!
Cheers
Herbert Van de Sompel
==
Hi all
The Memento Project http://www.mementoweb.org/ (including the Los Alamos National Laboratory (!) featuring Herbert Van de Sompel of OpenURL fame) is proposing a new HTTP header, X-Accept-Datetime, to fetch old versions of a web resource. They already wrote a MediaWiki extension for this http://www.mediawiki.org/wiki/Extension:Memento - which would of course be particularly interesting for use on Wikipedia.
Do you think we could have this for the Wikimedia projects? I think that would be very nice indeed. I recall that ways to look at last week's main page have been discussed before, and I see several issues:

* the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. We need a tiebreak (rev_id would be the obvious choice).
* templates and images also need to be "time warped". It seems like the extension does not address this at the moment. For flagged revisions we do have such a mechanism, right? Could that be used here?
* Squids would need to know about the new header, and bypass the cache when it's used.
so, what do you think? what does it take? Can we point them to the missing bits?
-- daniel
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
Hi Michael and all,
The first thing which we implemented was exactly this idea of a proxy using the wikipedia API.
The proxy is here: http://mementoproxy.lanl.gov/wiki/timegate/(wikipedia URI)
For example:
http://mementoproxy.lanl.gov/wiki/timegate/http://en.wikipedia.org/wiki/Cloc...
We have also implemented proxies for the Internet Archive, Archive-It, WebCitation.org and several others, as proof-of-concept pieces for the research.
There are several reasons why a native implementation is better for all concerned:
1. The browser somehow needs to know where the proxy is, rather than being natively redirected to the correct page. For a few websites, and a few proxies, this is tolerable. However, even one proxy per CMS would be an impossible burden to maintain, let alone one proxy per website!
2. If the website redirected to the proxy, rather than the client knowing where to go, then this would rely on trusting the proxy to behave correctly. In a native implementation, you're never redirected off-site.
3. The proxy will redirect back to the appropriate history page; however, this page doesn't know that it's being treated as a Memento, and will not issue the X-Datetime-Validity or X-Archive-Interval headers. This makes it difficult (but not impossible) for the client to detect that it has been redirected correctly.
4. The offsite redirection adds at least 2 extra HTTP transactions per resource, slowing down the retrieval. In the native implementation the main page redirects to the history page directly. In the proxy approach, the browser goes to the main page, then either knows of or is redirected to the proxy; the proxy makes one or more API calls to fetch the history for the page to calculate the right revision, and then redirects the client back there.
5. We don't have to maintain the proxies :)
So for wikimedia installations the native approach is better as it's trusted and faster and involves fewer API calls. For the client it's better as it's faster and doesn't require intelligence or a list of proxies. For the proxy maintainer it's better as they're no longer needed.
I hope that helps clarify things,
Rob Sanderson (Also at Los Alamos with Herbert Van de Sompel)
Michael Dale wrote:
Instead of writing it as an extra header to the HTTP protocol ... why don't they write it as a proxy to wikimedia (or any other site they want to temporally proxy)? Getting a new HTTP header out there is not an easy task; at best a small percentage of sites will support it, and then you need to deploy clients and write user interfaces that support it as well.

If viewing old versions of sites is something interesting to them, it's probably best to write an interface, a firefox extension or greasemonkey script, that makes a "temporal" interface of their liking for the mediawiki api (presumably the "history button" fails to represent their vision?)... For non-mediawiki sites it could access "the way back machine".

If the purpose is to support searching or archival, then it's probably best to proxy the mediawiki api through a proxy that they set up that supports those temporal requests across all sites (i.e. an enhanced interface to the wayback machine?).
--michael
Hello Herbert.
Herbert Van de Sompel wrote:
2. Let me describe the actual status and challenges faced in the Memento plug-in work:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually.
2.2. Display history pages with the template that was active at the time the history page acted as the current one. [Snip] So, we are looking at the mediawiki code to see whether a history page, when rendered, could itself retrieve the appropriate (old) template from the database. If we are successful, we will share that code also at http://www.mediawiki.org/wiki/Extension:Memento once available. It will obviously be up to the mediawiki community whether they are willing to adopt the proposed change to the codebase.
Obviously it's a server issue.
2.3. We have looked into another issue raised by Jakob: Display deleted pages as they existed at the datetime expressed in X-Accept-Datetime. We have actually implemented this. There are 2 caveats:
- as is the case with mediawiki in general, deleted pages are only accessible by those with appropriate permissions;
- as is the case with mediawiki in general, deleted pages show up in Edit mode.
This code will soon be included at http://www.mediawiki.org/wiki/Extension:Memento
Showing deleted pages in edit mode is not always the case, since they can be rendered (albeit not with the old templates, which would be an interesting enhancement enabled by your work).
It is impressive how far you have gone. However, I don't think you can do a *complete* implementation.
First, you should be aware that timemachining the pages has been tried in the past. Discussions about FlaggedRevs are also relevant for your project. FlaggedRevs is an extension which allows marking the status of a page (eg. not vandalised) at a point in time. A naive implementation would store the timestamp and get the old version from the archive. They ended up storing the page content, with templates transcluded, in a table specific to the extension. However, FlaggedRevs is a tool to fight vandalism. Yours is an archival one. You could accept imperfect results under certain circumstances.
Problematic aspects:

Page moves/image moves:
* You want to see the content of Foo at some epoch, but the history now at Foo is wrong. Instead you need to look at the history of the page now at Foo_(disambiguation). You need to follow (perhaps even many times) the move logs to find out the real page.

Page merges:
* When two pages have been merged, you will want to show the revision which was originally at the page the user wants to timemachine. You can no longer just rely on the timestamps. You may be able to get that by splitting the sources at the merge time and going back via rev_parent_id. Needless to say, this is very inefficient; this piece wouldn't be put live at wikipedia.

Partial undeletions:
* When a page is undeleted, the summary shows how many revisions were undeleted, but not *which* ones.

Case:
* Page A has two edits (#1 and #2).
* A vandal adds obscene content to it (#3).
* An admin deletes the page and restores the first two revisions.
* Several months later, the page is completely deleted.

When an admin wants to view what the page looked like during those months, an application is unable to determine whether the two revisions which had been shown were #1 and #2 or perhaps #2 and #3.

revdelete may have similar issues.
2.4. We do not feel that all pages should necessarily be subject to datetime content negotiation, in the same way that not all URIs are subject to content negotiation in other dimensions. We feel that the Special Pages fall under this category, as they do not have History.
2.5. We have ideas regarding how to address the issue raised by Daniel: the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. From the perspective of Memento, a datetime is obviously the only "globally" recognizable value that can be used for negotiation. If cases occur where multiple versions of a page exist for the same second, the thing to do according to RFC 2295 would be to return a "300 Multiple Choices", listing the URIs (and metadata) of those versions in an Alternates header. The client then has to take it from there.
2.6. The caching issue is a general problem arising from introducing Memento in a web that does not (yet) do Memento: when in datetime content negotiation mode all caches between client and server (both included) need to be bypassed. As described in our paper, we currently address this problem by adding the following client headers:
Cache-Control: no-cache => to force cache revalidation, and
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce validation failure
We very much understand this is not elegant but it tends to work ;-) .
The caching issue is IMHO the bigger problem in your approach using the new header. Disabling the cache on the request kind of works (although not in the long term), but you also need to disable caching at the server, so that someone accessing the current page through your same proxy (ignorant of X-Accept-Datetime) doesn't get the cached page you were served earlier.
RFC 2145 states very clearly that "A proxy MUST forward an unknown header", but in your case it'd have been preferable that the header wasn't forwarded if the proxy isn't memento aware.
Which leads us to another issue, which is that it seems your server implementation doesn't "acknowledge" memento, so given a response to an X-Accept-Datetime, you don't know if what you're getting is the version you requested or the current one (because the server ignored it). It can be as simple as requiring a Last-Modified <= X-Accept-Datetime on Accept-Datetime responses (that would allow the server to explicitly tell since when it is valid), but extended to all response codes.
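For concreteness, that client-side check could look roughly like this (a Python sketch of the rule proposed above, not existing behaviour; URL hypothetical):

  from email.utils import parsedate_to_datetime
  import requests

  asked = "Thu, 01 Jan 2009 12:00:00 GMT"
  resp = requests.get("http://example.org/wiki/Clock",
                      headers={"X-Accept-Datetime": asked})

  # If the server honoured the negotiation, the representation it sent
  # must date from before (or at) the datetime we asked for.
  last_mod = resp.headers.get("Last-Modified")
  if last_mod and parsedate_to_datetime(last_mod) <= parsedate_to_datetime(asked):
      print("server acknowledged the datetime negotiation")
  else:
      print("server probably ignored X-Accept-Datetime")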
On Nov 12, 2009, at 3:19 PM, Platonides wrote:
2.3. We have looked into another issue raised by Jakob: Display deleted pages as they existed at the datetime expressed in X-Accept-Datetime. We have actually implemented this. There are 2 caveats:
- as is the case with mediawiki in general, deleted pages are only accessible by those with appropriate permissions;
- as is the case with mediawiki in general, deleted pages show up in Edit mode.
This code will soon be included at http://www.mediawiki.org/wiki/Extension:Memento
Showing deleted pages in edit mode is not always the case, since they can be rendered (albeit not with the old templates, which would be an interesting enhancement enabled by your work).
It is impressive how far you have gone. However, I don't think you can do a *complete* implementation.
First, you should be aware that timemachining the pages has been tried in the past. Discussions about FlaggedRevs are also relevant for your project. FlaggedRevs is an extension which allows marking the status of a page (eg. not vandalised) at a point in time. A naive implementation would store the timestamp and get the old version from the archive. They ended up storing the page content, with templates transcluded, in a table specific to the extension. However, FlaggedRevs is a tool to fight vandalism. Yours is an archival one. You could accept imperfect results under certain circumstances.
Indeed, it suffices to look at the Internet Archive and comparable web archives to see that one needs to live with what is reasonably achievable, not with what one would love to have. Imperfection is allowed when looking at this problem from an archival perspective.
Related to this, one must be careful not to cross the border between:
(a) what can purely be achieved using the primitives of the web architecture (URI, resource, representation), and HTTP, with datetime content negotiation added to the mix;
(b) what is in the realm of content, interpretation, etc.
Let me explain what I mean: Wikipedia used to have a page for "Alito". The page got discontinued and in its place came a page "Samuel Alito". Both have their separate URIs, and so for each individually datetime content negotiation will work nicely. That is what I mean by (a) above. However, connecting "Alito" and "Samuel Alito" moves us into the realm of (b). Things could be done in this specific type of case, as redirects are in place between the Alito and Samuel Alito URIs (unfortunately not the 301 or 302 one would expect but rather a 200), meaning such redirection info is in the database. Hence it could be acted upon. And so we could explore this, although I feel this gets us into the (b) zone. Again, generally speaking we must remain aware of the line between (a) and (b) above.
Cheers
herbert
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
On Nov 12, 2009, at 3:19 PM, Platonides wrote:
2.6. The caching issue is a general problem arising from introducing Memento in a web that does not (yet) do Memento: when in datetime content negotiation mode all caches between client and server (both included) need to be bypassed. As described in our paper, we currently address this problem by adding the following client headers:
Cache-Control: no-cache => to force cache revalidation, and
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce validation failure
We very much understand this is not elegant but it tends to work ;-) .
The caching issue is IMHO the bigger problem in your approach using the new header. Disabling the cache on the request kind of works (although not in the long term), but you also need to disable caching at the server, so that someone accessing the current page through your same proxy (ignorant of X-Accept-Datetime) doesn't get the cached page you were served earlier.
Agreed, of course, that our current cache fix is a temp solution.
Not sure what you mean by the above remark, but it is totally fine to cache the current page in mediawiki because the history pages are not served from the URI of the current page, neither by our plug-in nor in Memento in general (see http://www.mementoweb.org/guide/http/local/). Rather, an X-Accept-Datetime request is redirected (302 Found) to an appropriate history resource that has its own URI (with title and oldid in the case of mediawiki). And, hence, even those history pages can be cached by a mediawiki equipped with the memento plug-in.
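The separation is easiest to see by not following the redirect (a sketch; URL hypothetical):

  import requests

  resp = requests.get(
      "http://example.org/wiki/Clock",
      headers={"X-Accept-Datetime": "Thu, 01 Jan 2009 12:00:00 GMT"},
      allow_redirects=False,
  )
  # The generic URI itself keeps serving (and caching) only the current
  # page; the old content lives at the Location target, e.g.
  # /w/index.php?title=Clock&oldid=123456, which has its own cache entry.
  print(resp.status_code)                   # expect 302
  print(resp.headers.get("Location"))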
RFC 2145 states very clearly that "A proxy MUST forward an unknown header", but in your case it'd have been preferable that the header wasn't forwarded if the proxy isn't memento aware.
Which leads us to another issue, which is that it seems your server implementation doesn't "acknowledge" memento, so given a response to an X-Accept-Datetime, you don't know if what you're getting is the version you requested or the current one (because the server ignored it). It can be as simple as requiring a Last-Modified <= X-Accept-Datetime on Accept-Datetime responses (that would allow the server to explicitly tell since when it is valid), but extended to all response codes.
Actually, have a look at http://www.mementoweb.org/guide/http/local/ . You will note that the following response header is always included:
X-Archive-Interval: {datetime_start} - {datetime_end}
This allows a client to understand that it received a history resource. The values to use are the start datetime and end datetime for which the server has representations for the URI at hand.
Our plug-in implements this for mediawiki. Our proxy can't do this.
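For a Memento-aware client, reading that header might look roughly like this (a Python sketch; the parsing is an assumption based on the format shown above):

  from email.utils import parsedate_to_datetime

  def archive_interval(resp):
      # Split "X-Archive-Interval: {start} - {end}" into two datetimes;
      # absence of the header means the server is not Memento-aware.
      raw = resp.headers.get("X-Archive-Interval")
      if raw is None:
          return None
      start, _, end = raw.partition(" - ")
      return parsedate_to_datetime(start), parsedate_to_datetime(end)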
Cheers
herbert
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
We have made some updates to the Memento extension and we have also written a fix to perform datetime content negotiation on transcluded templates. Details can be found in the wiki page for the extension http://www.mediawiki.org/wiki/Extension:Memento .

Harihar (Los Alamos National Labs)
On Thu, Nov 12, 2009 at 3:55 PM, Herbert Van de Sompel hvdsomp@gmail.com wrote:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually. This effectively allows navigating (as in clicking links) a mediawiki collection as it existed in the past: as long as a client issues an X-Accept-Datetime header, matching history pages/images will be retrieved.
Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.
It seems to me like a better path would be to have different URLs for different dates. The obvious way to do this would be to take an approach like OpenSearch, and provide a URL pattern in some standard format. Maybe the page could contain <link rel=oldversions> or such, with the client appending a query parameter to the given URL, say time=T where T is an ISO 8601 string.
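Under that proposal, the client side might reduce to something like this (entirely hypothetical: the rel name, time= parameter, and URL illustrate the idea, they are not an existing API):

  from datetime import datetime, timezone

  # Suppose the page advertised:
  #   <link rel="oldversions" href="http://example.org/w/history?title=Clock">
  # The client would append an ISO 8601 time= parameter to that URL.
  base = "http://example.org/w/history?title=Clock"
  when = datetime(2009, 1, 1, 12, 0, tzinfo=timezone.utc)
  url = base + "&time=" + when.strftime("%Y-%m-%dT%H:%M:%SZ")
  print(url)  # .../w/history?title=Clock&time=2009-01-01T12:00:00Z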
Aryeh Gregor schrieb:
Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.
It seems to me like a better path would be to have different URLs for different dates. The obvious way to do this would be to take an approach like OpenSearch, and provide a URL pattern in some standard format. Maybe the page could contain <link rel=oldversions> or such, with the client appending a query parameter to the given URL, say time=T where T is an ISO 8601 string.
How about doing both? If an X-Accept-Datetime header is received, it could trigger a 302 redirect, pointing at a url that specifies the desired point in time.
-- daniel
On Nov 13, 2009, at 2:08, Daniel Kinzler daniel@brightbyte.de wrote:
Aryeh Gregor schrieb:
Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.
It seems to me like a better path would be to have different URLs for different dates. The obvious way to do this would be to take an approach like OpenSearch, and provide a URL pattern in some standard format. Maybe the page could contain <link rel=oldversions> or such, with the client appending a query parameter to the given URL, say time=T where T is an ISO 8601 string.
How about doing both? If an X-Accept-Datetime header is received, it could trigger a 302 redirect, pointing at a url that specifies the desired point in time.
This is exactly what we do in Memento and with the plug-in: datetime content negotiation (X-Accept-Datetime header) on the generic URI (say /clock in wikipedia) followed by a 302 redirect to the time-specific URI (title="clock"&oldid=123456 in wikipedia). The generic URI is always only serving the current version of the page; the history URIs are serving the history pages.
Herbert
I'd like to expound on Herbert's point above. We chose 302/Location style CN (instead of 200/Content-Location) to provide more transparency in the process. So I can link to:
http://en.wikipedia.org/wiki/The_Cribs
but if I have my Memento FF add-on set to:
X-Accept-Datetime: {Tue, 29 January 2009 11:41:00 GMT}
I'll get redirected to:
http://en.wikipedia.org/w/index.php?title=The_Cribs&oldid=187673999
which will show up in my browser's location bar and thus linking, sharing, etc. will be done with the correct "old" URI. This would not be the case with 200/Content-Location style CN. If the old version is not what the user wants to link, share, etc., then turning off the Memento add-on and doing a reload (possibly a shift-reload) will cause FF to correctly go back to the original URI (b/c FF does the right thing w/ the 302 semantics that say you should reuse the original URI).
Wikipedia is sort of a special case in that the URI:
http://en.wikipedia.org/wiki/The_Cribs
will return both the current representation as well as an older representation (if CN is requested by the client). That is, that URI is both URI-R and URI-G in the parlance of:
http://www.mementoweb.org/guide/http/local/
Most servers that are not hooked to a CMS (like a wiki) will have URI-G be a separate URI, presumably in a separate archive. See:
http://www.mementoweb.org/guide/http/remote/
There is already support for caching & CN, see:
http://httpd.apache.org/docs/2.3/content-negotiation.html#caching
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6
Of course, the current caches don't know about "X-Accept-Datetime", but that can come in the future (esp. when an RFC is written and the "X-" are removed from the various headers introduced by Memento). I'm not sure if they'll need to be aware of "Accept-Datetime" specifically, or (hopefully) they'll do the right thing with whatever values are returned in the "Vary" response header. We'll see.
The goal of introducing a 5th dimension for CN (to complement type, encoding, language & charset) is that we are more likely to integrate with the existing http infrastructure. More so, we suspect, than introducing an RPC-like convention of arguments tacked onto URIs (e.g., "foo?datetime=xxx" or "foo?datetime=now") or overloading URI fragments.
regards,
Michael
----
Michael L. Nelson  mln@cs.odu.edu  http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)
On 13/11/2009, at 2:25 AM, Aryeh Gregor wrote:
On Thu, Nov 12, 2009 at 3:55 PM, Herbert Van de Sompel hvdsomp@gmail.com wrote:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually. This effectively allows navigating (as in clicking links) a mediawiki collection as it existed in the past: as long as a client issues an X-Accept-Datetime header, matching history pages/images will be retrieved.
Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.
I assume the solution to this would be a Vary: X-Accept-Datetime header.
-- Andrew Garrett agarrett@wikimedia.org http://werdn.us/
On Nov 13, 2009, at 2:55 PM, Andrew Garrett wrote:
On 13/11/2009, at 2:25 AM, Aryeh Gregor wrote:
On Thu, Nov 12, 2009 at 3:55 PM, Herbert Van de Sompel hvdsomp@gmail.com wrote:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually. This effectively allows navigating (as in clicking links) a mediawiki collection as it existed in the past: as long as a client issues an X-Accept-Datetime header, matching history pages/images will be retrieved.
Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.
I assume the solution to this would be a Vary: X-Accept-Datetime header.
Please have a look at the HTTP Transactions for datetime content negotiation available at:
http://www.mementoweb.org/guide/http/local/
This shows that we indeed include a response header:
Vary: negotiate, X-Accept-Datetime
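On the server side that is one extra response header; a minimal WSGI-style sketch (illustrative only, the actual plug-in is PHP):

  def app(environ, start_response):
      # Tell caches that responses for this URI vary with the datetime
      # negotiation headers, so a cached copy is only reused for
      # requests carrying the same values.
      start_response("200 OK", [
          ("Content-Type", "text/html"),
          ("Vary", "negotiate, X-Accept-Datetime"),
      ])
      return [b"...page content..."]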
Cheers
Herbert Van de Sompel
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267