hi Daniel, all,
Jakob Voss informed me that there had been a posting regarding memento on this list. Since we are very keen on native Memento support for the mediawiki platform, I felt like responding by giving some perspective on Memento in general, and on some questions that were raised regarding its implementation for mediawiki, specifically:
1. As with previous projects we have engaged in (OpenURL, OAI-PMH, OAI-ORE, SRU/W), Memento is not merely an academic exercise about which we want to publish papers. Given our status as researchers, we do need to publish the occasional paper but the real goal of the project is to make datetime content negotiation for the Web really happen. Doing so will require a lot more work, at various levels, including:
a. formal specification (we are thinking an Internet Draft => RFC path),
b. promotion,
c. real life implementations,
d. further research.
Currently, we are focusing on (b) and (c) to try and overcome a chicken and egg situation: we think we propose a nice framework to integrate archival content seamlessly in regular web navigation, but in order to demonstrate it we need some adoption. But in order to get some adoption we need to be able to demonstrate the framework, which we can't do without adoption. Etc. ;-)
It is in this context that our contact with Jakob Voss, and Daniel's mail on this list, are really exciting. The ability to demonstrate Memento at work for the mediawiki platform, including the Wikipedia deployments, would be absolutely fantastic for the Memento cause. That is why we have immediately engaged in further development of our initial prototype Memento plug-in for mediawiki, to take into account remarks made by Jakob, and to make it more robust. The ongoing work is at http://www.mediawiki.org/wiki/Extension:Memento .
2. Let me describe the actual status and challenges faced in the Memento plug-in work:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually. This effectively allows navigating (as in clicking links) a mediawiki collection as it existed in the past: as long as a client issues an X-Accept-Datetime header, matching history pages/images will be retrieved.
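To make the selection rule concrete, here is a minimal sketch in Python (not the plug-in's actual PHP code). It assumes the header carries an HTTP-date and that "active at the datetime" means the most recent revision saved at or before that datetime. The function name, the revision list and the revision ids are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def select_revision(revisions, accept_datetime):
    """Return the id of the revision that was current at accept_datetime.

    revisions: list of (timestamp, rev_id), sorted ascending by timestamp,
               with timezone-aware timestamps (hypothetical data layout).
    accept_datetime: header value, assumed here to be an HTTP-date string.
    """
    requested = parsedate_to_datetime(accept_datetime)
    timestamps = [ts for ts, _ in revisions]
    # Index of the first revision saved strictly after the requested datetime.
    idx = bisect_right(timestamps, requested)
    if idx == 0:
        return None  # the page did not exist yet at that datetime
    return revisions[idx - 1][1]


# Hypothetical revision history of one page.
history = [
    (datetime(2009, 9, 1, tzinfo=timezone.utc), 101),
    (datetime(2009, 10, 15, tzinfo=timezone.utc), 102),
    (datetime(2009, 11, 20, tzinfo=timezone.utc), 103),
]
chosen = select_revision(history, "Sun, 01 Nov 2009 00:00:00 GMT")
```

Because every link the client follows carries the same header, the same rule applied per page yields a consistent view of the wiki as of that datetime.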
2.2. We are looking into addressing this issue raised by Jakob (and Daniel): Display history pages with the template that was active at the time the history page acted as the current one. We definitely think this would be cool, but we don't think it can be achieved by our plug-in because templates are included at the server side, i.e. they are not URI-addressed XSL that are rendered at the client-side. Hence, one can't do datetime content negotiation on them - they are outside of the memento realm and rather in the realm of the CMS. So, we are looking at the mediawiki code to see whether a history page, when rendered, could itself retrieve the appropriate (old) template from the database. If we are successful, we will share that code also at http://www.mediawiki.org/wiki/Extension:Memento once available. It will obviously be up to the mediawiki community whether they are willing to adopt the proposed change to the codebase.
2.3. We have looked into another issue raised by Jakob: Display deleted pages as they existed at the datetime expressed in X-Accept-Datetime. We have actually implemented this. There are 2 caveats:
- as is the case with mediawiki in general, deleted pages are only accessible by those with appropriate permissions;
- as is the case with mediawiki in general, deleted pages show up in Edit mode.
This code will soon be included at http://www.mediawiki.org/wiki/Extension:Memento .
2.4. We do not feel that all pages should necessarily be subject to datetime content negotiation, in the same way that not all URIs are subject to content negotiation in other dimensions. We feel that the Special Pages fall under this category, as they do not have History.
2.5. We have ideas regarding how to address the issue raised by Daniel: the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. From the perspective of Memento, a datetime is obviously the only "globally" recognizable value that can be used for negotiation. If cases occur where multiple versions of a page exist for the same second, the thing to do according to RFC 2295 would be to return a "300 Multiple Choices", listing the URIs (and metadata) of those versions in an Alternates header. The client then has to take it from there.
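A rough sketch of that decision, in Python for illustration only: the function name, the base URL, the 404 for "no version yet", and the use of Content-Location are assumptions, and the Alternates value is a simplified rendering of the RFC 2295 variant list rather than a complete one.

```python
from datetime import datetime, timezone


def negotiate(revisions, requested, base_url):
    """Return (status, headers) for a datetime-negotiated request.

    revisions: list of (timestamp, rev_id) with one-second resolution.
    requested: the negotiated datetime.
    base_url:  hypothetical script URL; versions are addressed via ?oldid=.
    """
    candidates = [(ts, rid) for ts, rid in revisions if ts <= requested]
    if not candidates:
        return 404, {}  # assumption: no version existed at that datetime
    latest = max(ts for ts, _ in candidates)
    # All revisions sharing the latest matching second.
    matches = sorted(rid for ts, rid in candidates if ts == latest)
    uris = [f"{base_url}?oldid={rid}" for rid in matches]
    if len(matches) == 1:
        return 200, {"Content-Location": uris[0]}
    # Several versions in the same second: let the client choose.
    alternates = ", ".join(f'{{"{u}" 1.0 {{type text/html}}}}' for u in uris)
    return 300, {"Alternates": alternates}


same_second = datetime(2009, 11, 1, 12, 0, 0, tzinfo=timezone.utc)
revs = [
    (datetime(2009, 10, 1, tzinfo=timezone.utc), 100),
    (same_second, 101),
    (same_second, 102),
]
status, headers = negotiate(revs, same_second, "http://example.org/w/index.php")
```

A server-side tiebreak on the revision id, as Daniel suggests, would avoid the extra round trip, but it would mean the server picks a version the client never named.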
2.6. The caching issue is a general problem arising from introducing Memento in a web that does not (yet) do Memento: when in datetime content negotiation mode all caches between client and server (both included) need to be bypassed. As described in our paper, we currently address this problem by adding the following client headers:
Cache-Control: no-cache => to force cache revalidation, and
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce validation failure
We very much understand this is not elegant but it tends to work ;-) . This is an area for further research. As the paper states: "Ideally, a solution should leverage existing caching practice but extend it in such a way that caches are only bypassed in DT-conneg when essential, but still used whenever possible (e.g., to deliver Mementos)."
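For illustration, a client could assemble these headers as in the following Python sketch. The helper name and the example URL are hypothetical, and formatting the X-Accept-Datetime value as an HTTP-date is an assumption. No request is actually sent here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def memento_headers(when):
    """Build client headers for a datetime-negotiated, cache-bypassing request.

    when: timezone-aware datetime in UTC (the datetime to negotiate on).
    """
    return {
        # Assumption: the header value is written as an HTTP-date.
        "X-Accept-Datetime": format_datetime(when, usegmt=True),
        # Force intermediate caches to revalidate with the origin server.
        "Cache-Control": "no-cache",
        # The epoch predates any stored copy, so validation always fails.
        "If-Modified-Since": "Thu, 01 Jan 1970 00:00:00 GMT",
    }


hdrs = memento_headers(datetime(2009, 11, 1, tzinfo=timezone.utc))
# Hypothetical target URL; the request object is built but not opened.
req = Request("http://example.org/wiki/Main_Page", headers=hdrs)
```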
I hope this helps. Please let us know what we can do to increase the chances of adoption of the Memento solution for the mediawiki platform. I hope it is clear that we _really_ would like to see this happen!
Cheers
Herbert Van de Sompel
==
Hi all
The Memento Project http://www.mementoweb.org/ (including the Los Alamos National Laboratory (!) featuring Herbert Van de Sompel of OpenURL fame) is proposing a new HTTP header, X-Accept-Datetime, to fetch old versions of a web resource. They already wrote a MediaWiki extension for this http://www.mediawiki.org/wiki/Extension:Memento - which would of course be particularly interesting for use on Wikipedia.
Do you think we could have this for Wikimedia projects? I think that would be very nice indeed. I recall that ways to look at last week's main page have been discussed before, and I see several issues:
* the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. We need a tiebreak (rev_id would be the obvious choice).
* templates and images also need to be "time warped". It seems like the extension does not address this at the moment. For flagged revisions we do have such a mechanism, right? Could that be used here?
* Squids would need to know about the new header, and bypass the cache when it's used.
so, what do you think? what does it take? Can we point them to the missing bits?
-- daniel
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267