hi Daniel, all,
Jakob Voss informed me that there had been a posting regarding memento on this list. Since we are very keen on native Memento support for the mediawiki platform, I felt like responding by giving some perspective on Memento in general, and on some questions that were raised regarding its implementation for mediawiki, specifically:
1. As with previous projects we have engaged in (OpenURL, OAI-PMH, OAI-ORE, SRU/W), Memento is not merely an academic exercise about which we want to publish papers. Given our status as researchers, we do need to publish the occasional paper, but the real goal of the project is to make datetime content negotiation for the Web really happen. Doing so will require a lot more work, at various levels, including:

a. formal specification (we are thinking an Internet Draft => RFC path),
b. promotion,
c. real life implementations,
d. further research.
Currently, we are focusing on (b) and (c) to try and overcome a chicken-and-egg situation: we think we propose a nice framework to integrate archival content seamlessly into regular web navigation, but in order to demonstrate it we need some adoption, and in order to get adoption we need to be able to demonstrate the framework. Etc. ;-)
It is in this context that our contact with Jakob Voss, and Daniel's mail on this list, is really exciting. The ability to demonstrate Memento at work for the mediawiki platform, including the Wikipedia deployments, would be absolutely fantastic for the Memento cause. That is why we have immediately engaged in further development of our initial prototype Memento plug-in for mediawiki, to take into account remarks made by Jakob, and to make it more robust. The ongoing work is at http://www.mediawiki.org/wiki/Extension:Memento .
2. Let me describe the actual status and challenges faced in the Memento plug-in work:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually. This effectively allows navigating (as in clicking links) a mediawiki collection as it existed in the past: as long as a client issues an X-Accept-Datetime header, matching history pages/images will be retrieved.
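For illustration, here is a minimal client-side sketch of such a request (Python, using the requests library; the URL and datetime are made up, and this is not the plug-in itself, just the negotiation pattern described above):

  import requests

  # Ask for the version of the page that was active at the given datetime.
  headers = {"X-Accept-Datetime": "Thu, 01 Jan 2009 12:00:00 GMT"}
  resp = requests.get("http://example.org/wiki/Clock", headers=headers)

  # A Memento-aware server 302-redirects to the matching history page;
  # requests follows the redirect, so resp.url holds the oldid URI.
  print(resp.status_code, resp.url)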
2.2. We are looking into addressing this issue raised by Jakob (and Daniel): Display history pages with the template that was active at the time the history page acted as the current one. We definitely think this would be cool, but we don't think it can be achieved by our plug-in because templates are included at the server side, i.e. they are not URI-addressed XSL that is rendered at the client side. Hence, one can't do datetime content negotiation on them - they are outside of the memento realm and rather in the realm of the CMS. So, we are looking at the mediawiki code to see whether a history page, when rendered, could itself retrieve the appropriate (old) template from the database. If we are successful, we will share that code also at http://www.mediawiki.org/wiki/Extension:Memento once available. It will obviously be up to the mediawiki community whether they are willing to adopt the proposed change to the codebase.
2.3. We have looked into another issue raised by Jakob: Display deleted pages as they existed at the datetime expressed in X-Accept-Datetime. We have actually implemented this. There are 2 caveats:
- as is the case with mediawiki in general, deleted pages are only accessible by those with appropriate permissions;
- as is the case with mediawiki in general, deleted pages show up in Edit mode.
This code will soon be included at http://www.mediawiki.org/wiki/Extension:Memento .
2.4. We do not feel that all pages should necessarily be subject to datetime content negotiation, in the same way that not all URIs are subject to content negotiation in other dimensions. We feel that the Special Pages fall under this category, as they do not have History.
2.5. We have ideas regarding how to address the issue raised by Daniel: the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. From the perspective of Memento, a datetime is obviously the only "globally" recognizable value that can be used for negotiation. If cases occur where multiple versions of a page exist for the same second, the thing to do according to RFC 2295 would be to return a "300 Multiple Choices", listing the URIs (and metadata) of those versions in an Alternates header. The client then has to take it from there.
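As a rough client-side sketch of handling such a response (Python; the Alternates parsing is an assumption loosely based on the RFC 2295 syntax, not something the plug-in does today):

  import re
  import requests

  resp = requests.get(
      "http://example.org/wiki/Clock",
      headers={"X-Accept-Datetime": "Thu, 01 Jan 2009 12:00:00 GMT"},
      allow_redirects=False,
  )
  if resp.status_code == 300:
      # RFC 2295 lists variants as {"<uri>" quality {attr ...}} entries;
      # pull out the quoted URIs and let the client pick one.
      uris = re.findall(r'\{"([^"]+)"', resp.headers.get("Alternates", ""))
      print("candidate revisions:", uris)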
2.6. The caching issue is a general problem arising from introducing Memento in a web that does not (yet) do Memento: when in datetime content negotiation mode all caches between client and server (both included) need to be bypassed. As described in our paper, we currently address this problem by adding the following client headers:
Cache-Control: no-cache => to force cache revalidation, and
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce validation failure
We very much understand this is not elegant but it tends to work ;-) . This is an area for further research. As the paper states: "Ideally, a solution should leverage existing caching practice but extend it in such a way that caches are only bypassed in DT-conneg when essential, but still used whenever possible (e.g., to deliver Mementos)."
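In client terms, the workaround amounts to something like this (a sketch; same hypothetical URL as above):

  import requests

  headers = {
      "X-Accept-Datetime": "Thu, 01 Jan 2009 12:00:00 GMT",
      # Force revalidation all the way back to the origin ...
      "Cache-Control": "no-cache",
      # ... and make any If-Modified-Since check fail, so no cache
      # answers from storage.
      "If-Modified-Since": "Thu, 01 Jan 1970 00:00:00 GMT",
  }
  resp = requests.get("http://example.org/wiki/Clock", headers=headers)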
I hope this helps. Please let us know what we can do to increase the chances of adoption of the Memento solution for the mediawiki platform. I hope it is clear that we _really_ would like to see this happen!
Cheers
Herbert Van de Sompel
==
Hi all
The Memento Project http://www.mementoweb.org/ (including the Los Alamos National Laboratory (!) featuring Herbert Van de Sompel of OpenURL fame) is proposing a new HTTP header, X-Accept-Datetime, to fetch old versions of a web resource. They already wrote a MediaWiki extension for this http://www.mediawiki.org/wiki/Extension:Memento - which would of course be particularly interesting for use on Wikipedia.
Do you think we could have this for the Wikimedia projects? I think that would be very nice indeed. I recall that ways to look at last week's main page have been discussed before, and I see several issues:

* the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. We need a tiebreak (rev_id would be the obvious choice).
* templates and images also need to be "time warped". It seems like the extension does not address this at the moment. For flagged revisions we do have such a mechanism, right? Could that be used here?
* Squids would need to know about the new header, and bypass the cache when it's used.
so, what do you think? what does it take? Can we point them to the missing bits?
-- daniel
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
Hi Michael and all,
The first thing which we implemented was exactly this idea of a proxy using the wikipedia API.
The proxy is here: http://mementoproxy.lanl.gov/wiki/timegate/(wikipedia URI)
For example:
http://mementoproxy.lanl.gov/wiki/timegate/http://en.wikipedia.org/wiki/Cloc...
We have also implemented proxies for the Internet Archive, Archive-It, WebCitation.org and several others, as proof-of-concept pieces for the research.
There are several reasons why a native implementation is better for all concerned:
1. The browser somehow needs to know where the proxy is, rather than being natively redirected to the correct page. For a few websites, and a few proxies, this is tolerable. However, even one proxy per CMS would be an impossible burden to maintain, let alone one proxy per website!
2. If the website redirected to the proxy, rather than the client knowing where to go, then this would rely on trusting the proxy to behave correctly. In a native implementation, you're never redirected off-site.
3. The proxy will redirect back to the appropriate history page; however, this page doesn't know that it's being treated as a Memento, and will not issue the X-Datetime-Validity or X-Archive-Interval headers. This makes it difficult (but not impossible) for the client to detect that it has been redirected correctly.
4. The offsite redirection adds at least 2 extra HTTP transactions per resource, slowing down the retrieval. In the native implementation the main page redirects to the history page directly. In the proxy approach, the browser goes to the main page, then either knows of or is redirected to the proxy; the proxy makes one or more API calls to fetch the history for the page to calculate the right revision, and then redirects the client back there.
5. We don't have to maintain the proxies :)
So for wikimedia installations the native approach is better as it's trusted and faster and involves fewer API calls. For the client it's better as it's faster and doesn't require intelligence or a list of proxies. For the proxy maintainer it's better as they're no longer needed.
I hope that helps clarify things,
Rob Sanderson (Also at Los Alamos with Herbert Van de Sompel)
Michael Dale wrote:
Instead of writing it as an extra header to the HTTP protocol ... why don't they write it as a proxy to wikimedia (or any other site they want to temporally proxy)? Getting a new HTTP header out there is not an easy task; at best a small percentage of sites will support it, and then you need to deploy clients and write user interfaces that support it as well.

If viewing old versions of sites is something interesting to them, it's probably best to write an interface, a firefox extension or greasemonkey script, that makes a "temporal" interface of their liking for the mediawiki api (presumably the "history button" fails to represent their vision?)... For non-mediawiki sites it could access "the way back machine".

If the purpose is to support searching or archival, then it's probably best to proxy the mediawiki api through a proxy that they set up that supports those temporal requests across all sites (i.e. an enhanced interface to the wayback machine?).
--michael
Hello Herbert.
Herbert Van de Sompel wrote:
2. Let me describe the actual status and challenges faced in the Memento plug-in work:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually.
2.2. Display history pages with the template that was active at the time the history page acted as the current one. [Snip] So, we are looking at the mediawiki code to see whether a history page, when rendered, could itself retrieve the appropriate (old) template from the database. If we are successful, we will share that code also at http://www.mediawiki.org/wiki/Extension:Memento once available. It will obviously be up to the mediawiki community whether they are willing to adopt the proposed change to the codebase.
Obviously it's a server issue.
2.3. We have looked into another issue raised by Jakob: Display deleted pages as they existed at the datetime expressed in X-Accept-Datetime. We have actually implemented this. There are 2 caveats:
- as is the case with mediawiki in general, deleted pages are only accessible by those with appropriate permissions;
- as is the case with mediawiki in general, deleted pages show up in Edit mode.
This code will soon be included at http://www.mediawiki.org/wiki/Extension:Memento
Showing deleted pages in edit mode is not always the case, since they can be rendered (albeit not with the old templates, which would be an interesting enhancement enabled by your work).
It is impressive how far you have gone. However, I don't think you can do a *complete* implementation.
First, you should be aware that timemachining the pages has been tried in the past. Discussions about FlaggedRevs are also relevant for your project. FlaggedRevs is an extension which allows marking the status of a page (eg. not vandalised) at a point in time. A naive implementation would store the timestamp and get the old version from the archive. They ended up storing the page content, with templates transcluded, in a table specific to the extension. However, FlaggedRevs is a tool to fight vandalism. Yours is an archival one. You could accept imperfect results under certain circumstances.
Problematic aspects:

Page moves/image moves:
* You want to see the content of Foo at some epoch, but the history now at Foo is wrong. Instead you need to look at the history of the page now at Foo_(disambiguation). You need to follow (perhaps even many times) the move logs to find out the real page.

Page merges:
* When two pages have been merged, you will want to show the revision which was originally at the page the user wants to timemachine. You can no longer just rely on the timestamps. You may be able to get that by splitting the sources at the merge time and going back via rev_parent_id. Needless to say, this is very inefficient; this piece wouldn't be put live at wikipedia.

Partial undeletions:
* When a page is undeleted, the summary shows how many revisions were undeleted, but not *which* ones.

Case:
* Page A has two edits (#1 and #2).
* A vandal adds obscene content to it (#3).
* An admin deletes the page and restores the first two revisions.
* Several months later, the page is completely deleted.

When an admin wants to view what the page looked like during those months, an application is unable to determine whether the two revisions which had been shown were #1 and #2 or perhaps #2 and #3.

revdelete may have similar issues.
2.4. We do not feel that all pages should necessarily be subject to datetime content negotiation, in the same way that not all URIs are subject to content negotiation in other dimensions. We feel that the Special Pages fall under this category, as they do not have History.
2.5. We have ideas regarding how to address the issue raised by Daniel: the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. From the perspective of Memento, a datetime is obviously the only "globally" recognizable value that can be used for negotiation. If cases occur where multiple versions of a page exist for the same second, the thing to do according to RFC 2295 would be to return a "300 Multiple Choices", listing the URIs (and metadata) of those versions in an Alternates header. The client then has to take it from there.
2.6. The caching issue is a general problem arising from introducing Memento in a web that does not (yet) do Memento: when in datetime content negotiation mode all caches between client and server (both included) need to be bypassed. As described in our paper, we currently address this problem by adding the following client headers:
Cache-Control: no-cache => to force cache revalidation, and
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce validation failure
We very much understand this is not elegant but it tends to work ;-) .
The caching issue is IMHO the bigger problem in your approach using the new header. Disabling the cache on the request kind of works (although not in the long term), but you also need to disable caching at the server, so that someone accessing the current page through your same proxy (ignorant of X-Accept-Datetime) doesn't get the cached page you were served earlier.
RFC 2145 states very clearly that "A proxy MUST forward an unknown header", but in your case it'd have been preferable that the header wasn't forwarded if the proxy isn't memento aware.
Which leads us to another issue, which is that it seems your server implementation doesn't "acknowledge" memento, so given a response to an X-Accept-Datetime, you don't know if what you're getting is the version you requested or the current one (because the server ignored it). It can be as simple as requiring a Last-Modified <= X-Accept-Datetime on Accept-Datetime responses (that would allow the server to explicitly tell since when it is valid), but extended to all response codes.
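For concreteness, that client-side check could look roughly like this (a Python sketch of the rule proposed above, not existing behaviour; URL hypothetical):

  from email.utils import parsedate_to_datetime
  import requests

  asked = "Thu, 01 Jan 2009 12:00:00 GMT"
  resp = requests.get("http://example.org/wiki/Clock",
                      headers={"X-Accept-Datetime": asked})

  # If the server honoured the negotiation, the representation it sent
  # must date from before (or at) the datetime we asked for.
  last_mod = resp.headers.get("Last-Modified")
  if last_mod and parsedate_to_datetime(last_mod) <= parsedate_to_datetime(asked):
      print("server acknowledged the datetime negotiation")
  else:
      print("server probably ignored X-Accept-Datetime")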
On Nov 12, 2009, at 3:19 PM, Platonides wrote:
2.3. We have looked into another issue raised by Jakob: Display deleted pages as they existed at the datetime expressed in X-Accept-Datetime. We have actually implemented this. There are 2 caveats:
- as is the case with mediawiki in general, deleted pages are only accessible by those with appropriate permissions;
- as is the case with mediawiki in general, deleted pages show up in Edit mode.
This code will soon be included at http://www.mediawiki.org/wiki/Extension:Memento
Showing deleted pages in edit mode is not always the case, since they can be rendered (albeit not with the old templates, which would be an interesting enhancement enabled by your work).
It is impressive how far you have gone. However, I don't think you can do a *complete* implementation.
First, you should be aware that timemachining the pages has been tried in the past. Discussions about FlaggedRevs are also relevant for your project. FlaggedRevs is an extension which allows marking the status of a page (eg. not vandalised) at a point in time. A naive implementation would store the timestamp and get the old version from the archive. They ended up storing the page content, with templates transcluded, in a table specific to the extension. However, FlaggedRevs is a tool to fight vandalism. Yours is an archival one. You could accept imperfect results under certain circumstances.
Indeed, it suffices to look at the Internet Archive and comparable web archives to see that one needs to live with what is reasonably achievable, not with what one would love to have. Imperfection is allowed when looking at this problem from an archival perspective.
Related to this, one must be careful not to cross the border between:
(a) what can purely be achieved using the primitives of the web architecture (URI, resource, representation), and HTTP, with datetime content negotiation added to the mix;
(b) what is in the realm of content, interpretation, etc.
Let me explain what I mean: Wikipedia used to have a page for "Alito". The page got discontinued and in its place came a page "Samuel Alito". Both have their separate URIs, and so for each individually datetime content negotiation will work nicely. That is what I mean by (a) above. However, connecting "Alito" and "Samuel Alito" moves us into the realm of (b). Things could be done in this specific type of case, as redirects are in place between the Alito and Samuel Alito URIs (unfortunately not the 301 or 302 one would expect but rather a 200), meaning such redirection info is in the database. Hence it could be acted upon. And so we could explore this, although I feel this gets us into the (b) zone. Again, generally speaking we must remain aware of the line between (a) and (b) above.
Cheers
herbert
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
On Nov 12, 2009, at 3:19 PM, Platonides wrote:
2.6. The caching issue is a general problem arising from introducing Memento in a web that does not (yet) do Memento: when in datetime content negotiation mode all caches between client and server (both included) need to be bypassed. As described in our paper, we currently address this problem by adding the following client headers:
Cache-Control: no-cache => to force cache revalidation, and
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce validation failure
We very much understand this is not elegant but it tends to work ;-) .
The caching issue is IMHO the bigger problem in your approach using the new header. Disabling the cache on the request kind of works (although not in the long term), but you also need to disable caching at the server, so that someone accessing the current page through your same proxy (ignorant of X-Accept-Datetime) doesn't get the cached page you were served earlier.
Agreed, of course, that our current cache fix is a temp solution.
Not sure what you mean by the above remark, but it is totally fine to cache the current page in mediawiki because the history pages are not served from the URI of the current page, neither by our plug-in nor in Memento in general (see http://www.mementoweb.org/guide/http/local/). Rather, an X-Accept-Datetime request is redirected (302 Found) to an appropriate history resource that has its own URI (with title and oldid in the case of mediawiki). And, hence, even those history pages can be cached by a mediawiki equipped with the memento plug-in.
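The separation is easiest to see by not following the redirect (a sketch; URL hypothetical):

  import requests

  resp = requests.get(
      "http://example.org/wiki/Clock",
      headers={"X-Accept-Datetime": "Thu, 01 Jan 2009 12:00:00 GMT"},
      allow_redirects=False,
  )
  # The generic URI itself keeps serving (and caching) only the current
  # page; the old content lives at the Location target, e.g.
  # /w/index.php?title=Clock&oldid=123456, which has its own cache entry.
  print(resp.status_code)                   # expect 302
  print(resp.headers.get("Location"))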
RFC 2145 states very clearly that "A proxy MUST forward an unknown header", but in your case it'd have been preferable that the header wasn't forwarded if the proxy isn't memento aware.
Which leads us to another issue, which is that it seems your server implementation doesn't "acknowledge" memento, so given a response to an X-Accept-Datetime, you don't know if what you're getting is the version you requested or the current one (because the server ignored it). It can be as simple as requiring a Last-Modified <= X-Accept-Datetime on Accept-Datetime responses (that would allow the server to explicitly tell since when it is valid), but extended to all response codes.
Actually, have a look at http://www.mementoweb.org/guide/http/local/ . You will note that the following response header is always included:
X-Archive-Interval: {datetime_start} - {datetime_end}
This allows a client to understand that it received a history resource. The values to use are the start datetime and end datetime for which the server has representations for the URI at hand.
Our plug-in implements this for mediawiki. Our proxy can't do this.
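For a Memento-aware client, reading that header might look roughly like this (a Python sketch; the parsing is an assumption based on the format shown above):

  from email.utils import parsedate_to_datetime

  def archive_interval(resp):
      # Split "X-Archive-Interval: {start} - {end}" into two datetimes;
      # absence of the header means the server is not Memento-aware.
      raw = resp.headers.get("X-Archive-Interval")
      if raw is None:
          return None
      start, _, end = raw.partition(" - ")
      return parsedate_to_datetime(start), parsedate_to_datetime(end)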
Cheers
herbert
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
We have made some updates to the Memento extension and we have also written a fix to perform datetime content negotiation on transcluded templates. Details can be found in the wiki page for the extension http://www.mediawiki.org/wiki/Extension:Memento .

Harihar (Los Alamos National Labs)
On Thu, Nov 12, 2009 at 3:55 PM, Herbert Van de Sompel hvdsomp@gmail.com wrote:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually. This effectively allows navigating (as in clicking links) a mediawiki collection as it existed in the past: as long as a client issues an X-Accept-Datetime header, matching history pages/images will be retrieved.
Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.
It seems to me like a better path would be to have different URLs for different dates. The obvious way to do this would be to take an approach like OpenSearch, and provide a URL pattern in some standard format. Maybe the page could contain <link rel=oldversions> or such, with the client appending a query parameter to the given URL, say time=T where T is an ISO 8601 string.
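Under that proposal, the client side might reduce to something like this (entirely hypothetical: the rel name, time= parameter, and URL illustrate the idea, they are not an existing API):

  from datetime import datetime, timezone

  # Suppose the page advertised:
  #   <link rel="oldversions" href="http://example.org/w/history?title=Clock">
  # The client would append an ISO 8601 time= parameter to that URL.
  base = "http://example.org/w/history?title=Clock"
  when = datetime(2009, 1, 1, 12, 0, tzinfo=timezone.utc)
  url = base + "&time=" + when.strftime("%Y-%m-%dT%H:%M:%SZ")
  print(url)  # .../w/history?title=Clock&time=2009-01-01T12:00:00Z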
Aryeh Gregor schrieb:
Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.
It seems to me like a better path would be to have different URLs for different dates. The obvious way to do this would be to take an approach like OpenSearch, and provide a URL pattern in some standard format. Maybe the page could contain <link rel=oldversions> or such, with the client appending a query parameter to the given URL, say time=T where T is an ISO 8601 string.
How about doing both? If an X-Accept-Datetime header is received, it could trigger a 302 redirect, pointing at a url that specifies the desired point in time.
-- daniel
On Nov 13, 2009, at 2:08, Daniel Kinzler daniel@brightbyte.de wrote:
Aryeh Gregor schrieb:
Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.
It seems to me like a better path would be to have different URLs for different dates. The obvious way to do this would be to take an approach like OpenSearch, and provide a URL pattern in some standard format. Maybe the page could contain <link rel=oldversions> or such, with the client appending a query parameter to the given URL, say time=T where T is an ISO 8601 string.
How about doing both? If an X-Accept-Datetime header is received, it could trigger a 302 redirect, pointing at a url that specifies the desired point in time.
This is exactly what we do in Memento and with the plug-in: datetime content negotiation (X-Accept-Datetime header) on the generic URI (say /clock in wikipedia) followed by a 302 redirect to the time-specific URI (title="clock"&oldid=123456 in wikipedia). The generic URI is always only serving the current version of the page; the history URIs are serving the history pages.
Herbert
I'd like to expound on Herbert's point above. We chose 302/Location style CN (instead of 200/Content-Location) to provide more transparency in the process. So I can link to:
http://en.wikipedia.org/wiki/The_Cribs
but if I have my Memento FF add-on set to:
X-Accept-Datetime: {Tue, 29 January 2009 11:41:00 GMT}
I'll get redirected to:
http://en.wikipedia.org/w/index.php?title=The_Cribs&oldid=187673999
which will show up in my browser's location bar and thus linking, sharing, etc. will be done with the correct "old" URI. This would not be the case with 200/Content-Location style CN. If the old version is not what the user wants to link, share, etc., then turning off the Memento add-on and doing a reload (possibly a shift-reload) will cause FF to correctly go back to the original URI (b/c FF does the right thing w/ the 302 semantics that say you should reuse the original URI).
Wikipedia is sort of a special case in that the URI:
http://en.wikipedia.org/wiki/The_Cribs
will return both the current representation as well as an older representation (if CN is requested by the client). That is, that URI is both URI-R and URI-G in the parlance of:
http://www.mementoweb.org/guide/http/local/
Most servers that are not hooked to a CMS (like a wiki) will have URI-G be a separate URI, presumably in a separate archive. See:
http://www.mementoweb.org/guide/http/remote/
There is already support for caching & CN, see:
http://httpd.apache.org/docs/2.3/content-negotiation.html#caching
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6
Of course, the current caches don't know about "X-Accept-Datetime", but that can come in the future (esp. when an RFC is written and the "X-" are removed from the various headers introduced by Memento). I'm not sure if they'll need to be aware of "Accept-Datetime" specifically, or (hopefully) they'll do the right thing with whatever values are returned in the "Vary" response header. We'll see.
The goal of introducing a 5th dimension for CN (to complement type, encoding, language & charset) is that we are more likely to integrate with the existing http infrastructure. More so, we suspect, than introducing an RPC-like convention of arguments tacked onto URIs (e.g., "foo?datetime=xxx" or "foo?datetime=now") or overloading URI fragments.
regards,
Michael
----
Michael L. Nelson  mln@cs.odu.edu  http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)
On 13/11/2009, at 2:25 AM, Aryeh Gregor wrote:
On Thu, Nov 12, 2009 at 3:55 PM, Herbert Van de Sompel hvdsomp@gmail.com wrote:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually. This effectively allows navigating (as in clicking links) a mediawiki collection as it existed in the past: as long as a client issues an X-Accept-Datetime header, matching history pages/images will be retrieved.
Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.
I assume the solution to this would be a Vary: X-Accept-Datetime header.
-- Andrew Garrett agarrett@wikimedia.org http://werdn.us/
On Nov 13, 2009, at 2:55 PM, Andrew Garrett wrote:
On 13/11/2009, at 2:25 AM, Aryeh Gregor wrote:
On Thu, Nov 12, 2009 at 3:55 PM, Herbert Van de Sompel hvdsomp@gmail.com wrote:
2.1. The plug-in detects a client's X-Accept-Datetime header, and returns the mediawiki page that was active at the datetime specified in the header. Same for images, actually. This effectively allows navigating (as in clicking links) a mediawiki collection as it existed in the past: as long as a client issues an X-Accept-Datetime header, matching history pages/images will be retrieved.
Doesn't the use of a header here violate the idea of each URL representing only one resource? The server will be returning totally different things for a GET to the same URL. That seems like it would cause all sorts of problems -- not only do caching proxies break (which I'd think by itself makes the feature unusable for users behind caching proxies), but how do you deal with things like bookmarking, or sending a link to a particular version of the page to someone? These would become impossible, unless the server goes to the extra effort to return a redirect.
I assume the solution to this would be a Vary: X-Accept-Datetime header.
Please have a look at the HTTP Transactions for datetime content negotiation available at:
http://www.mementoweb.org/guide/http/local/
This shows that we indeed include a response header:
Vary: negotiate, X-Accept-Datetime
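On the server side that is one extra response header; a minimal WSGI-style sketch (illustrative only, the actual plug-in is PHP):

  def app(environ, start_response):
      # Tell caches that responses for this URI vary with the datetime
      # negotiation headers, so a cached copy is only reused for
      # requests carrying the same values.
      start_response("200 OK", [
          ("Content-Type", "text/html"),
          ("Vary", "negotiate, X-Accept-Datetime"),
      ])
      return [b"...page content..."]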
Cheers
Herbert Van de Sompel
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267