Hi all
The Memento Project http://www.mementoweb.org/ (including the Los Alamos National Laboratory (!) featuring Herbert Van de Sompel of OpenURL fame) is proposing a new HTTP header, X-Accept-Datetime, to fetch old versions of a web resource. They already wrote a MediaWiki extension for this http://www.mediawiki.org/wiki/Extension:Memento - which would of course be particularly interesting for use on Wikipedia.
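To give a feel for it, here's a minimal client-side sketch (Python; the URL and the exact date format are my guesses, only the header name is theirs):

    # Minimal sketch: ask for a page as it was at a given datetime.
    # Assumes the server honours X-Accept-Datetime as proposed; the URL
    # and the RFC 1123 date format are illustrative assumptions.
    import urllib.request

    req = urllib.request.Request(
        "http://en.wikipedia.org/wiki/Main_Page",
        headers={"X-Accept-Datetime": "Thu, 12 Nov 2009 00:00:00 GMT"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)        # server would negotiate to an old revision
        print(resp.read()[:200])  # beginning of the (hopefully old) page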
Do you think we could have this for the Wikimedia projects? I think that would be very nice indeed. I recall that ways to look at last week's main page have been discussed before, and I see several issues:
* the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. We need a tiebreak (rev_id would be the obvious choice; see the sketch below).
* templates and images also need to be "time warped". It seems like the extension does not address this at the moment. For flagged revisions we do have such a mechanism, right? Could that be used here?
* Squids would need to know about the new header, and bypass the cache when it's used.
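For illustration, the tiebreak could look something like this (a rough Python sketch; the function and data shape are made up, only the rev_timestamp/rev_id names come from MediaWiki's revision table):

    # Rough sketch of the rev_id tiebreak: take the latest revision at or
    # before the requested time; among revisions sharing that timestamp,
    # the highest rev_id wins. (Function and data shape are hypothetical.)
    def pick_revision(revisions, requested_ts):
        """revisions: iterable of (rev_timestamp, rev_id) tuples for one page."""
        candidates = [r for r in revisions if r[0] <= requested_ts]
        if not candidates:
            return None  # the page did not exist yet at that time
        # tuple comparison orders by timestamp first, then rev_id
        return max(candidates)[1]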
So, what do you think? What does it take? Can we point them to the missing bits?
-- daniel
Instead of writing it as an extra header to the HTTP protocol ... why don't they write it as a proxy to Wikimedia (or any other site they want to temporally proxy)? Getting a new HTTP header out there is not an easy task: at best a small percentage of sites will support it, and then you need to deploy clients and write user interfaces that support it as well.
If viewing old versions of sites is what interests them, it's probably best to write an interface (a Firefox extension or Greasemonkey script) that builds a "temporal" interface to their liking on top of the MediaWiki API (presumably the "history" button fails to represent their vision?)... For non-MediaWiki sites it could access the Wayback Machine.
If the purpose is to support searching or archiving, then it's probably best to proxy the MediaWiki API through a proxy that they set up which supports those temporal requests across all sites (i.e. an enhanced interface to the Wayback Machine?).
--michael
Daniel Kinzler wrote:
- the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. We need a tiebreak (rev_id would be the obvious choice).
I'd say it is, if sufficiently precise :) If not, either use the lowest/highest rev_id, or the user could be asked to choose a version.
- templates and images also need to be "time warped". It seems like the extension does not address this at the moment. For flagged revisions we do have such a mechanism, right? Could that be used here?
I see three independent things here:
1) When viewing a past version of a page, show appropriate templates, images, magic words etc.
2) When viewing a past version of a page, link to other pages as appropriate (show red links if they didn't exist yet, link to their appropriate past version if they did); a toy sketch follows this list. I'd say this is the easiest to implement, and the most interesting for readers.
3) Ability to view a page as it looked at a certain time (as opposed to a certain revision).
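For point 2, a toy sketch of the per-link decision (all names made up for illustration):

    # Toy sketch for point 2: choose how to render a wikilink when viewing
    # a page as of `viewing_at`. All names here are made up.
    def render_link(title, target_created_at, viewing_at):
        if target_created_at is None or target_created_at > viewing_at:
            return f"[[{title}]] (red link: did not exist yet)"
        return f"[[{title}]] (linked to its revision as of {viewing_at})"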
On Thu, Nov 12, 2009 at 10:43 AM, Nikola Smolenski smolensk@eunet.rs wrote:
I'd say it is, if sufficiently precise :)
MediaWiki only keeps timestamps to one-second precision, so it's not.
On Thursday 12 November 2009 16:52:54 Aryeh Gregor wrote:
On Thu, Nov 12, 2009 at 10:43 AM, Nikola Smolenski smolensk@eunet.rs wrote:
I'd say it is, if sufficiently precise :)
MediaWiki only keeps timestamps to one-second precision, so it's not.
I propose the following heuristics:
1. If the exact timestamp doesn't exist in the database, use the newest one older than the requested one.
2. If it exists, and only one revision has the timestamp, use that revision.
3. If more than one revision shares the same timestamp, divide the second into as many parts as there are revisions, and use the revision whose part contains the requested timestamp.
Suppose that someone asks for Wikipedia as it looked at 2009-11-13 18:53:11.4281. There are four revisions with the timestamp 2009-11-13 18:53:11: revisions 123456, 123457, 123459 and 123460. Each revision gets its quarter of the second, and since the request falls in the 2nd quarter, use revision 123457.
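In code, rule 3 might look like this (a sketch; it assumes the client's requested datetime carries a sub-second fraction, which MediaWiki itself does not store):

    # Sketch of rule 3: revisions sharing one timestamp each "own" an equal
    # slice of that second; the sub-second fraction of the request picks one.
    def pick_among_ties(rev_ids, fraction):
        """rev_ids: revisions with the same timestamp, oldest first.
        fraction: sub-second part of the requested time, 0 <= fraction < 1."""
        n = len(rev_ids)
        return rev_ids[min(int(fraction * n), n - 1)]

    # The example above: .4281 falls in the second quarter of the second.
    assert pick_among_ties([123456, 123457, 123459, 123460], 0.4281) == 123457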
The scenario of multiple URIs for a single datetime (second granularity, which I think is all that the RFC-822/RFC-1123 format supports) might be a good candidate for the HTTP response "300 Multiple Choices":
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.1
The entity sent back with the 300 could be:
1. a TimeMap (read: ORE Resource Map), in Atom, RDF, or whatever (see the RDF example at: http://www.mementoweb.org/guide/api/map1.rdf)
2. a custom MediaWiki HTML entity, like a history page with just the values for that datetime, that allows the user to browse, compare, & select the version they desire (sketched below).
3. a combination of #1 with an XSLT that transforms the XML into HTML with the functionality of #2.
4. other ideas?
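As a rough illustration of #2, the server could assemble the 300 response like this (Python sketch; the oldid URI form is MediaWiki's, the host is a placeholder):

    # Rough sketch of option 2: build a 300 Multiple Choices response
    # listing the candidate revisions for an ambiguous datetime.
    def multiple_choices(title, rev_ids):
        uris = [f"http://example.org/index.php?title={title}&oldid={rid}"
                for rid in rev_ids]
        items = "".join(f'<li><a href="{u}">{u}</a></li>' for u in uris)
        body = (f"<html><body><p>Revisions at that datetime:</p>"
                f"<ul>{items}</ul></body></html>")
        headers = {
            "Content-Type": "text/html; charset=utf-8",
            # RFC 2616 10.3.1 allows a preferred choice in Location
            "Location": uris[-1],
        }
        return 300, headers, body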
regards,
Michael
----
Michael L. Nelson mln@cs.odu.edu http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)
On Fri, Nov 13, 2009 at 2:43 AM, Nikola Smolenski smolensk@eunet.rs wrote:
- the timestamp isn't a unique identifier, multiple revisions *might* have the same timestamp. We need a tiebreak (rev_id would be the obvious choice).
I'd say it is, if sufficiently precise :) If not, either use the lowest/highest rev_id, or the user could be asked to choose a version.
Seems like a non-issue. The user requests the page as it was on the 18th of December 2006, at 16:45:12 UTC. Which of two (or more) versions of the page stored within that second is returned is academic, isn't it? If they know there are two versions and want to refer to a specific one, they should use a rev_id, not a time.
Steve
The extension and the wiki page have been updated. The extension now resolves the issue of multiple revisions sharing the same timestamp by returning an 'HTTP/1.1 300 Multiple Choices' response with the list of revision URIs that share that timestamp. -Harihar
Daniel Kinzler wrote:
Do you think we could have this for the Wikimedia projects? I think that would be very nice indeed. I recall that ways to look at last week's main page have been discussed before.
You can't view the main page as it was in the past, because users routinely upload temporary images to display there, so that they can be protected, and then delete them once they're off the page.
Also, we can't have people crawling Wikipedia while requesting old versions, because of the excessive disk seeking and CPU usage that would generate. That's why the history page has a robot policy of noindex, nofollow.
-- Tim Starling
Hi Tim,
If there's a problem with viewing past versions of the main page, that's perfectly okay -- it can be excluded from the resources that are datetime content-negotiable, like the Special: pages.
I admit to not following the second issue completely. A regular robot would never issue the X-Accept-Datetime to jump back in time, so that's okay. A regular robot would also respect the history page policy and not crawl backwards either, as you say. A robot that did issue X-Accept-Datetime would end up crawling old revision pages and never hit a history list, but this could also be forbidden via robots.txt if the revision pages were excluded too?
However, it seems like it's a long time off before people write past-web crawlers, and the use case for even doing that at all is pretty hard to come up with. :)
Hope this addresses your concerns!
Rob
This got written up in New Scientist today, for those who are interested.
http://www.newscientist.com/article/dn18158-timetravelling-browsers-navigate...
-- Andrew Garrett agarrett@wikimedia.org http://werdn.us/