Hi Tim,
If there's a problem with viewing past versions of the main page, that's perfectly okay -- it can be excluded from the set of resources that support datetime content negotiation, just as the Special: pages are.
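A rough sketch of the kind of exclusion I mean, written in Python rather than as extension code. The function name, the excluded title, and the prefix list are all assumptions for illustration, not anything the Memento extension is known to define:

```python
# Hypothetical exclusion rule: which page titles take part in
# datetime content negotiation. Names and lists are illustrative only.
EXCLUDED_TITLES = {"Main Page"}
EXCLUDED_PREFIXES = ("Special:",)


def is_datetime_negotiable(title: str) -> bool:
    """Return True if a request for `title` may honour X-Accept-Datetime."""
    if title in EXCLUDED_TITLES:
        return False
    # str.startswith accepts a tuple, so more prefixes can be added later.
    return not title.startswith(EXCLUDED_PREFIXES)
```

A request for an excluded title would then simply be served the current version, ignoring the header.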
I admit to not following the second issue completely. A regular robot would never send the X-Accept-Datetime header to jump back in time, so that's okay. A regular robot would also respect the history page policy and not crawl backwards either, as you say. A robot that did send X-Accept-Datetime would end up crawling old revision pages and never hit a history list, but couldn't that also be forbidden via robots.txt, if the revision pages were excluded too?
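To make the robots.txt idea concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the URL layout (old revisions served through /w/index.php with an oldid parameter, current pages under /wiki/), and the crawler name are assumptions for illustration, not the live Wikimedia configuration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: old revisions are reached through
# index.php?oldid=..., so disallowing that script path covers them.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/index.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

old_revision = "http://en.wikipedia.org/w/index.php?title=Main_Page&oldid=12345"
current_page = "http://en.wikipedia.org/wiki/Main_Page"

# A well-behaved crawler checks each URL before fetching it.
old_allowed = parser.can_fetch("PastWebCrawler", old_revision)
current_allowed = parser.can_fetch("PastWebCrawler", current_page)
```

Under these rules a compliant crawler may fetch the current page but not the old revision. This only restrains robots that honour robots.txt, of course.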
However, it seems like it will be a long time before people write past-web crawlers, and a use case for doing it at all is pretty hard to come up with. :)
Hope this addresses your concerns!
Rob
On Thu, Nov 12, 2009 at 5:15 PM, Tim Starling tstarling@wikimedia.org wrote:
Daniel Kinzler wrote:
Hi all
The Memento Project http://www.mementoweb.org/ (including the Los Alamos National Laboratory (!) featuring Herbert Van de Sompel of OpenURL fame) is proposing a new HTTP header, X-Accept-Datetime, to fetch old versions of a web resource. They already wrote a MediaWiki extension for this http://www.mediawiki.org/wiki/Extension:Memento - which would of course be particularly interesting for use on Wikipedia.
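As a sketch of what a client might send, the snippet below builds the proposed header in Python. The helper name is hypothetical, and the assumption that the header value is an HTTP-date in GMT is mine; the proposal itself should be checked for the exact syntax:

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(when: datetime) -> dict:
    """Build request headers asking for the version of a page as of `when`."""
    # Assumption: the header value is an HTTP-date expressed in GMT.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"X-Accept-Datetime": http_date}


headers = memento_headers(datetime(2009, 11, 5, 12, 0, 0, tzinfo=timezone.utc))
```

These headers could be passed to any HTTP client. A server without the extension would presumably ignore the unknown header and return the current version.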
Do you think we could have this for Wikimedia projects? I think that would be very nice indeed. I recall that ways to look at last week's main page have been discussed before, and I see several issues:
You can't view the main page as it was in the past, because users routinely upload temporary images to display there, so that they can be protected, and then delete them once they're off the page.
Also, we can't have people crawling Wikipedia while requesting old versions, because of the excessive disk seeking and CPU usage that would generate. That's why the history page has a robot policy of noindex, nofollow.
-- Tim Starling
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l