[Foundation-l] robots.txt and archive.org

Milos Rancic millosh at gmail.com
Sun Sep 18 11:36:02 UTC 2011


On Sun, Sep 18, 2011 at 12:44, Daniel ~ Leinad <danny.leinad at gmail.com> wrote:
> Hmmm... a few days ago we had no problems retrieving the history
> (2001-2011) of the Main Page of the Polish Wikipedia:
> http://www.facebook.com/media/set/?set=a.10150309409533189.363981.147056473188
> - it should be similar for other language versions :>

It is possible to:
* Retrieve the history of the very early Wikipedia (~2001), as there
was no robots.txt at the time.
* Retrieve a snapshot of a page as of the time the Internet Archive made it.
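To illustrate the second point: a Wayback Machine snapshot is addressed by a URL of the form https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>, and the archive redirects to the capture closest to that timestamp. A minimal sketch that builds such a URL (the helper name snapshot_url is only for illustration):

```python
from datetime import datetime

# Wayback Machine snapshot URLs look like
# https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>;
# the archive redirects to the capture nearest to that timestamp.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def snapshot_url(page_url: str, when: datetime) -> str:
    """Build a Wayback Machine URL for page_url near the given time."""
    timestamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit timestamp
    return f"{WAYBACK_PREFIX}{timestamp}/{page_url}"


if __name__ == "__main__":
    print(snapshot_url("http://pl.wikipedia.org/", datetime(2003, 1, 1)))
```

This only gives you whatever the archive happened to capture; it cannot recover revisions the crawler never fetched.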

It is not possible to click on the "history" of a page and find the
first version of that page, as robots.txt forbids it.
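To show how a single robots.txt rule can block the history and oldid links while still allowing the article pages to be archived, here is a small sketch using Python's urllib.robotparser. The robots.txt content below is hypothetical (a "Disallow: /w/" rule, assuming index.php lives under /w/), not the actual Wikipedia file; "ia_archiver" is the user-agent name the Internet Archive has historically honored in robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (not the real Wikipedia file): everything under
# /w/, where index.php and thus ?action=history and ?oldid= URLs would
# live, is blocked; article URLs under /wiki/ stay crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    base = "http://pl.wikipedia.org"
    for path in ("/wiki/Example",
                 "/w/index.php?title=Example&action=history",
                 "/w/index.php?oldid=1"):
        # can_fetch(): may a crawler with this user agent fetch the URL?
        print(path, rp.can_fetch("ia_archiver", base + path))
```

Under such a rule the crawler stores snapshots of /wiki/ pages, but never the history or oldid pages, so the archive has no "history" to click on.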

This matters for the sites created in the 2002-2004 period: the
Internet Archive has snapshots of them, while we don't have their early
revisions (.../index.php?oldid=1 doesn't work for those sites, or, if it
works, the timestamp is broken (it doesn't exist)).



