[Foundation-l] robots.txt and archive.org

emijrp emijrp at gmail.com
Sun Sep 18 14:03:45 UTC 2011


Hi;

Oldids are not sorted by date, so the first edits may have large oldids.
Which Wikipedias do you want to explore? You can use the dumps or the Toolserver
databases to sort revisions by date and extract the oldest ones. I guess
that Wikipedias created in 2002-2004 preserve their full history (except for
deleted pages, but the first edits are usually on the Main Page). English
Wikipedia "lost" some of its first edits during upgrades between the early
MediaWiki versions, but Tim Starling later found them in a very old
backup.
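
A minimal sketch of the sorting idea, assuming you have already pulled
(rev_id, rev_timestamp) pairs out of a dump or a database query. The
sample rows and the function name oldest_revisions are made up for
illustration; only the 14-digit YYYYMMDDHHMMSS timestamp format follows
MediaWiki's revision table.

```python
from datetime import datetime

# Hypothetical sample rows: (rev_id, rev_timestamp). The values are invented;
# they only illustrate that a low rev_id does not imply an early edit.
SAMPLE_REVISIONS = [
    (1, "20020225154311"),
    (291430, "20010121023412"),
    (42, "20020301101500"),
    (908493, "20010116201208"),
]


def oldest_revisions(revisions, limit=3):
    """Return the `limit` earliest revisions, ordered by timestamp, not by id."""
    parsed = [
        (datetime.strptime(ts, "%Y%m%d%H%M%S"), rev_id)
        for rev_id, ts in revisions
    ]
    parsed.sort()  # sorts by timestamp first, then by rev_id on ties
    return [(rev_id, when.isoformat()) for when, rev_id in parsed[:limit]]


if __name__ == "__main__":
    for rev_id, when in oldest_revisions(SAMPLE_REVISIONS):
        print(rev_id, when)
```

With this sample data the revision with oldid=1 comes out third, not
first, which is the situation described above.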

Regards,
emijrp

2011/9/18 Milos Rancic <millosh at gmail.com>

> On Sun, Sep 18, 2011 at 12:44, Daniel ~ Leinad <danny.leinad at gmail.com>
> wrote:
> > Hmmm... a few days ago we didn't have any problems retrieving the history
> > (2001-2011) of the Main Page of Polish Wikipedia:
> >
> http://www.facebook.com/media/set/?set=a.10150309409533189.363981.147056473188
> > - it should be similar for other versions :>
>
> It is possible to:
> * Retrieve the history of the very early Wikipedia (~2001), as there
> was no robots.txt at that time.
> * Retrieve a snapshot of a page as of when the Internet Archive captured it.
>
> It is not possible to click on "history" of the page and find the
> first version of the page, as it's forbidden by robots.txt.
>
> That's important for sites created in the 2002-2004 period, as the
> Internet Archive has snapshots while we don't have the revisions
> (.../index.php?oldid=1 doesn't work for those sites, or if it works,
> the timestamp is broken (doesn't exist)).
>

