[Foundation-l] Old Wikipedia backups discovered

Moka Pantages mpantages at wikimedia.org
Tue Dec 14 18:23:07 UTC 2010


This is so exciting!  To Steven's point: we've also started a page
where folks can add bits of interesting information as they excavate
the files [1]. Can't wait to dig in!

Congrats, Tim!

[1] http://ten.wikipedia.org/wiki/Wikipedia_in_the_Beginning


Date: Tue, 14 Dec 2010 08:20:10 -0800
From: Steven Walling <steven.walling at gmail.com>
Subject: Re: [Foundation-l] Old Wikipedia backups discovered
To: Wikimedia Foundation Mailing List
       <foundation-l at lists.wikimedia.org>

This is fantastic, and the timing could not be better.

If anyone finds anything noteworthy, please add it to the timeline of
Wikipedia that we're building at the 10th anniversary wiki,[1] as well as
the other tools for cataloging interesting tidbits from our history.[2]

1. http://ten.wikipedia.org/wiki/Wikipedia_timeline
2. http://ten.wikipedia.org/wiki/Share

On Tue, Dec 14, 2010 at 8:11 AM, Chad <innocentkiller at gmail.com> wrote:

> On Tue, Dec 14, 2010 at 10:54 AM, Tim Starling <tstarling at wikimedia.org>
> wrote:
> > I was looking through some old files in our SourceForge project. I
> > opened a file called wiki.tar.gz, and inside were three complete
> > backups of the text of Wikipedia, from February, March and August 2001!
> >
> > This is exciting, because there is lots of article history in here
> > which was assumed to be lost forever.
> >
> > I've long been interested in Wikipedia's history, and I've tried in
> > the past to locate such backups. I asked various people who might have
> > had one. I had given up hope.
> >
> > The history of particularly old Wikipedia articles, as seen in the
> > present Wikipedia database, is incomplete, due to UseMod's policy of
> > deleting old revisions of pages after about a month. The script which
> > Brion wrote to import the article histories from UseMod to MediaWiki
> > only fetched those revisions which hadn't been purged yet.
> >
> > I didn't want to believe that those revisions had been lost forever,
> > and I even opened the UseMod source code and stared forlornly at the
> > unlink() call. What I (and Brion before) missed is that UseMod appends
> > a record of every change made to two files, called diff_log and rclog.
> > In these two files is a record of every change made to Wikipedia from
> > January 15 to August 17, 2001.
> >
> > I've put the two log files up on the web, at:
> >
> > http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
> >
> > The 7-zip archive is only 8.4MB -- much more manageable than today's
> > backups.
> >
> > rclog contains IP addresses. The UseMod software made IP addresses of
> > logged-in users public, so the people who made these edits had no
> > expectation that their IP address would be kept private. That, coupled
> > with the passage of time, makes me think that no harm to user privacy
> > can come from releasing these files.
> >
> > -- Tim Starling
> >
>
> I have to say this is super cool. It's like digging up a time capsule
> right before the 10th anniversary. One of my favorite early edits:
>
> "This is the new WikiPedia!  The idea here is to write a complete
> encyclopedia from scratch, without peer review process, etc.
> Some people think that this may be a hopeless endeavor, that
> the result will necessarily suck.  We aren't so sure.  So, let's get
> to work!"
>
> -Chad
>



