[Foundation-l] Old Wikipedia backups discovered
Chad
innocentkiller at gmail.com
Tue Dec 14 16:11:37 UTC 2010
On Tue, Dec 14, 2010 at 10:54 AM, Tim Starling <tstarling at wikimedia.org> wrote:
> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
>
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.
>
> I've long been interested in Wikipedia's history, and I've tried in
> the past to locate such backups. I asked various people who might have
> had one. I had given up hope.
>
> The history of particularly old Wikipedia articles, as seen in the
> present Wikipedia database, is incomplete, due to UseMod's policy of
> deleting old revisions of pages after about a month. The script which
> Brion wrote to import the article histories from UseMod to MediaWiki
> only fetched those revisions which hadn't been purged yet.
>
> I didn't want to believe that those revisions had been lost forever,
> and I even opened the UseMod source code and stared forlornly at the
> unlink() call. What I (and Brion before) missed is that UseMod appends
> a record of every change made to two files, called diff_log and rclog.
> In these two files is a record of every change made to Wikipedia from
> January 15 to August 17, 2001.
>
> I've put the two log files up on the web, at:
>
> http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
>
> The 7-zip archive is only 8.4MB -- much more manageable than today's
> backups.
>
> rclog contains IP addresses. The UseMod software made IP addresses of
> logged-in users public, so the people who made these edits had no
> expectation that their IP address would be kept private. That, coupled
> with the passage of time, makes me think that no harm to user privacy
> can come from releasing these files.
>
> -- Tim Starling
>
I have to say this is super cool. It's like digging up a time capsule
right before the 10th anniversary. One of my favorite early edits:
"This is the new WikiPedia! The idea here is to write a complete
encyclopedia from scratch, without peer review process, etc.
Some people think that this may be a hopeless endeavor, that
the result will necessarily suck. We aren't so sure. So, let's get
to work!"
-Chad