Right on time! And just the right early version, too! Kudos to the diggers and bashers!
On Tue, Dec 14, 2010 at 21:23, Moka Pantages mpantages@wikimedia.org wrote:
This is so exciting! To Steven's point: we've also started a page where folks can add bits of interesting information as they excavate the files [1]. Can't wait to dig in!
Congrats, Tim!
[1] http://ten.wikipedia.org/wiki/Wikipedia_in_the_Beginning
Date: Tue, 14 Dec 2010 08:20:10 -0800
From: Steven Walling steven.walling@gmail.com
Subject: Re: [Foundation-l] Old Wikipedia backups discovered
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
This is fantastic, and the timing could not be better.
If anyone finds anything noteworthy, please add it to the timeline of Wikipedia that we're building at the 10th anniversary wiki,[1] as well as to the other tools for cataloging interesting tidbits from our history.[2]
On Tue, Dec 14, 2010 at 8:11 AM, Chad innocentkiller@gmail.com wrote:
On Tue, Dec 14, 2010 at 10:54 AM, Tim Starling tstarling@wikimedia.org wrote:
I was looking through some old files in our SourceForge project. I opened a file called wiki.tar.gz, and inside were three complete backups of the text of Wikipedia, from February, March and August 2001!
This is exciting, because there is lots of article history in here which was assumed to be lost forever.
I've long been interested in Wikipedia's history, and I've tried in the past to locate such backups. I asked various people who might have had one. I had given up hope.
The history of particularly old Wikipedia articles, as seen in the present Wikipedia database, is incomplete, due to UseMod's policy of deleting old revisions of pages after about a month. The script which Brion wrote to import the article histories from UseMod to MediaWiki only fetched those revisions which hadn't been purged yet.
I didn't want to believe that those revisions had been lost forever, and I even opened the UseMod source code and stared forlornly at the unlink() call. What I (and Brion before me) missed is that UseMod appends a record of every change to two files, called diff_log and rclog. These two files contain a record of every change made to Wikipedia from January 15 to August 17, 2001.
I've put the two log files up on the web, at:
http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
The 7-zip archive is only 8.4MB -- much more manageable than today's backups.
rclog contains IP addresses. The UseMod software made IP addresses of logged-in users public, so the people who made these edits had no expectation that their IP addresses would be kept private. That, coupled with the passage of time, makes me think that no harm to user privacy can come from releasing these files.
-- Tim Starling
I have to say this is super cool. It's like digging up a time capsule right before the 10th anniversary. One of my favorite early edits:
"This is the new WikiPedia! The idea here is to write a complete encyclopedia from scratch, without peer review process, etc. Some people think that this may be a hopeless endeavor, that the result will necessarily suck. We aren't so sure. So, let's get to work!"
-Chad
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l