[Foundation-l] Old Wikipedia backups discovered
emijrp
emijrp at gmail.com
Tue Dec 14 17:17:22 UTC 2010
Hi;
Thanks Tim. Congratulations.
Is Wikipedia:UuU[1] now out-of-date?
Regards,
emijrp
[1] http://en.wikipedia.org/wiki/Wikipedia:UuU
2010/12/14 Tim Starling <tstarling at wikimedia.org>
> I was looking through some old files in our SourceForge project. I
> opened a file called wiki.tar.gz, and inside were three complete
> backups of the text of Wikipedia, from February, March and August 2001!
>
> This is exciting, because there is lots of article history in here
> which was assumed to be lost forever.
>
> I've long been interested in Wikipedia's history, and I've tried in
> the past to locate such backups. I asked various people who might have
> had one. I had given up hope.
>
> The history of particularly old Wikipedia articles, as seen in the
> present Wikipedia database, is incomplete, due to UseMod's policy of
> deleting old revisions of pages after about a month. The script which
> Brion wrote to import the article histories from UseMod to MediaWiki
> only fetched those revisions which hadn't been purged yet.
>
> I didn't want to believe that those revisions had been lost forever,
> and I even opened the UseMod source code and stared forlornly at the
> unlink() call. What I (and Brion before) missed is that UseMod appends
> a record of every change made to two files, called diff_log and rclog.
> In these two files is a record of every change made to Wikipedia from
> January 15 to August 17, 2001.
>
> I've put the two log files up on the web, at:
>
> http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
>
> The 7-zip archive is only 8.4MB -- much more manageable than today's
> backups.
>
> rclog contains IP addresses. The UseMod software made IP addresses of
> logged-in users public, so the people who made these edits had no
> expectation that their IP address would be kept private. That, coupled
> with the passage of time, makes me think that no harm to user privacy
> can come from releasing these files.
>
> -- Tim Starling
>
> _______________________________________________
> foundation-l mailing list
> foundation-l at lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
>