[Foundation-l] Old Wikipedia backups discovered
ft2.wiki at gmail.com
Tue Dec 14 19:31:52 UTC 2010
I would prefer this on its own wiki, as it is comprehensive up to a given date.
Maybe January2001.wikipedia.org -- immediate impact.
(DNS software cannot handle 2001.wikipedia.org)
On Tue, Dec 14, 2010 at 6:04 PM, phoebe ayers <phoebe.wiki at gmail.com> wrote:
> On Tue, Dec 14, 2010 at 7:54 AM, Tim Starling <tstarling at wikimedia.org> wrote:
> > I was looking through some old files in our SourceForge project. I
> > opened a file called wiki.tar.gz, and inside were three complete
> > backups of the text of Wikipedia, from February, March and August 2001!
> > This is exciting, because there is lots of article history in here
> > which was assumed to be lost forever.
> > I've long been interested in Wikipedia's history, and I've tried in
> > the past to locate such backups. I asked various people who might have
> > had one. I had given up hope.
> > The history of particularly old Wikipedia articles, as seen in the
> > present Wikipedia database, is incomplete, due to UseMod's policy of
> > deleting old revisions of pages after about a month. The script which
> > Brion wrote to import the article histories from UseMod to MediaWiki
> > only fetched those revisions which hadn't been purged yet.
> > I didn't want to believe that those revisions had been lost forever,
> > and I even opened the UseMod source code and stared forlornly at the
> > unlink() call. What I (and Brion before) missed is that UseMod appends
> > a record of every change made to two files, called diff_log and rclog.
> > In these two files is a record of every change made to Wikipedia from
> > January 15 to August 17, 2001.
> > I've put the two log files up on the web, at:
> > http://noc.wikimedia.org/~tstarling/wikipedia-logs-2001-08-17.7z
> > The 7-zip archive is only 8.4MB -- much more manageable than today's
> > backups.
> > rclog contains IP addresses. The UseMod software made IP addresses of
> > logged-in users public, so the people who made these edits had no
> > expectation that their IP address would be kept private. That, coupled
> > with the passage of time, makes me think that no harm to user privacy
> > can come from releasing these files.
> > -- Tim Starling
> AWESOME. This is so cool. I've copied the research list too, since
> there's many Wikipedia historians that will be eager to see the older
> I hope we can get them up in a browsable way, like nostalgia.wikipedia.org
> -- phoebe