On Tuesday, December 14, 2010, Tim Starling wrote:
I didn't want to believe that those revisions had been lost forever,
and I even opened the UseMod source code and stared forlornly at the
unlink() call. What I (and Brion before) missed is that UseMod appends
a record of every change made to two files, called diff_log and rclog.
In these two files is a record of every change made to Wikipedia from
January 15 to August 17, 2001.
Unfortunately, it doesn't look like versions of the articles beyond roughly the first 10
are automatically recoverable. I wrote a Python script to reconstruct the early Wikipedia,
but it fails because of apparent weaknesses in "normal diffs", the format UseMod seems to
use. To reconstruct the version at any particular point in time, I iteratively apply all
diffs up to that point via `patch`. It doesn't take long before patch chokes on a
diff. In fact, I've found simple cases in which a normal diff and patch are
incapable of round-tripping.
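
The replay described above can be sketched in pure Python rather than by shelling out to
`patch`. This is only an illustration under assumptions, not the script mentioned above: it
assumes the diff_log has already been split into one normal-format diff per revision (that
splitting is not shown), and the names `apply_normal_diff`, `reconstruct` and `PatchError`
are hypothetical. It is deliberately strict and raises as soon as the lines a hunk wants to
remove do not match the text reconstructed so far, which makes it easy to see which diff
breaks the chain.

```python
import re

# Hunk header of a normal diff, e.g. "3a4,5", "2,4d1" or "7c7,8".
HUNK_RE = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


class PatchError(Exception):
    """Raised when a diff cannot be applied to the current text (hypothetical name)."""


def apply_normal_diff(old_lines, diff_text):
    """Apply one normal-format diff to a list of lines (no trailing newlines)."""
    # Drop "\ No newline at end of file" markers; they are not content lines.
    diff_lines = [l for l in diff_text.splitlines() if not l.startswith("\\")]
    result = []
    pos = 0  # index of the next line of old_lines not yet copied
    i = 0
    while i < len(diff_lines):
        m = HUNK_RE.match(diff_lines[i])
        if not m:
            raise PatchError(f"bad hunk header on diff line {i + 1}: {diff_lines[i]!r}")
        i += 1
        start = int(m.group(1))
        end = int(m.group(2) or start)
        op = m.group(3)

        # Collect the "< old" lines, the optional "---" separator, and the "> new" lines.
        removed, added = [], []
        while i < len(diff_lines) and diff_lines[i].startswith("<"):
            removed.append(diff_lines[i][2:])
            i += 1
        if i < len(diff_lines) and diff_lines[i] == "---":
            i += 1
        while i < len(diff_lines) and diff_lines[i].startswith(">"):
            added.append(diff_lines[i][2:])
            i += 1

        # "NaR" inserts after line N; "c" and "d" act on lines start..end.
        keep_until = start if op == "a" else start - 1
        if keep_until < pos or end > len(old_lines):
            raise PatchError(f"hunk {m.group(0)} is out of range or out of order")
        result.extend(old_lines[pos:keep_until])
        if op in "cd":
            if old_lines[start - 1:end] != removed:
                raise PatchError(f"hunk {m.group(0)} does not match the current text")
            pos = end
        else:
            pos = start
        result.extend(added)

    result.extend(old_lines[pos:])
    return result


def reconstruct(diffs):
    """Replay diffs in order, starting from an empty page; return the final lines."""
    lines = []
    for n, diff_text in enumerate(diffs, start=1):
        try:
            lines = apply_normal_diff(lines, diff_text)
        except PatchError as exc:
            raise PatchError(f"diff #{n} failed: {exc}") from exc
    return lines
```

For example, `reconstruct(["0a1,2\n> Hello\n> World", "2c2\n< World\n---\n> Wikipedia"])`
replays a page creation followed by a one-line change. A single missing or corrupted diff
in the sequence makes every later revision unrecoverable by this approach, since each diff
only applies cleanly to the exact text produced by the one before it.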
I hope someone will eventually prove me wrong, or that some log turns up that is actually
capable of recreating the state. (I wonder what the point of providing a diff_log export
is if it isn't usable; perhaps the UseMod folks could speak to that.)