[WikiEN-l] [Foundation-l] Old Wikipedia backups discovered

Joseph Reagle joseph.2008 at reagle.org
Fri Dec 17 20:18:45 UTC 2010


On Thursday, December 16, 2010, Federico Leva (Nemo) wrote:
> I have the first 10K edits up reconstructed in their various pages at:
>    http://cyber.law.harvard.edu/~reagle/wp-redux/

I fixed some of the encoding issues. The DB dump contained a mix of encodings, so the encoding of each diff in the dump is now guessed independently using Python's chardet (Universal Encoding Detector) library.
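For anyone curious what per-diff guessing looks like, here is a rough sketch. It is not the script used for wp-redux. The function name decode_diff and the fallback order (UTF-8, then Latin-1) are my own assumptions for illustration; only chardet.detect() is the library's real entry point.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:  # keep the sketch usable without it
    chardet = None


def decode_diff(raw: bytes) -> str:
    """Decode one diff's raw bytes using a per-diff encoding guess.

    Hypothetical helper: tries chardet's guess first, then UTF-8,
    and finally Latin-1, which accepts any byte sequence.
    """
    candidates = []
    if chardet is not None:
        # detect() returns a dict; "encoding" may be None if undecided.
        guess = chardet.detect(raw)["encoding"]
        if guess:
            candidates.append(guess)
    candidates.append("utf-8")

    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or a wrong guess: try the next one.
            continue

    # Latin-1 maps every byte to a character, so this cannot fail,
    # though the result may be mis-rendered if the guess is wrong.
    return raw.decode("latin-1")


if __name__ == "__main__":
    # Each diff is decoded on its own, since the dump mixes encodings.
    diffs = [b"ASCII", "G\u00f6teborg".encode("utf-8"), b"K\xf8benhavn"]
    for raw in diffs:
        print(repr(decode_diff(raw)))
```

Guesses on very short diffs are unreliable, so the output for non-ASCII input can vary; the fallbacks only guarantee that decoding never raises.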

So now you can read up on the few "accented" topics in the early Wikipedia, including Göteborg, Köpenhamn, and København. (Nothing very exciting.) But it means articles such as ASCII are much improved as well. Interestingly, the ASCII page isn't so much about ASCII itself as about how to type non-ASCII characters in the early Wikipedia.

  http://cyber.law.harvard.edu/~reagle/wp-redux/ASCII/983670583.html


