[WikiEN-l] [Foundation-l] Old Wikipedia backups discovered
Joseph Reagle
joseph.2008 at reagle.org
Fri Dec 17 20:18:45 UTC 2010
On Thursday, December 16, 2010, Federico Leva (Nemo) wrote:
> I have the first 10K edits up reconstructed in their various pages at:
> http://cyber.law.harvard.edu/~reagle/wp-redux/
I fixed some of the encoding issues. The DB dump contained a mix of encodings, so the encoding of each diff in the dump is now guessed independently using Python's chardet (Universal Encoding Detector) library.
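A minimal sketch of what per-diff guessing could look like, not the actual script used for wp-redux. The function name `decode_diff`, the `detect` parameter, and the Latin-1 fallback are assumptions made for illustration; `chardet.detect` is the library's real entry point and returns a dict with an `encoding` key (possibly `None`).

```python
def decode_diff(raw: bytes, detect=None, fallback: str = "latin-1") -> str:
    """Decode one diff's bytes using a guessed encoding.

    `detect` is any callable that maps bytes to a dict with an "encoding"
    key; by default chardet.detect is used (third-party: pip install chardet).
    The Latin-1 fallback is an assumption, not something taken from the dump.
    """
    if detect is None:
        import chardet  # imported lazily so a stub detector can be passed in
        detect = chardet.detect

    guess = detect(raw)
    # chardet reports None when it cannot make a guess at all.
    encoding = guess.get("encoding") or fallback

    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown guess: Latin-1 maps every byte, so this cannot fail.
        return raw.decode(fallback, errors="replace")
```

Each diff would be passed through `decode_diff` on its own, so one mis-encoded revision does not affect the others. Short texts give the detector little to work with, so some guesses will still be wrong.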
So now you can read up on the few "accented" topics in the early Wikipedia, including Göteborg, Köpenhamn, and København. (Nothing very exciting.) But it means articles such as ASCII are much improved as well. Interestingly, the ASCII page isn't so much about ASCII itself as about how to type non-ASCII characters in the early Wikipedia.
http://cyber.law.harvard.edu/~reagle/wp-redux/ASCII/983670583.html