On Thursday, December 16, 2010, Federico Leva (Nemo) wrote:
I have the first 10K edits up, reconstructed into their
respective pages, at:
http://cyber.law.harvard.edu/~reagle/wp-redux/
I fixed some of the encoding issues. The DB dump contained a mix of encodings, so the
encoding of each diff in the dump is now guessed independently using Python's chardet
(Universal Encoding Detector) library.
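
For anyone curious, per-diff guessing can be sketched roughly as below. This is only an illustration, not the actual script: the function name decode_diff, the latin-1 fallback, and the order of the fallbacks are my assumptions here. The only chardet call used is chardet.detect(), which takes bytes and returns a dict with an 'encoding' key (possibly None).

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:  # assumption: degrade gracefully if it is not installed
    chardet = None


def decode_diff(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode one diff's bytes, guessing its encoding independently.

    Hypothetical helper: tries chardet's guess first, then UTF-8,
    then a single-byte fallback that can decode any byte sequence.
    """
    guess = chardet.detect(raw)["encoding"] if chardet else None
    for enc in (guess, "utf-8", fallback):
        if enc is None:
            continue
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or an encoding name Python does not know.
            continue
    # Last resort: never fail, mark undecodable bytes instead.
    return raw.decode(fallback, errors="replace")
```

Each diff would be passed through decode_diff() on its own, so one mis-encoded revision does not affect the others. Guesses on very short texts can be unreliable, which is why the sketch keeps explicit fallbacks.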
So now you can read up on the few "accented" topics in the early Wikipedia
including: Göteborg, Köpenhamn, and København. (Nothing very exciting.) But it means
articles such as ASCII are much improved as well. Interestingly, the ASCII page
is less about ASCII itself than about how to type non-ASCII characters in the
early Wikipedia.
http://cyber.law.harvard.edu/~reagle/wp-redux/ASCII/983670583.html