[WikiEN-l] [Foundation-l] Old Wikipedia backups discovered

Martin Møller Skarbiniks Pedersen traxplayer at gmail.com
Sun Dec 19 06:11:06 UTC 2010


On 17 December 2010 21:18, Joseph Reagle <joseph.2008 at reagle.org> wrote:
> On Thursday, December 16, 2010, Federico Leva (Nemo) wrote:
>> I have the first 10K edits up reconstructed in their various pages at:
>>    http://cyber.law.harvard.edu/~reagle/wp-redux/
>
> I fixed some of the encoding issues. The DB dump contained different encodings, so the encoding of each diff in the dump is now guessed independently using Python's CharDet (Universal Encoding Detector) library.
>
> So now you can read up on the few "accented" topics in the early Wikipedia including: Göteborg, Köpenhamn, and Křbenhavn.

That should probably be København, not Křbenhavn.
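
The stray "ř" is consistent with a Latin-1 byte being read as Latin-2: "ø" is the single byte 0xF8 in ISO-8859-1, and the same byte is "ř" in ISO-8859-2. Here is a minimal sketch of that round trip using only Python's built-in codecs. That the detector picked ISO-8859-2 for this particular diff is my assumption, not something I have checked against the dump.

```python
# Assumption: the original bytes were ISO-8859-1 (Latin-1) and the
# encoding guess for this diff came out as ISO-8859-2 (Latin-2).
original = "København"

# In Latin-1, "ø" is stored as the single byte 0xF8.
raw = original.encode("iso-8859-1")

# Decoding the same bytes as Latin-2 turns 0xF8 into "ř".
misread = raw.decode("iso-8859-2")
print(misread)

# Reversing the wrong decode and applying the right one recovers the text.
repaired = misread.encode("iso-8859-2").decode("iso-8859-1")
print(repaired)
```

If that is what happened, the affected pages could be repaired by re-decoding them as Latin-1 rather than relying on the per-diff guess.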

/Martin


