Brion Vibber wrote:
Nick Jenkins wrote:
29,000 pages (113.361/sec), 29,000 revs (113.361/sec)
ERROR 1062 at line 459: Duplicate entry '0-1_E0_m?' for key 1
Looking in the database now, I see three pages with similar titles in that range:
id     ns  title
35982  0   1_E0_m
36017  0   1_E0_m²
36019  0   1_E0_m³
None of them should conflict, being quite distinct, which makes me suspect garbled input or output, or a garbled index configuration on MySQL.
I can confirm that I can import the first 50k pages or so of this dump without the reported problem occurring. I'll run the rest when it's done downloading.
* Ubuntu Linux (Breezy Badger, x86)
* en_US.UTF-8 locale
* MySQL 4.0.24
* table definitions from MediaWiki 1.4.11
* mwdumper current CVS (shouldn't be any different in this regard from the last uploaded snapshot)
* Sun J2SE 1.5.0_05-b05
On some quick testing it looks like there are some encoding problems if UTF-8 isn't the locale charset; I'll try and get those worked out.
In the meantime, try setting LANG=en_US.UTF-8 and rerunning it.
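
For illustration only, here is roughly how a non-UTF-8 charset could turn the distinct titles above into the duplicate key from the error message. This is a Python sketch of the suspected mechanism, not mwdumper's actual (Java) code; the `lossy_key` helper is hypothetical, and it assumes the fallback charset is plain ASCII and that unmappable characters get replaced with '?'.

```python
# Titles copied from the page listing above (escapes used for the superscripts).
titles = ["1_E0_m", "1_E0_m\u00b2", "1_E0_m\u00b3"]


def lossy_key(namespace, title, charset="ascii"):
    """Hypothetical helper: build the 'ns-title' key as it would look after
    the title is pushed through `charset`, replacing unmappable characters
    with '?'."""
    encoded = title.encode(charset, errors="replace")
    return f"{namespace}-{encoded.decode(charset)}"


# Under an ASCII charset the superscript 2 and 3 both collapse to '?'.
ascii_keys = [lossy_key(0, t) for t in titles]

# Under UTF-8 every title survives intact, so the keys stay distinct.
utf8_keys = [lossy_key(0, t, "utf-8") for t in titles]

print(ascii_keys)
print(utf8_keys)
```

If something like this is what's happening, the second and third titles would end up as the same '0-1_E0_m?' key, which matches the reported error, while a UTF-8 locale keeps all three apart.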
-- brion vibber (brion @ pobox.com)