Jeffrey V. Merkey wrote:
The default in 1.9.x is:
$wgLegalTitleChars = " %!"$&'()*,\-.\/0-9:;=?@A-Z\\^_`a-z~\x80-\xFF+";
This includes all multibyte characters due to the \x80-\xFF range near the end, including your example "²". The value used on Wikimedia is identical to the default except for the order of characters in the class:
$wgLegalTitleChars = "+ %!"$&'()*,\-.\/0-9:;=?@A-Z\\^_`a-z~\x80-\xFF";
Did you perhaps accidentally remove the \x80-\xFF range at some stage?
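To illustrate the point about the \x80-\xFF range, here is a minimal sketch in Python (not MediaWiki's own code) that ports the default character class and tests titles against it as UTF-8 bytes. The names DEFAULT_CLASS, WITHOUT_HIGH_BYTES and is_legal are hypothetical, chosen only for this example, and the check covers the character class alone, not any other title validation MediaWiki may perform.

```python
import re

# Python port of the default class quoted above. It is matched against
# UTF-8 bytes because \x80-\xFF is a byte range, not a code point range.
DEFAULT_CLASS = rb""" %!"$&'()*,\-.\/0-9:;=?@A-Z\\^_`a-z~\x80-\xFF+"""

# Hypothetical variant with the high-byte range removed, to show what
# would happen if that range had been dropped by accident.
WITHOUT_HIGH_BYTES = DEFAULT_CLASS.replace(rb"\x80-\xFF", b"")


def is_legal(title, char_class=DEFAULT_CLASS):
    """Return True if every UTF-8 byte of title falls in char_class."""
    pattern = re.compile(b"[" + char_class + b"]+")
    return pattern.fullmatch(title.encode("utf-8")) is not None


if __name__ == "__main__":
    for candidate in ("P²-irreducible", "Dog adenovirus"):
        print(candidate, is_legal(candidate),
              is_legal(candidate, WITHOUT_HIGH_BYTES))
```

With the range present, both bytes of "²" (0xC2 0xB2) fall inside the class, so a title containing it passes; with the range removed, the same title is rejected while plain ASCII titles still pass.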
No, I did not remove it. I am re-running importDump with debug logging enabled and a debugger attached. The problem appears to be more involved than previously reported, which is why I delayed updating the fix on the data dumps page on meta. Modifying the title chars fixed one title, only for another to crash further down in the dump. It takes several hours to reach the point in the dump where I am seeing the corruption and error. It should crash again in another 30 minutes or so, so I can post-mortem it again.
Jeff
Confirmed the precise location of the failure. The number on the left-hand side is the article number.
2698244:Dog adenovirus
2698245:David A. Caputo
2698246:Famous Detective Conan (Case Closed
2698247:William Hughes Mulligan
THIS TITLE PRODUCES THE IMPORT DUMP FAILURE.
2698248:Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Wikipedia:Articles for deletion/Greatest Hits Volume One (The Byrds)
2698249:Image:Tripitaka storage2.jpg
2698250:Natalie Golda
2698251:Image:Big Passage outside Ampleforth College Library.jpg
2698252:Canine infectious hepatitis
2698253:P²-irreducible
2698254:Image:Zebra sideview.jpg
2698255:Amy Freed
Jeff