Jochen Magnus wrote:
I think that not only the en dumps failed. I found some errors in the most recent de dump (20051017_pages_articles.xml): as far as I inspected the XML file manually, I saw several <title>s which do not belong to the content, e.g.:
<title>Vitruvius</title> contains an article about the planet Venus
<title>Indianische Flöte</title> contains the history of Poland
<title>Marlon Brando</title> contains Madonna (sic!)
and so on.
Hmm, that shouldn't happen. I'll have to debug it. Sigh.
Ok, confirmed on dewiki dump; here's a fragment example: http://meta.wikimedia.org/wiki/User:Brion_VIBBER/Crap
Obviously there's a synchronization bug in my prefetch code. I'll try to debug this today or tonight and restart the dumps tonight or tomorrow.
Don't use any of the 20051017 dumps; they're all suspect.
-- brion vibber (brion @ pobox.com)