Several en dumps have failed due to the backup.lock file already existing.
Just FYI...
Jeremy Dunck wrote:
Several en dumps have failed due to the backup.lock file already existing.
That's because I run it on a separate pass from the others due to its size; the main pass sees the lock when it gets to enwiki and skips over it.
Now, *those* dumps failing, if they do, are for entirely different reasons. :P
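For readers wondering how "the main pass sees the lock and skips over it" can work, here is a minimal sketch, assuming one directory per wiki with a backup.lock file in it. The directory layout and the function names (try_acquire_lock, release_lock, run_main_pass) are hypothetical; only the lock file name comes from this thread. The lock is taken with an atomic exclusive create, so a wiki whose lock is already held by the separate enwiki pass is skipped rather than treated as a failure.

```python
import os
import tempfile

LOCK_NAME = "backup.lock"  # name from the thread; the per-wiki directory layout is assumed


def try_acquire_lock(dump_dir: str) -> bool:
    """Atomically create dump_dir/backup.lock; return False if it already exists."""
    path = os.path.join(dump_dir, LOCK_NAME)
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one process can win.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner for debugging stale locks
    return True


def release_lock(dump_dir: str) -> None:
    os.remove(os.path.join(dump_dir, LOCK_NAME))


def run_main_pass(wikis, base_dir, dump_one):
    """Dump every wiki whose lock is free; skip (don't fail) any wiki that is locked."""
    skipped = []
    for wiki in wikis:
        wiki_dir = os.path.join(base_dir, wiki)
        os.makedirs(wiki_dir, exist_ok=True)
        if not try_acquire_lock(wiki_dir):
            skipped.append(wiki)  # another pass (e.g. the big enwiki run) holds it
            continue
        try:
            dump_one(wiki)
        finally:
            release_lock(wiki_dir)
    return skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        # Simulate the separate enwiki pass already holding its lock.
        os.makedirs(os.path.join(base, "enwiki"))
        try_acquire_lock(os.path.join(base, "enwiki"))
        done = []
        skipped = run_main_pass(["dewiki", "enwiki", "itwiki"], base, done.append)
        print(done, skipped)  # ['dewiki', 'itwiki'] ['enwiki']
```

The exclusive create is the important design choice: checking os.path.exists() and then writing the file would leave a window in which two passes could both believe they own the same wiki.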
-- brion vibber (brion @ pobox.com)
Brion,
I think it is not only the en dumps that failed. I found some errors in the most recent de dump (20051017_pages_articles.xml): inspecting the XML file manually, I saw several <title>s whose content does not belong to them, e.g.:
<title>Vitruvius</title> contains an article about the planet Venus
<title>Indianische Flöte</title> contains the history of Poland
<title>Marlon Brando</title> contains Madonna (sic!)
and so on.
Besides, there are many exceptionally long pages in the dump that do not belong to namespace #0.
Cheers
jo
Jochen Magnus wrote:
Brion,
I think it is not only the en dumps that failed. I found some errors in the most recent de dump (20051017_pages_articles.xml): inspecting the XML file manually, I saw several <title>s whose content does not belong to them, e.g.:
<title>Vitruvius</title> contains an article about the planet Venus
<title>Indianische Flöte</title> contains the history of Poland
<title>Marlon Brando</title> contains Madonna (sic!)
and so on.
Hmm, that shouldn't happen. I'll have to debug it. Sigh.
Besides, there are many exceptionally long pages in the dump that do not belong to namespace #0.
?
-- brion vibber (brion @ pobox.com)
Jochen Magnus wrote:
I think it is not only the en dumps that failed. I found some errors in the most recent de dump (20051017_pages_articles.xml): inspecting the XML file manually, I saw several <title>s whose content does not belong to them, e.g.:
<title>Vitruvius</title> contains an article about the planet Venus
<title>Indianische Flöte</title> contains the history of Poland
<title>Marlon Brando</title> contains Madonna (sic!)
and so on.
Hmm, that shouldn't happen. I'll have to debug it. Sigh.
Ok, confirmed on dewiki dump; here's a fragment example: http://meta.wikimedia.org/wiki/User:Brion_VIBBER/Crap
Obviously there's a synchronization bug in my prefetch code. I'll try and debug this today or tonight and restart the dumps tonight/tomorrow.
Don't use any of the 20051017 dumps; they're all suspect.
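The thread does not say how the prefetch code works, so the following is only a hypothetical illustration of the general idea, not Brion's implementation. Assume prefetch means reusing revision text from a previous dump, and that both the old dump and the list of revisions the new dump needs are sorted by (page_id, rev_id). A merge-join that reuses old text only on an exact ID match, and otherwise falls back to the database, cannot attach one page's text to another page's title even if the two streams drift out of step. The record shapes and the fetch_from_db callback are assumptions.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical record shapes (not from the thread):
OldRev = Tuple[int, int, str]   # (page_id, rev_id, text) from the previous dump
NeededRev = Tuple[int, int]     # (page_id, rev_id) the new dump must emit


def prefetch_merge(needed: Iterable[NeededRev],
                   old_dump: Iterable[OldRev],
                   fetch_from_db: Callable[[int, int], str]
                   ) -> Iterator[Tuple[int, int, str]]:
    """Merge-join two streams sorted by (page_id, rev_id).

    Prefetched text is reused only when BOTH ids match exactly; anything
    missing or out of step falls back to fetch_from_db.
    """
    old_iter = iter(old_dump)
    current: Optional[OldRev] = next(old_iter, None)
    for page_id, rev_id in needed:
        key = (page_id, rev_id)
        # Skip old-dump records that sort before the revision we need.
        while current is not None and (current[0], current[1]) < key:
            current = next(old_iter, None)
        if current is not None and (current[0], current[1]) == key:
            yield page_id, rev_id, current[2]          # safe to reuse
        else:
            yield page_id, rev_id, fetch_from_db(page_id, rev_id)


if __name__ == "__main__":
    old = [(1, 10, "text of page 1"), (3, 30, "text of page 3, old rev")]
    need = [(1, 10), (2, 20), (3, 31)]
    for row in prefetch_merge(need, old, lambda p, r: f"db:{p}/{r}"):
        print(row)
    # (1, 10, 'text of page 1')
    # (2, 20, 'db:2/20')
    # (3, 31, 'db:3/31')
```

Keying on the revision ID as well as the page ID matters here: page 3 exists in the old dump, but with an older revision, so its stale text is not reused.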
-- brion vibber (brion @ pobox.com)
2005/10/17, Brion Vibber brion@pobox.com:
Obviously there's a synchronization bug in my prefetch code. I'll try and debug this today or tonight and restart the dumps tonight/tomorrow.
Don't use any of the 20051017 dumps; they're all suspect.
Then please delete them.
Folke Behrens wrote:
2005/10/17, Brion Vibber brion@pobox.com:
Don't use any of the 20051017 dumps; they're all suspect.
Then please delete them.
I want to keep them around for bug comparison at the moment, but I have gone ahead and renamed them with a ".broken" extension which I hope will discourage people from downloading them. ;)
-- brion vibber (brion @ pobox.com)
Brion,
the latest XML dumps seem to be fine. I tested the de dump from 2005-10-20: no problems while importing it into MySQL and indexing it with ioda.
Thank you!
jo
Jochen Magnus wrote:
Brion,
the latest XML dumps seem to be fine. I tested the de dump from 2005-10-20: no problems while importing it into MySQL and indexing it with ioda.
Yay, something worked! ;)
I forgot to mention in my prior message that some of the older dumps may contain duplicate page titles, because the databases still held old titles that were not Unicode-normalized. When importing with mwdumper, these can trigger 'existing key' errors on the unique title indexes.
Those have been purged, and so it shouldn't happen anymore.
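To check an older dump for this kind of duplicate before importing it, something like the following sketch could help. It assumes the normalization form in question is NFC and that you have already extracted the page titles into a list; the function name is hypothetical. It groups titles that differ as stored but become identical once normalized, which is exactly the situation that collides on a unique title index.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List


def find_normalization_collisions(titles: List[str]) -> Dict[str, List[str]]:
    """Group titles that are distinct as stored but equal after NFC normalization.

    Any group with more than one member would violate a unique index on the
    normalized title.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        groups[unicodedata.normalize("NFC", title)].append(title)
    return {norm: raw for norm, raw in groups.items() if len(raw) > 1}


if __name__ == "__main__":
    precomposed = "Gl\u00fcck"   # 'ü' as a single code point
    decomposed = "Glu\u0308ck"   # 'u' followed by a combining diaeresis
    collisions = find_normalization_collisions([precomposed, decomposed, "Venus"])
    print(len(collisions))  # 1
```

The two spellings render identically, which is why such duplicates are hard to spot by eye in the XML.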
I am interested in detailed failure reports for importDump.php. The memory leaks in the 1.5 prereleases should be fixed as of 1.5.0, but there may still be intermittent or case-specific problems. Unfortunately importDump is relatively slow because it works page by page (mwdumper does faster bulk imports, but assumes it is starting from a blank, empty database), so it takes a while to test or confirm failures. :)
-- brion vibber (brion @ pobox.com)
On Fri, 21 Oct 2005, Brion Vibber wrote:
I am interested in detailed failure reports for importDump.php. The memory leaks in the 1.5 prereleases should be fixed as of 1.5.0, but there may still be intermittent or case-specific problems. Unfortunately importDump is relatively slow because it works page by page (mwdumper does faster bulk imports, but assumes it is starting from a blank, empty database), so it takes a while to test or confirm failures. :)
Test of importDump.php with the it: current page xml (20051012_pages_current.xml.7z) went OK.
No apparent memory leaks; memory usage hovered around 17-18 MB.
Alfio
wikitech-l@lists.wikimedia.org