When I run mwdumper with enwiki-20061001-pages-articles.xml.bz2 and pipe it to mysql (see my other mail), it eventually prints out something like 3,583,699 pages (490.212/sec), 3,583,699 revs (490.212/sec) and exits.
However, on the database afterwards a query like SELECT COUNT(*) FROM page gives me a result count somewhere around 2.5 million (I don't have the exact number in my notes, sorry). And this discrepancy is further confirmed by many missing articles: lots of red links from existing articles where I can confirm that the article doesn't exist in the database but does exist in the dump.
Is mwdumper known to have problems in this area? How might I track this down?
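For reference, the pipeline is roughly of this shape (the exact invocation is in my other mail; the user and database names here are just placeholders):

  # mwdumper converts the dump to SQL for the 1.5 schema and the mysql
  # client executes the statements as they stream in
  java -jar mwdumper.jar --format=sql:1.5 enwiki-20061001-pages-articles.xml.bz2 | \
    mysql -u wikiuser -p wikidb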
Evan Martin wrote:
When I run mwdumper with enwiki-20061001-pages-articles.xml.bz2 and pipe it to mysql (see my other mail), it eventually prints out something like 3,583,699 pages (490.212/sec), 3,583,699 revs (490.212/sec) and exits.
However, on the database afterwards a query like SELECT COUNT(*) FROM page gives me a result count somewhere around 2.5 million (I don't have the exact number in my notes, sorry).
This means MySQL encountered an error and stopped executing further statements from input.
Be sure to watch the output of MySQL for errors, not just the output of mwdumper... you may find it easier if you send the SQL output to an intermediate file first, or else redirect each program's error output to separate files (which you can tail in separate terminals and/or look at later).
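For example, something along these lines (file names and credentials are placeholders):

  # option 1: write the SQL to an intermediate file, then load it separately
  # and watch mysql's error output on its own
  java -jar mwdumper.jar --format=sql:1.5 enwiki-20061001-pages-articles.xml.bz2 > import.sql
  mysql -u wikiuser -p wikidb < import.sql

  # option 2: keep the pipe, but give each program its own error log
  java -jar mwdumper.jar --format=sql:1.5 enwiki-20061001-pages-articles.xml.bz2 2> mwdumper.err | \
    mysql -u wikiuser -p wikidb 2> mysql.err
  # then, in separate terminals:
  tail -f mwdumper.err
  tail -f mysql.err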
-- brion vibber (brion @ pobox.com)
Brion,
When I use mwdumper, it says it's dumping to the db, but none of the articles show up in the new dump or the database (though the InnoDB tablespace does keep growing). importDump works, but is slow as hell.
Jeff
On 12/4/06, Brion Vibber brion@pobox.com wrote:
This means MySQL encountered an error and stopped executing further statements from input.
Be sure to watch the output of MySQL for errors, not just the output of mwdumper... you may find it easier if you send the SQL output to an intermediate file first, or else redirect each program's error output to separate files (which you can tail in separate terminals and/or look at later).
I had thought that both MySQL and mwdumper would output to the console, but I guess it must've output this error many lines back without killing the pipe, which meant mwdumper kept going. Redirecting stderr turned up the actual error: ERROR 1114 (HY000) at line 19339: The table 'text' is full -- which I know how to fix.
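For anyone else who runs into ERROR 1114 on an InnoDB table: one common cause is a shared InnoDB tablespace defined with a fixed maximum size, so roughly this is the check and fix (a general sketch, not necessarily the exact situation here):

  # see how the shared tablespace is defined
  mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb_data_file_path'"
  # if the last data file has a fixed size and no 'autoextend', the tablespace
  # cannot grow; the usual fix is a my.cnf setting along the lines of
  #   innodb_data_file_path = ibdata1:10M:autoextend
  # followed by a mysqld restart (a full disk gives the same error, of course)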
Thanks again.
Evan Martin wrote:
I had thought that both MySQL and mwdumper would output to the console, but I guess it must've output this error many lines back without killing the pipe, which meant mwdumper kept going.
Yeah, the cute thing is that mysql keeps accepting input -- it just doesn't execute any more of it. So mwdumper keeps trundling along, spewing more status updates on your console.
"Thanks, MySQL!" :)
-- brion vibber (brion @ pobox.com)
My experience:
- don't use the Java direct connection: pipe into mysql instead
- increase the SQL request buffer size
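Roughly like this, for example (placeholder names and example sizes; the buffer I mean is max_allowed_packet):

  # have mwdumper write SQL to stdout and pipe it into the mysql client
  # instead of using the direct Java connection, with a larger packet limit
  java -jar mwdumper.jar --format=sql:1.5 enwiki-20061001-pages-articles.xml.bz2 | \
    mysql --max_allowed_packet=128M -u wikiuser -p wikidb
  # the server needs a matching limit too, e.g. max_allowed_packet = 128M in my.cnf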
On 05/12/06, Emmanuel Engelhart emmanuel@engelhart.org wrote:
My experience:
- don't use the Java direct connection: pipe into mysql instead
- increase the SQL request buffer size
There are some other tips like this on Meta, I seem to recall; stuff like disabling the binary log if it's not needed, and increasing various buffers and limits...
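Sketching from memory, it's the sort of thing below (example values only; see the pages on Meta for the actual recommendations):

  # skip the binary log for the import by prepending a SET statement to the
  # SQL stream, so it applies to the same connection (needs SUPER)
  ( echo "SET sql_log_bin = 0;"; \
    java -jar mwdumper.jar --format=sql:1.5 enwiki-20061001-pages-articles.xml.bz2 ) | \
    mysql -u root -p wikidb

  # plus the usual my.cnf knobs for a big import, e.g.:
  #   max_allowed_packet             = 128M
  #   innodb_buffer_pool_size        = 1G
  #   innodb_flush_log_at_trx_commit = 0   # trade some durability for import speed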
Rob Church