After running mwdumper to strip the NS_MEDIAWIKI namespace entries from the 20070206 SQL dumps, the filtered XML file that mwdumper writes has two problems, including SQL syntax errors on reimport:
1. The output files cannot be used by mwimport, because mwdumper modifies the text labels for XML types, etc., to the extent that mwimport can no longer read the dump.
2. If you take the XML file output by mwdumper and attempt to reimport it into an empty database with mwdumper, it produces corrupted SQL statements and fails. It processes about 720,000 articles before failing, however.

Output from the mysql error log is provided below.
ERROR 1062 (23000) at line 15327: Duplicate entry '70473566' for key 1
ERROR 1064 (42000) at line 15328: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''== Greatest Common Factor / Least Common Multiple Problem ==\n\nI recently went' at line 1
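One way to narrow down the ERROR 1062 above is to check whether the filtered XML itself already contains a repeated revision id. The following is only a rough sketch under assumptions the log does not confirm: that the duplicated key '70473566' is a revision id, and that the filtered file keeps the usual export layout of page, revision, id. The names `local_name` and `duplicate_revision_ids` are hypothetical helpers, not part of mwdumper or mwimport.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def duplicate_revision_ids(source):
    """Return the sorted revision ids that occur more than once.

    `source` is a file name or a binary file object holding export XML.
    Only <id> elements whose direct parent is <revision> are counted, so
    page ids and contributor ids are ignored.
    """
    counts = Counter()
    stack = []  # local names of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(local_name(elem.tag))
            continue
        stack.pop()  # after this, stack[-1] is the parent of elem
        name = local_name(elem.tag)
        if name == "id" and stack and stack[-1] == "revision":
            counts[(elem.text or "").strip()] += 1
        elif name == "page":
            elem.clear()  # release the text of a finished page
    return sorted(rev_id for rev_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Tiny made-up sample, not taken from the real dump.
    sample = b"""<mediawiki>
    <page><id>1</id><revision><id>10</id></revision></page>
    <page><id>2</id><revision><id>10</id></revision></page>
    </mediawiki>"""
    print(duplicate_revision_ids(io.BytesIO(sample)))
```

If this reports nothing for the filtered file, the duplicate is more likely introduced when the SQL is generated than present in the XML, which would be worth stating in any report. The counter keeps every id in memory, so a full English dump may need a lot of RAM.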
Jeff
NOTE: These errors occur with the standard English dumps and the existing tools. These dumps are not the modified dumps created by the machine translator.
Jeff
Jeffrey V. Merkey wrote:
After running mwdumper to strip the NS_MEDIAWIKI namespace entries from the 20070206 SQL dumps, the filtered XML file that mwdumper writes has two problems, including SQL syntax errors on reimport:
1. The output files cannot be used by mwimport, because mwdumper modifies the text labels for XML types, etc., to the extent that mwimport can no longer read the dump.
If you want bug reports acted on, it is a requirement that you do the following:
1) Report them in our Bugzilla at bugzilla.wikimedia.org
2) Include all relevant information, specifically the exact command lines used and the exact files you used.
Bug reports which include no information on how to reproduce the bug are useless.
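As one illustration of the kind of detail such a report could carry, the statements that mysql rejected can be pulled out of the generated SQL file by line number (15327 and 15328 in the log quoted earlier in the thread). This is a minimal sketch only: `extract_lines` and the file name `filtered.sql` are hypothetical, and it assumes the SQL was saved to a file with one statement per line rather than piped straight into mysql.

```python
from itertools import islice


def extract_lines(path, first, last, max_chars=200):
    """Return (line_number, text) pairs for lines first..last, 1-indexed.

    Each line is cut to max_chars characters so that a very long INSERT
    statement stays readable when pasted into a bug report.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        wanted = islice(fh, first - 1, last)
        return [
            (number, line.rstrip("\n")[:max_chars])
            for number, line in enumerate(wanted, start=first)
        ]


if __name__ == "__main__":
    import os

    # 'filtered.sql' is a placeholder name; nothing is read if it is absent.
    if os.path.exists("filtered.sql"):
        for number, text in extract_lines("filtered.sql", 15327, 15328):
            print(f"{number}: {text}")
```

The excerpt complements, and does not replace, the exact command lines and input files asked for above.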
-- brion vibber (brion @ wikimedia.org)
wikitech-l@lists.wikimedia.org