After much work, I managed to get the Wikipedia data from
pages-meta-current.xml.bz2
<http://download.wikimedia.org/enwiki/20071018/enwiki-20071018-pages-meta-current.xml.bz2>
into MySQL.
After 1,328,000 I got the following error:
Exception in thread "main" java.io.IOException: An invalid XML character (Unicode: 0x2) was found in the element content of the document.
    at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
    at org.mediawiki.dumper.Dumper.main(Unknown Source)
What do I do with this error? Can I get it to continue somehow?
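
One workaround that might be worth trying, sketched here under assumptions rather than confirmed against this dump: pre-filter the decompressed XML so that characters not allowed by XML 1.0 (such as the 0x2 reported above) are dropped before the importer sees them. The names strip_invalid_xml_chars and filter_stream are hypothetical helpers written for this illustration, not part of the import tool, and the sketch assumes the dump is UTF-8 encoded. Note that this silently removes the offending characters from the page text.

```python
import codecs
import re
import sys

# Characters outside the XML 1.0 "Char" production: C0 control characters
# other than tab, LF and CR, plus surrogates and U+FFFE/U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_invalid_xml_chars(text):
    """Return text with characters that XML 1.0 forbids removed."""
    return _INVALID_XML_CHARS.sub("", text)


def filter_stream(src, dst, chunk_size=1 << 20):
    """Copy a UTF-8 byte stream from src to dst, dropping invalid XML chars.

    An incremental decoder is used so that a multi-byte character split
    across two chunks is still decoded correctly. Undecodable bytes are
    replaced with U+FFFD, which is a legal XML character.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        cleaned = strip_invalid_xml_chars(decoder.decode(chunk))
        dst.write(cleaned.encode("utf-8"))
    # Flush anything the decoder was still holding at end of input.
    tail = strip_invalid_xml_chars(decoder.decode(b"", final=True))
    dst.write(tail.encode("utf-8"))


if __name__ == "__main__":
    # Reads the decompressed dump on stdin, writes the cleaned XML to stdout.
    filter_stream(sys.stdin.buffer, sys.stdout.buffer)
```

The cleaned output could then be given to the importer in place of the original file. Whether the import can resume from where it stopped, rather than starting over, is a separate question this sketch does not address.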