Hi all,
Problems with mwdumper
Mwdumper (http://www.mediawiki.org/wiki/Mwdumper) crashes after about 35,000 pages when processing the en-WP dump as of 2007-05-27, with the following error:
root@xubuntu-svn:/home/admin/Desktop# jdk1.5.0_12/bin/java -jar mwdumper.jar --format=sql:1.5 enwp-200707 > enwp-200707.sql
...
32,000 pages (373.893/sec), 32,000 revs (373.893/sec)
33,000 pages (373.206/sec), 33,000 revs (373.206/sec)
34,000 pages (377.979/sec), 34,000 revs (377.979/sec)
35,000 pages (377.851/sec), 35,000 revs (377.851/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.skipChar(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:176)
        at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
        at org.mediawiki.dumper.Dumper.main(Unknown Source)
root@xubuntu:/home/admin/Desktop#
More info about the environment:
Java version:
root@xubuntu:/home/admin/Desktop# sudo ./jdk1.5.0_12/bin/java -version
java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) Client VM (build 1.5.0_12-b04, mixed mode, sharing)
OS: GNU/Linux Xubuntu 6.10
Kernel release: 2.6.17-10-generic
Kernel version: #2 SMP Fri Oct 13 18:45:35 UTC 2006
Any ideas, anyone?
Regards,
// Rolf Lampa