[Mediawiki-l] ArrayIndexOutOfBoundsException in mwdumper

Nazeer Hussain snazeerhussain at gmail.com
Sat Apr 12 19:22:34 UTC 2008


Hi,

I am using mwdumper.jar to convert the dump into SQL with the following
command on Ubuntu 7.10 with Java 1.5.0_13:

nohup java -jar mwdumper.jar --format=sql:1.5 \
    enwiki-latest-pages-articles.xml.bz2 \
    --filter=titlematch:[bB].* > b.sql 2>mwdumper.log2 &

and I am getting the following error:

4,727,000 pages (1,685.36/sec), 4,727,000 revs (1,685.36/sec)
4,728,000 pages (1,685.5/sec), 4,728,000 revs (1,685.5/sec)
4,729,000 pages (1,685.604/sec), 4,729,000 revs (1,685.604/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:176)
        at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
        at org.mediawiki.dumper.Dumper.main(Unknown Source)


Any idea, anyone? What's going on?

I have checked this thread, but it did not help:
https://lists.wikimedia.org/mailman/htdig/mediawiki-l/2007-July/021537.html

The md5sum of the downloaded dump file is correct. Can someone please help me
out with this? Was anyone able to successfully import the latest dump
(20080312)?
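
For anyone who wants to repeat the checksum check without the md5sum tool, it amounts to roughly the following Python sketch. The names file_md5 and matches_published_md5 and the 1 MiB chunk size are illustrative assumptions, not part of mwdumper; the expected digest has to come from wherever the dump's checksum is published and is not reproduced here.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte dump never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected_hex):
    """Compare the computed digest with a published one, ignoring case and whitespace."""
    return file_md5(path) == expected_hex.strip().lower()
```

A matching digest only shows that the downloaded file is byte-identical to the published one; it says nothing about whether the parser can read the XML inside it.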

Thanks,
Nazeer

