Hi,
I am using mwdumper.jar to convert the dump into SQL, using the following command on Ubuntu 7.10 with Java 1.5.0_13:
nohup java -jar mwdumper.jar --format=sql:1.5 enwiki-latest-pages-articles.xml.bz2 --filter=titlematch:[bB].* > b.sql 2>mwdumper.log2 &
and I am getting the following error:
4,727,000 pages (1,685.36/sec), 4,727,000 revs (1,685.36/sec)
4,728,000 pages (1,685.5/sec), 4,728,000 revs (1,685.5/sec)
4,729,000 pages (1,685.604/sec), 4,729,000 revs (1,685.604/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
        at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:176)
        at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
        at org.mediawiki.dumper.Dumper.main(Unknown Source)
Any idea, anyone? What's going on?
I have checked this thread, but it did not help: https://lists.wikimedia.org/mailman/htdig/mediawiki-l/2007-July/021537.html
The md5sum of the downloaded dump file is correct. Can someone please help me out with this? Has anyone been able to successfully import the latest dump (20080312)?
Thanks,
Nazeer