Hi,
I am using mwdumper.jar to convert the dump into SQL with the following
command on Ubuntu 7.10 with Java 1.5.0_13:
nohup java -jar mwdumper.jar --format=sql:1.5 enwiki-latest-pages-articles.xml.bz2 --filter=titlematch:[bB].* > b.sql 2>mwdumper.log2 &
and I am getting the following error:
4,727,000 pages (1,685.36/sec), 4,727,000 revs (1,685.36/sec)
4,728,000 pages (1,685.5/sec), 4,728,000 revs (1,685.5/sec)
4,729,000 pages (1,685.604/sec), 4,729,000 revs (1,685.604/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:375)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:176)
at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
at org.mediawiki.dumper.Dumper.main(Unknown Source)
Does anyone have an idea what is going on?
I have checked this thread, but it did not help:
https://lists.wikimedia.org/mailman/htdig/mediawiki-l/2007-July/021537.html
The md5sum of the downloaded dump file is correct. Can someone please help me
out with this? Has anyone been able to successfully import the latest dump
(20080312)?
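For anyone who wants to repeat the checksum comparison, here is a minimal sketch. The helper name check_md5 and its two arguments are placeholders of mine and are not part of mwdumper. It assumes GNU md5sum is available, and the expected hash has to be supplied by hand.

```shell
# check_md5 FILE EXPECTED_HASH
# Hypothetical helper (not part of mwdumper): compares the MD5 hash of FILE
# against EXPECTED_HASH and returns non-zero on a mismatch.
check_md5() {
    file="$1"
    expected="$2"
    # md5sum prints "<hash>  <filename>"; keep only the hash field.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "md5 OK: $file"
    else
        echo "md5 MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}
```

It would be called with the dump file name and the hash published for that dump, for example: check_md5 enwiki-latest-pages-articles.xml.bz2 EXPECTED_HASH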
Thanks,
Nazeer