Thanks Ariel.
I reran the code with --progress=1, and here are the final lines of output:
33 pages (0.594/sec), 25,370 revs (456.336/sec)
33 pages (0.594/sec), 25,371 revs (456.329/sec)
33 pages (0.594/sec), 25,372 revs (456.315/sec)
33 pages (0.593/sec), 25,373 revs (455.718/sec)
33 pages (0.593/sec), 25,374 revs (455.695/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
    at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:392)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
    at org.mediawiki.dumper.Dumper.main(Dumper.java:142)
77.4%
Michael
On Mon, May 20, 2013 at 1:09 PM, Ariel T. Glenn ariel@wikimedia.org wrote:
On 19-05-2013, Sun, at 23:43 +0200, Michael Tsikerdekis wrote:
$ 7za e -so enwiki-20130503-pages-meta-history1.xml-p000006887p000009316.7z \
    | java -server -jar mwdumper-1.16.jar --format=sql:1.5 \
    | gzip -vc > temp.sql.gz
<snip>
31 pages (0.647/sec), 24,000 revs (500.584/sec)
33 pages (0.655/sec), 25,000 revs (495.835/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
    at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
Can you please rerun mwdumper with the additional argument --progress=1, which should tell us the exact number of revisions processed before it dies?
Thanks,
Ariel
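With --progress=1, the last progress line printed before the exception gives the exact revision count at the crash. If that output is saved to a file, the number can be pulled out with a small shell function. This is a hypothetical helper, not part of mwdumper. It assumes the progress lines go to stderr (they still appeared on the terminal while stdout was piped into gzip) and that they keep the "N pages (...), N revs (...)" format seen above.

```shell
# Hypothetical helper (not part of mwdumper): print the revision count from the
# last "N pages (...), N revs (...)" progress line found in a saved log file.
last_rev_count() {
    # grep -o emits each "25,374 revs"-style match on its own line, even when
    # several progress reports ended up on one line; keep only the last match,
    # then strip the thousands separators and the trailing " revs".
    grep -o '[0-9,]* revs' "$1" | tail -n 1 | sed 's/,//g; s/ revs$//'
}
```

For example, adding `2> progress.log` to the java stage of the pipeline (an assumed way to capture it; the progress lines then no longer show on the terminal) and running `last_rev_count progress.log` on a log ending like the one above would print 25374.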
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l