Thanks, Ariel.
I reran the code with --progress=1, and here are the final lines:
33 pages (0.594/sec), 25,370 revs (456.336/sec)
33 pages (0.594/sec), 25,371 revs (456.329/sec)
33 pages (0.594/sec), 25,372 revs (456.315/sec)
33 pages (0.593/sec), 25,373 revs (455.718/sec)
33 pages (0.593/sec), 25,374 revs (455.695/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:392)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
at org.mediawiki.dumper.Dumper.main(Dumper.java:142)
77.4%
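Since the trace ends in Xerces' UTF8Reader, one thing that could be ruled out is malformed UTF-8 in the decompressed dump. The following is a minimal sketch in Python, not part of mwdumper; `first_invalid_utf8_offset` is a hypothetical helper name, and a clean result would not by itself explain the exception.

```python
import codecs


def first_invalid_utf8_offset(chunks):
    """Return the byte offset of the first malformed UTF-8 sequence, or None.

    `chunks` is any iterable of bytes objects (hypothetical input, e.g. blocks
    read from the decompressed dump stream).
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    fed = 0  # bytes handed to the decoder before the current chunk
    for chunk in chunks:
        # Bytes of an incomplete sequence carried over from earlier chunks.
        pending = decoder.getstate()[0]
        try:
            decoder.decode(chunk)
        except UnicodeDecodeError as err:
            # err.start is relative to pending + chunk.
            return fed - len(pending) + err.start
        fed += len(chunk)
    pending = decoder.getstate()[0]
    try:
        # final=True turns a truncated trailing sequence into an error.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError as err:
        return fed - len(pending) + err.start
    return None
```

The chunks could come from the same 7za pipe, for example `iter(lambda: sys.stdin.buffer.read(1 << 16), b"")` after `import sys`, so the dump never has to be unpacked to disk.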
Michael
On Mon, May 20, 2013 at 1:09 PM, Ariel T. Glenn <ariel(a)wikimedia.org> wrote:
On Sun, 19-05-2013, at 23:43 +0200, Michael Tsikerdekis wrote:
$ 7za e -so enwiki-20130503-pages-meta-history1.xml-p000006887p000009316.7z \
  | java -server -jar mwdumper-1.16.jar --format=sql:1.5 \
  | gzip -vc > temp.sql.gz
<snip>
31 pages (0.647/sec), 24,000 revs (500.584/sec)
33 pages (0.655/sec), 25,000 revs (495.835/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
Can you please rerun mwdumper with the additional argument
--progress=1
which should tell us the exact number of revisions processed before it
dies?
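If the progress output is captured to a file, the last counts before the crash can be pulled out mechanically. The following is a minimal sketch in Python, assuming the progress lines have the shape seen in this thread (`33 pages (0.655/sec), 25,000 revs (495.835/sec)`); `last_progress` is a hypothetical helper name, not something mwdumper provides.

```python
import re

# Matches lines such as: "33 pages (0.655/sec), 25,000 revs (495.835/sec)"
PROGRESS_RE = re.compile(
    r"^\s*([\d,]+) pages \([\d.,]+/sec\), ([\d,]+) revs \([\d.,]+/sec\)"
)


def last_progress(lines):
    """Return (pages, revs) from the last progress line, or None if none match."""
    last = None
    for line in lines:
        match = PROGRESS_RE.match(line)
        if match:
            pages = int(match.group(1).replace(",", ""))
            revs = int(match.group(2).replace(",", ""))
            last = (pages, revs)
    return last
```

Non-matching lines, such as the exception message and stack frames, are simply skipped, so the whole captured log can be passed in as-is.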
Thanks,
Ariel
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l