I am having a problem with one of the dump files. Can anyone verify whether the problem is with the file or with mwdumper?
I am using a version freshly built from git. Here is the log:
$ 7za e -so enwiki-20130503-pages-meta-history1.xml-p000006887p000009316.7z |java -server -jar mwdumper-1.16.jar --format=sql:1.5 | gzip -vc > temp.sql.gz
7-Zip (A) 9.04 beta Copyright (c) 1999-2009 Igor Pavlov 2009-05-30
p7zip Version 9.04 (locale=en_US.ISO-8859-15,Utf16=on,HugeFiles=on,8 CPUs)
Processing archive: enwiki-20130503-pages-meta-history1.xml-p000006887p000009316.7z
Extracting enwiki-20130503-pages-meta-history1.xml-p000006887p000009316
3 pages (1.165/sec), 1,000 revs (388.35/sec)
3 pages (0.356/sec), 2,000 revs (237.164/sec)
8 pages (0.677/sec), 3,000 revs (253.807/sec)
13 pages (1.058/sec), 4,000 revs (325.627/sec)
13 pages (0.992/sec), 5,000 revs (381.505/sec)
16 pages (1.169/sec), 6,000 revs (438.436/sec)
16 pages (1.016/sec), 7,000 revs (444.501/sec)
17 pages (0.854/sec), 8,000 revs (401.849/sec)
17 pages (0.695/sec), 9,000 revs (367.752/sec)
18 pages (0.675/sec), 10,000 revs (374.967/sec)
18 pages (0.653/sec), 11,000 revs (399.332/sec)
18 pages (0.626/sec), 12,000 revs (417.043/sec)
18 pages (0.6/sec), 13,000 revs (433.117/sec)
18 pages (0.555/sec), 14,000 revs (431.766/sec)
18 pages (0.499/sec), 15,000 revs (416.17/sec)
19 pages (0.509/sec), 16,000 revs (428.483/sec)
22 pages (0.58/sec), 17,000 revs (448.43/sec)
22 pages (0.571/sec), 18,000 revs (467.302/sec)
23 pages (0.546/sec), 19,000 revs (450.835/sec)
24 pages (0.564/sec), 20,000 revs (469.649/sec)
26 pages (0.587/sec), 21,000 revs (473.912/sec)
28 pages (0.623/sec), 22,000 revs (489.182/sec)
31 pages (0.684/sec), 23,000 revs (507.469/sec)
31 pages (0.647/sec), 24,000 revs (500.584/sec)
33 pages (0.655/sec), 25,000 revs (495.835/sec)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2048
    at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:392)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
    at org.mediawiki.dumper.Dumper.main(Dumper.java:142)
77.4%
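One way to narrow down whether the archive or the parser is at fault is to test the archive on its own and then push the extracted XML through a second, independent parser. The sketch below is only that: it assumes xmllint (from libxml2) is installed, and the helper name check_dump and the SEVENZIP/XMLLINT variables are made up for illustration. If both checks pass, the file is probably fine and the problem is more likely on the Java side.

```shell
#!/bin/sh
# Sketch only: check_dump, SEVENZIP and XMLLINT are hypothetical names.
# The two variables let you point at differently named binaries.
SEVENZIP=${SEVENZIP:-7za}
XMLLINT=${XMLLINT:-xmllint}

# check_dump ARCHIVE
# Returns 0 if both checks pass, 1 if the archive fails its integrity
# test, 2 if the extracted XML is rejected by the second parser.
check_dump() {
    archive=$1
    # 't' tests the archive's integrity without writing any files.
    "$SEVENZIP" t "$archive" >/dev/null || return 1
    # Stream the XML to stdout and let xmllint check well-formedness;
    # --stream keeps memory use low, --noout suppresses the document.
    "$SEVENZIP" e -so "$archive" 2>/dev/null | "$XMLLINT" --stream --noout - || return 2
    return 0
}

# Usage (file name taken from the log above):
# check_dump enwiki-20130503-pages-meta-history1.xml-p000006887p000009316.7z
# echo "exit status: $?"
```

An exit status of 1 would point at a damaged download, 2 at malformed XML inside an intact archive.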
Michael
On Fri, May 17, 2013 at 3:57 PM, Michael Tsikerdekis tsikerdekis@gmail.com wrote:
Great, that should work just fine! The pages-meta-history files are the ones I want, although I changed the text blob columns to varchar, since I really don't need that data restored and those columns tend to be the largest.
Thank you both for your help!
Michael
On Fri, May 17, 2013 at 11:57 AM, Chad innocentkiller@gmail.com wrote:
On Fri, May 17, 2013 at 1:10 AM, Ariel T. Glenn ariel@wikimedia.org wrote:
PS: I've also tried to build mwdumper:
git clone https://gerrit.wikimedia.org/r/p/mediawiki/tools/mwdumper.git mwdumper
However, I couldn't use make or ant, since there was no build.xml or makefile in the repository.
You can backtrack a couple revisions in git to get one that's buildable. I'm cc-ing Chad on this since he knows about the build setup.
That would be because we swapped out Ant in favor of Maven a little while back. `mvn package` should work just fine.
-Chad
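For anyone following along, the clone and Maven build discussed above could be scripted roughly as follows. This is a sketch, not an official build recipe: it assumes git, a JDK and mvn are on the PATH, the function name build_mwdumper and the GIT/MVN variables are invented here, and it relies on Maven's default of writing the packaged jar under target/.

```shell
#!/bin/sh
# Sketch only: build_mwdumper, GIT and MVN are hypothetical names.
REPO=https://gerrit.wikimedia.org/r/p/mediawiki/tools/mwdumper.git
GIT=${GIT:-git}
MVN=${MVN:-mvn}

# build_mwdumper [DEST]
# Clones the repository into DEST (default: mwdumper), runs the Maven
# package phase and lists the resulting jar(s).
# Returns 1 if the clone fails, 2 if the build fails.
build_mwdumper() {
    dest=${1:-mwdumper}
    "$GIT" clone "$REPO" "$dest" || return 1
    # Maven reads pom.xml, so no build.xml or makefile is needed.
    (cd "$dest" && "$MVN" package) || return 2
    # By default Maven places the packaged jar in target/.
    ls "$dest"/target/*.jar
}

# Usage:
# build_mwdumper
```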
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l