I'm getting the following error while using the mwdumper from 2006-Feb-01.
tail load-progress-err
7 pages (1.396/sec), 1,000 revs (199.362/sec)
Exception in thread "main" java.io.IOException: XML document structures must start and end within the same entity.
        at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
        at org.mediawiki.dumper.Dumper.main(Unknown Source)
I can't figure out what might be wrong with the XML being input, and
since the error doesn't give an offset in the stream, I'm not sure how
to troubleshoot it.
Here's the command line:
bzcat enwiki-20060125-pages-meta-history.xml.bz2 | ./load.sh ...
(followed by more stuff, but nothing that knows anything about XML. ;-)
where load.sh is:
#!/bin/bash
java -server -jar ../tools/mwdumper.jar \
--output=stdout \
--format=xml \
--filter=exactlist:1000_random_titles \
--filter=namespace:0 \
--progress=1000 \
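Since the error gives no offset, one rough check on the input side might be to see whether the decompressed stream actually reaches its closing tag. As far as I know, this parser message tends to appear when the document is cut off before the root element closes. A minimal sketch, assuming the dump's root element is <mediawiki> and that its closing tag sits on its own line; check_stream_end is a made-up helper name, not part of mwdumper:

```shell
#!/bin/bash
# check_stream_end is a hypothetical helper, not part of mwdumper.
# It reads an XML dump on stdin, reports how many lines it saw, and
# checks whether the last non-blank line is the closing </mediawiki> tag
# (assumption: the root element closes on a line of its own).
check_stream_end() {
    awk '
        NF { last = $0; last_nr = NR }   # remember the last non-blank line
        END {
            gsub(/^[ \t\r]+|[ \t\r]+$/, "", last)   # trim surrounding whitespace
            printf "lines read: %d\n", NR
            if (last == "</mediawiki>") {
                print "stream ends with </mediawiki>"
            } else {
                printf "stream ends early at line %d: %s\n", last_nr, last
                exit 1   # non-zero status signals a likely truncated stream
            }
        }
    '
}
```

It would be run on the same input, e.g. `bzcat enwiki-20060125-pages-meta-history.xml.bz2 | check_stream_end`. Running `bzip2 -t` on the .bz2 file first would separately show whether the archive itself is damaged. This only detects a stream that stops short; it would not locate a malformed tag in the middle of the dump.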
If anyone has any suggestions, I'd be grateful.
Thanks,
Jeremy