I tried using MWDumper to load the contents of Wikipedia into a MySQL 5 database. I used tables.sql to create the tables, then tried writing the data into MySQL with MWDumper and got the following result:
C:\Downloads>set class=mwdumper.jar;mysql-connector-java-3.0.11-stable-bin.jar
C:\Downloads>set data="C:\Downloads\enwiki-20070206-pages-articles.xml.bz2"
C:\Downloads>java -client -classpath mwdumper.jar;mysql-connector-java-3.0.11-stable-bin.jar org.mediawiki.dumper.Dumper "--output=mysql://127.0.0.1/enwiki?user=xxxx&password=xxxxxxx" "--format=sql:1.5" "C:\Downloads\enwiki-20070206-pages-articles.xml.bz2"
1.000 pages (148,148/sec), 1.000 revs (148,148/sec)
2.000 pages (156,104/sec), 2.000 revs (156,104/sec)
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(Unknown Source)
    at com.mysql.jdbc.EscapeProcessor.escapeSQL(EscapeProcessor.java:151)
    at com.mysql.jdbc.Statement.execute(Statement.java:845)
    at org.mediawiki.importer.SqlServerStream.writeStatement(Unknown Source)
    at org.mediawiki.importer.SqlWriter.flushInsertBuffer(Unknown Source)
    at org.mediawiki.importer.SqlWriter.bufferInsertRow(Unknown Source)
    at org.mediawiki.importer.SqlWriter15.writeRevision(Unknown Source)
    at org.mediawiki.importer.MultiWriter.writeRevision(Unknown Source)
    at org.mediawiki.importer.PageFilter.writeRevision(Unknown Source)
    at org.mediawiki.dumper.ProgressFilter.writeRevision(Unknown Source)
    at org.mediawiki.importer.XmlDumpReader.closeRevision(Unknown Source)
    at org.mediawiki.importer.XmlDumpReader.endElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
    at org.mediawiki.dumper.Dumper.main(Unknown Source)
Axel Ngonga wrote:
> I tried using MWDumper to load the contents of Wikipedia into a MySQL 5 database. I used tables.sql to create the tables, then tried writing the data into MySQL with MWDumper and got the following result:
You'll probably be told that the MediaWiki developers don't support third-party extensions, so you'd better try the maintenance/importDump.php script first (run it from the command line).
Good luck,
Boris
Boris Eetgerink wrote:
> You'll probably get the answer that the MediaWiki developers don't support third-party extensions, so you'd better give it a try with the maintenance/importDump.php script first (run it from the command-line).
MWDumper is supported by MediaWiki. I don't know who wrote it, but it's certainly not a "third-party extension"; it's the recommended tool. importDump.php is too slow for a full wiki dump because it renders each page.
http://www.mediawiki.org/wiki/MWDumper
http://download.wikimedia.org/tools/
Platonides wrote:
> MWDumper is supported by MediaWiki. I don't know the author, but it's certainly not a "third party extension", but the recommended tool. importDump.php is too slow for using it with a full wiki dump (it renders each page).
My apologies then, and thanks for the information.
Boris
On 14/02/07, Platonides <Platonides@gmail.com> wrote:
> MWDumper is supported by MediaWiki. I don't know the author, but it's certainly not a "third party extension", but the recommended tool. importDump.php is too slow for using it with a full wiki dump (it renders each page).
Brion Vibber, I thought?
Rob Church
| -----Original Message-----
| From: wikitech-l-bounces@lists.wikimedia.org
| [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Rob Church
| Sent: Wednesday, February 14, 2007 6:38 PM
|
| On 14/02/07, Platonides <Platonides@gmail.com> wrote:
| > MWDumper is supported by MediaWiki. I don't know the author, ...
|
| Brion Vibber, I thought?
Without doubt! :-))
Reg., Janusz 'Ency' Dorozynski
Axel Ngonga wrote:
> C:\Downloads>set class=mwdumper.jar;mysql-connector-java-3.0.11-stable-bin.jar
[snip]
> 1.000 pages (148,148/sec), 1.000 revs (148,148/sec)
> 2.000 pages (156,104/sec), 2.000 revs (156,104/sec)
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>     at java.lang.String.substring(Unknown Source)
>     at com.mysql.jdbc.EscapeProcessor.escapeSQL(EscapeProcessor.java:151)
>     at com.mysql.jdbc.Statement.execute(Statement.java:845)
I can't reproduce this problem with MySQL Connector/J 3.0.14 or 3.1.11 (Java 1.5.0_06-113 on Mac OS X 10.4/Intel; I tried both the current mwdumper build and the snapshot on download.wikimedia.org). Try grabbing 3.0.14 from http://dev.mysql.com/.
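The trace shows the exception being thrown inside Connector/J's JDBC escape processing (`EscapeProcessor.escapeSQL`, called from `Statement.execute`). For anyone building MWDumper from source, one possible and untested alternative to upgrading the driver is to turn that pass off with the standard JDBC call `Statement.setEscapeProcessing(false)` before executing the generated INSERTs. The class `RawSqlExecutor` and its method `executeRaw` are hypothetical names, not part of MWDumper or Connector/J. The sketch also assumes that MWDumper's SQL relies on no JDBC escape syntax. It uses try-with-resources, which needs Java 7 or later.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Hypothetical helper, not part of MWDumper or Connector/J: executes one SQL
 * statement with JDBC escape processing disabled, so the driver sends the
 * text as-is instead of scanning it for escapes such as {fn ...}.
 */
final class RawSqlExecutor {
    private RawSqlExecutor() {}

    static boolean executeRaw(Connection conn, String sql) throws SQLException {
        // try-with-resources closes the Statement even if execute() throws.
        try (Statement stmt = conn.createStatement()) {
            // Assumption: the generated INSERTs contain only escaped string
            // literals and no JDBC escape syntax, so skipping the driver's
            // escape pass should not change their meaning.
            stmt.setEscapeProcessing(false);
            return stmt.execute(sql);
        }
    }
}
```

The helper would be called with an ordinary connection, for example one from `DriverManager.getConnection("jdbc:mysql://127.0.0.1/enwiki", user, password)`. Upgrading to Connector/J 3.0.14 as suggested above remains the simpler route, because it needs no change to MWDumper.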
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
wikitech-l@lists.wikimedia.org