I have downloaded the "Database backup dumps" of the Chinese edition. There are files in XML and SQL format. I want to have all the data in a database such as MySQL. Can I get this data (especially the XML format) into a MySQL database without using MediaWiki? If so, how?
Where can I get the format details of each dump? I have read the contents of "zhwiki-20101014-pages-articles.xml", but Chinese has two editions: "Simplified Chinese" and "Traditional Chinese". Both forms appear mixed together in "zhwiki-20101014-pages-articles.xml", and I don't know how to separate them.
Thanks!
xiang wang wrote:
I have downloaded the "Database backup dumps" of the Chinese edition. There are files in XML and SQL format. I want to have all the data in a database such as MySQL. Can I get this data (especially the XML format) into a MySQL database without using MediaWiki? If so, how?
I would use mwdumper to do that: http://www.mediawiki.org/wiki/Manual:MWDumper Other options are listed at http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps
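If you want to skip MediaWiki and mwdumper entirely, the pages-articles XML can also be streamed with a standard XML parser and loaded into a database directly. The sketch below uses Python's stdlib; the `pages(title, text)` table layout is my own simplification (NOT MediaWiki's schema), and sqlite3 stands in for MySQL only to keep the example self-contained — a MySQL connector would be used the same way.

```python
# Sketch: load a pages-articles dump into a database without MediaWiki.
# Assumptions: a simplified one-table layout of my own, and sqlite3 in
# place of MySQL so the example runs stand-alone.
import sqlite3
import xml.etree.ElementTree as ET

# MediaWiki export XML is namespaced; 0.4 matched 2010-era dumps, but
# check the <mediawiki> root element of your own file.
NS = "{http://www.mediawiki.org/xml/export-0.4/}"

def load_dump(xml_source, db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS pages (title TEXT, text TEXT)")
    # iterparse streams the input, so a multi-gigabyte dump fits in
    # memory as long as each finished <page> element is cleared.
    for event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text") or ""
            conn.execute("INSERT INTO pages VALUES (?, ?)", (title, text))
            elem.clear()  # free memory held by the processed page
    conn.commit()
    return conn
```

`xml_source` can be a filename or an open file object, so you can pass a `bz2.open(...)` handle to read the compressed dump without unpacking it first.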
Where can I get the format details of each dump? I have read the contents of "zhwiki-20101014-pages-articles.xml", but Chinese has two editions: "Simplified Chinese" and "Traditional Chinese". Both forms appear mixed together in "zhwiki-20101014-pages-articles.xml", and I don't know how to separate them.
Thanks!
That file is in the same format as the wiki pages. The two variants come from the same text (which is what you get in the dump), automatically converted into one or the other (with some specifics for text inside -{ }- markup). That content, in a MediaWiki install, should be able to replicate the zhwiki pages.
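To make the -{ }- markup concrete: explicit conversion rules in the dump text look like -{zh-hans:计算机;zh-hant:電腦}-, and one variant can be picked out with a small parser like the sketch below. Note this only resolves explicit rules; the real MediaWiki converter additionally transliterates plain text through built-in character/word dictionaries, which this sketch does not attempt.

```python
# Sketch: resolve explicit -{...}- language-conversion rules to one
# variant. Plain-text transliteration (done by MediaWiki's converter
# via dictionaries) is deliberately out of scope here.
import re

RULE = re.compile(r"-\{(.*?)\}-", re.S)

def pick_variant(wikitext, variant="zh-hans"):
    def resolve(match):
        body = match.group(1)
        for part in body.split(";"):
            if ":" in part:
                code, _, text = part.partition(":")
                if code.strip() == variant:
                    return text.strip()
        return body  # no per-variant rule: -{...}- just protects the text
    return RULE.sub(resolve, wikitext)

# pick_variant("-{zh-hans:计算机;zh-hant:電腦}-") -> "计算机"
# pick_variant("-{zh-hans:计算机;zh-hant:電腦}-", "zh-hant") -> "電腦"
```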
Thanks for your answers! They are very helpful. I used MWDumper, but I get an error:

    Exception in thread "main" java.lang.NullPointerException
        at org.mediawiki.importer.XmlDumpReader.readTitle(XmlDumpReader.java:31 )
        at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:2 3)
        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
        at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
        at org.mediawiki.dumper.Dumper.main(Dumper.java:143)

Do you know what the problem is? Thanks!
Thanks for the answers! I used MWDumper and have another problem. I used the following commands (wikidata is my database):

    set class=mwdumper.jar;mysql-connector-java-5.1.13/mysql-connector-java-5.1.13/mysql-connector-java-5.1.13-bin.jar
    set data="F:\Wikipedia Data\zhwiki-20101014-pages-articles.xml.bz2"
    java -client -classpath %class% org.mediawiki.dumper.Dumper "--output=mysql://127.0.0.1/wikidata?user=root&password=wang" "--format=sql:1.5" %data%
But I get an error:

    Exception in thread "main" java.io.IOException: java.sql.SQLException: Table 'wikidata.text' doesn't exist
        at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:92)
        at org.mediawiki.dumper.Dumper.main(Dumper.java:143)
"wikidata" is a database I created myself. How can I create the table "text"? Thanks in advance! David.
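For context on why this error appears: MWDumper's --format=sql:1.5 output only emits INSERT statements into an existing MediaWiki 1.5 schema; it creates no tables itself. The authoritative DDL lives in maintenance/tables.sql inside the MediaWiki tarball, which is what should actually be run against MySQL. The cut-down sketch below (sqlite3 syntax, simplified column sets of my own choosing) only illustrates the three tables the importer writes to — it is not a drop-in replacement for tables.sql.

```python
# Sketch: the three MediaWiki 1.5 tables MWDumper INSERTs into.
# Simplified, illustrative DDL only; run MediaWiki's own
# maintenance/tables.sql against MySQL for the real schema.
import sqlite3

SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,     -- one row per wiki page
    page_namespace INTEGER NOT NULL,
    page_title TEXT NOT NULL,
    page_latest INTEGER NOT NULL     -- rev_id of the current revision
);
CREATE TABLE revision (
    rev_id INTEGER PRIMARY KEY,
    rev_page INTEGER NOT NULL,       -- references page.page_id
    rev_text_id INTEGER NOT NULL,    -- references text.old_id
    rev_timestamp TEXT NOT NULL
);
CREATE TABLE text (
    old_id INTEGER PRIMARY KEY,
    old_text BLOB NOT NULL,          -- the wikitext itself
    old_flags TEXT NOT NULL          -- e.g. encoding/compression flags
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The missing "wikidata.text" table in the error above is exactly the third table here: MediaWiki stores wikitext in `text`, revision metadata in `revision`, and page metadata in `page`.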
xmldatadumps-l@lists.wikimedia.org