I have downloaded the "Database backup dumps" of the Chinese edition. There are files in XML and SQL format. I want to have all the data in a database such as MySQL. Can I get this data (especially the XML format) into a MySQL database without using MediaWiki? If so, how?
Where can I get the format details of each dump? I have read the contents of "zhwiki-20101014-pages-articles.xml", but Chinese has two editions: "Simplified Chinese" and "Traditional Chinese". Both forms appear mixed together in "zhwiki-20101014-pages-articles.xml", and I don't know how to separate them.
Thanks!
xiang wang wrote:
I have downloaded the "Database backup dumps" of the Chinese edition. There are files in XML and SQL format. I want to have all the data in a database such as MySQL. Can I get this data (especially the XML format) into a MySQL database without using MediaWiki? If so, how?
I would use mwdumper to do that: http://www.mediawiki.org/wiki/Manual:MWDumper Other options are listed at http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps
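If you want to skip MediaWiki and mwdumper entirely, the pages-articles XML can also be streamed with a standard XML parser and loaded into a database directly. The sketch below uses Python's stdlib; the `pages(title, text)` table layout is my own simplification (NOT MediaWiki's schema), and sqlite3 stands in for MySQL only to keep the example self-contained — a MySQL connector would be used the same way.

```python
# Sketch: load a pages-articles dump into a database without MediaWiki.
# Assumptions: a simplified one-table layout of my own, and sqlite3 in
# place of MySQL so the example runs stand-alone.
import sqlite3
import xml.etree.ElementTree as ET

# MediaWiki export XML is namespaced; 0.4 matched 2010-era dumps, but
# check the <mediawiki> root element of your own file.
NS = "{http://www.mediawiki.org/xml/export-0.4/}"

def load_dump(xml_source, db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS pages (title TEXT, text TEXT)")
    # iterparse streams the input, so a multi-gigabyte dump fits in
    # memory as long as each finished <page> element is cleared.
    for event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext(f"{NS}revision/{NS}text") or ""
            conn.execute("INSERT INTO pages VALUES (?, ?)", (title, text))
            elem.clear()  # free memory held by the processed page
    conn.commit()
    return conn
```

`xml_source` can be a filename or an open file object, so you can pass a `bz2.open(...)` handle to read the compressed dump without unpacking it first.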
Where can I get the format details of each dump? I have read the contents of "zhwiki-20101014-pages-articles.xml", but Chinese has two editions: "Simplified Chinese" and "Traditional Chinese". Both forms appear mixed together in "zhwiki-20101014-pages-articles.xml", and I don't know how to separate them.
Thanks!
That file is in the same format as the wiki pages. The two variants come from the same text (which is what you get in the dump), automatically converted into one or the other (with some specifics for text inside -{ }- markup). That content, in a MediaWiki install, should be able to replicate the zhwiki pages.
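To make the -{ }- markup concrete: explicit conversion rules in the dump text look like -{zh-hans:计算机;zh-hant:電腦}-, and one variant can be picked out with a small parser like the sketch below. Note this only resolves explicit rules; the real MediaWiki converter additionally transliterates plain text through built-in character/word dictionaries, which this sketch does not attempt.

```python
# Sketch: resolve explicit -{...}- language-conversion rules to one
# variant. Plain-text transliteration (done by MediaWiki's converter
# via dictionaries) is deliberately out of scope here.
import re

RULE = re.compile(r"-\{(.*?)\}-", re.S)

def pick_variant(wikitext, variant="zh-hans"):
    def resolve(match):
        body = match.group(1)
        for part in body.split(";"):
            if ":" in part:
                code, _, text = part.partition(":")
                if code.strip() == variant:
                    return text.strip()
        return body  # no per-variant rule: -{...}- just protects the text
    return RULE.sub(resolve, wikitext)

# pick_variant("-{zh-hans:计算机;zh-hant:電腦}-") -> "计算机"
# pick_variant("-{zh-hans:计算机;zh-hant:電腦}-", "zh-hant") -> "電腦"
```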
Thanks for your answers! They are very helpful. I used MWDumper, but I get an error:

    Exception in thread "main" java.lang.NullPointerException
        at org.mediawiki.importer.XmlDumpReader.readTitle(XmlDumpReader.java:31 )
        at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:2 3)
        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
        at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
        at org.mediawiki.dumper.Dumper.main(Dumper.java:143)

Do you know what the problem is? Thanks!
Thanks for the answers! I used MWDumper and have another problem. I used the following commands (wikidata is my database):

    set class=mwdumper.jar;mysql-connector-java-5.1.13/mysql-connector-java-5.1.13/mysql-connector-java-5.1.13-bin.jar
    set data="F:\Wikipedia Data\zhwiki-20101014-pages-articles.xml.bz2"
    java -client -classpath %class% org.mediawiki.dumper.Dumper "--output=mysql://127.0.0.1/wikidata?user=root&password=wang" "--format=sql:1.5" %data%
But I get an error:

    Exception in thread "main" java.io.IOException: java.sql.SQLException: Table 'wikidata.text' doesn't exist
        at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:92)
        at org.mediawiki.dumper.Dumper.main(Dumper.java:143)
"wikidata" is a database I created myself. How can I create the table "text"? Thanks in advance! David.
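For context on why this error appears: MWDumper's --format=sql:1.5 output only emits INSERT statements into an existing MediaWiki 1.5 schema; it creates no tables itself. The authoritative DDL lives in maintenance/tables.sql inside the MediaWiki tarball, which is what should actually be run against MySQL. The cut-down sketch below (sqlite3 syntax, simplified column sets of my own choosing) only illustrates the three tables the importer writes to — it is not a drop-in replacement for tables.sql.

```python
# Sketch: the three MediaWiki 1.5 tables MWDumper INSERTs into.
# Simplified, illustrative DDL only; run MediaWiki's own
# maintenance/tables.sql against MySQL for the real schema.
import sqlite3

SCHEMA = """
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,     -- one row per wiki page
    page_namespace INTEGER NOT NULL,
    page_title TEXT NOT NULL,
    page_latest INTEGER NOT NULL     -- rev_id of the current revision
);
CREATE TABLE revision (
    rev_id INTEGER PRIMARY KEY,
    rev_page INTEGER NOT NULL,       -- references page.page_id
    rev_text_id INTEGER NOT NULL,    -- references text.old_id
    rev_timestamp TEXT NOT NULL
);
CREATE TABLE text (
    old_id INTEGER PRIMARY KEY,
    old_text BLOB NOT NULL,          -- the wikitext itself
    old_flags TEXT NOT NULL          -- e.g. encoding/compression flags
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

The missing "wikidata.text" table in the error above is exactly the third table here: MediaWiki stores wikitext in `text`, revision metadata in `revision`, and page metadata in `page`.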
xmldatadumps-l@lists.wikimedia.org