[Xmldatadumps-l] How to use the "Database backup dumps"

xiang wang xiangwangcn at gmail.com
Fri Nov 5 06:31:23 UTC 2010


2010/11/5 Platonides <platonides at gmail.com>

> xiang wang wrote:
> > I have downloaded the "Database backup dumps" of the Chinese edition.
> > There are files in XML and SQL format. I want to have all the data in a
> > database like MySQL. Can I get this data (especially the XML format) into
> > a MySQL database without using MediaWiki? If it is possible, how?
>
> I would use mwdumper to do that
> http://www.mediawiki.org/wiki/Manual:MWDumper
> Other options are listed in
> http://www.mediawiki.org/wiki/Manual:Importing_XML_dumps
>
> > Where can I get the format details of each dump? I have read the
> > contents of "zhwiki-20101014-pages-articles.xml", but Chinese has two
> > editions: "Simplified Chinese" and "Traditional Chinese". Both forms are
> > mixed together in the file "zhwiki-20101014-pages-articles.xml". I don't
> > know how to get rid of that.
> >
> > Thanks!
>
> That file is in the same format as the wiki pages. The two variants come
> from the same text (which is what you get in the dump), automatically
> converted into one or the other (with some specifics for text inside -{}- ).
> That content in a MediaWiki install should be able to replicate zhwiki
> pages.
>
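
To see where the dump text carries the -{}- conversion markup mentioned above,
a short script can list those spans. This is only a minimal sketch in Python,
assuming the markup always has the form -{ ... }- and is never nested; the
function name find_variant_markup and the sample string are made up for
illustration and do not come from MediaWiki or MWDumper.

```python
import re

# Assumption: conversion markup looks like -{ ... }- and is not nested.
VARIANT_MARKUP = re.compile(r"-\{(.*?)\}-", re.DOTALL)


def find_variant_markup(wikitext):
    """Return the inner text of every -{ ... }- span in the given wikitext."""
    return [match.group(1) for match in VARIANT_MARKUP.finditer(wikitext)]


# Illustrative sample only; real page text would come from the <text>
# elements of the pages-articles XML dump.
sample = "before -{zh-hans:计算机; zh-hant:電腦}- after -{A}-"
for inner in find_variant_markup(sample):
    print(inner)
```

Text outside these spans is the single shared source text; as far as I
understand, the Simplified and Traditional views are produced from it at
display time, so there is nothing to split apart in the dump itself.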

Thanks for your answers! They are very helpful! I used MWDumper, but I get an
error:

Exception in thread "main" java.lang.NullPointerException
        at org.mediawiki.importer.XmlDumpReader.readTitle(XmlDumpReader.java:31
)
        at org.mediawiki.importer.XmlDumpReader.endElement(XmlDumpReader.java:2
3)
        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
        at org.mediawiki.importer.XmlDumpReader.readDump(XmlDumpReader.java:88)
        at org.mediawiki.dumper.Dumper.main(Dumper.java:143)

Do you know what the problem is?
Thanks!