Hi,
I'm looking to download the "wiki-latest-stub-meta-history.xml" for smaller languages and perform some analytics on it. I don't really care about the English Wikipedia because it's too large to handle. I want a CSV file made from this XML so that I can do statistical modelling on it.
The trouble is that I've been unable to convert this XML to CSV so far. If I can get it into SQL, then phpMyAdmin can export a CSV, but mwdumper has failed with the following error (copied below).
Thanks in advance, Abhishek
Exception in thread "main" java.lang.NullPointerException
    at org.mediawiki.importer.XmlDumpReader.readTitle(Unknown Source)
    at org.mediawiki.importer.XmlDumpReader.endElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:198)
    at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
    at org.mediawiki.dumper.Dumper.main(Unknown Source)
c:\PROGRA~1\Java\jdk1.6.0_01\bin>dir
I don't think the tables SQL is appropriate here. Maybe the -page.sql dump would suit you better. Which columns do you want in your CSV?
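If a few per-revision fields are enough, one option is to skip SQL altogether and stream the XML straight into a CSV. Below is a minimal sketch in Python using only the standard library. The column choice (page id, title, revision id, timestamp, contributor) and the function names are my own assumptions, not something mwdumper or the dump format prescribes, and I have not run it against a real dump, so adjust it to the fields you actually need.

```python
import csv
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that the export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Return the text of the first direct child called `name`, or '' if absent."""
    for child in elem:
        if local(child.tag) == name:
            return child.text or ""
    return ""


def dump_to_csv(xml_source, csv_out):
    """Stream a stub-meta-history dump and write one CSV row per revision.

    The column set below is an assumption made for illustration.
    """
    writer = csv.writer(csv_out)
    writer.writerow(["page_id", "page_title", "rev_id", "timestamp", "contributor"])
    current_page = None
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        name = local(elem.tag)
        if event == "start":
            if name == "page":
                current_page = elem  # remember the enclosing <page>
        elif name == "revision" and current_page is not None:
            contributor = ""
            for child in elem:
                if local(child.tag) == "contributor":
                    # Registered users have <username>, anonymous edits have <ip>.
                    contributor = child_text(child, "username") or child_text(child, "ip")
            writer.writerow([
                child_text(current_page, "id"),
                child_text(current_page, "title"),
                child_text(elem, "id"),
                child_text(elem, "timestamp"),
                contributor,
            ])
            current_page.remove(elem)  # drop the finished revision to save memory
        elif name == "page":
            elem.clear()
            current_page = None


# Example use (file names are placeholders):
# with open("stub-meta-history.xml", "rb") as src, \
#         open("revisions.csv", "w", newline="", encoding="utf-8") as dst:
#     dump_to_csv(src, dst)
```

Because the parser works incrementally and finished revisions are discarded, memory use should stay modest for the smaller wikis, though the emptied page elements still accumulate under the root.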
wikitech-l@lists.wikimedia.org