On 13 Aug 2008, at 16:14, zh509@york.ac.uk wrote:
Hi, all,
I have downloaded enwiki-latest-stub-meta-history.xml.gz to import it on my own machine, but the file is too large for me to run any query on my computer, because I use Oracle and it can't accept a file larger than 50 GB.
Could anyone help me narrow down this dump? All I need is the text id, revision id, username, user id, length of content and timestamp. I am not good with technical tools, so thanks a million for your help!
zeyi
Toolserver-l mailing list Toolserver-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/toolserver-l
You can use xml2sql [1] to split the file into three smaller files: page.sql, revision.sql and text.sql. Alternatively, you can use MWDumper [2] to import the dump. Once the dump is in the database, you can delete the unnecessary columns with Oracle (I don't know how to do this, but it is probably possible).
[1] http://meta.wikimedia.org/wiki/Xml2sql
[2] http://www.mediawiki.org/wiki/MWDumper
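If neither tool fits, another option is to cut the dump down before it ever reaches Oracle. Below is a minimal Python sketch, not a tested recipe for the full enwiki dump: it streams the gzipped XML and writes one CSV row per revision with only the six fields you listed. The function name extract_revisions and the CSV column names are my own invention. I am also assuming the usual stub layout (a <revision> holding <id>, <timestamp>, <contributor> and an empty <text id="..."/> element), and that the content length is available as a bytes attribute on <text>; if your dump does not have that attribute, the length column simply stays empty.

```python
import csv
import gzip
import sys
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace: '{ns}revision' -> 'revision'."""
    return tag.rsplit("}", 1)[-1]


def extract_revisions(xml_stream, csv_stream):
    """Stream <revision> elements; write one CSV row each. Returns row count."""
    writer = csv.writer(csv_stream)
    # Column names are hypothetical; rename them to match your Oracle table.
    writer.writerow(["text_id", "rev_id", "username", "user_id",
                     "length", "timestamp"])
    root = None
    count = 0
    for event, elem in ET.iterparse(xml_stream, events=("start", "end")):
        if root is None:
            root = elem  # first start event is the <mediawiki> root
        if event != "end":
            continue
        tag = local(elem.tag)
        if tag == "revision":
            rev_id = timestamp = username = user_id = text_id = length = ""
            for child in elem:
                name = local(child.tag)
                if name == "id":
                    rev_id = child.text or ""
                elif name == "timestamp":
                    timestamp = child.text or ""
                elif name == "contributor":
                    for c in child:
                        cname = local(c.tag)
                        if cname == "username":
                            username = c.text or ""
                        elif cname == "id":
                            user_id = c.text or ""
                        elif cname == "ip":
                            # Anonymous edit: keep the IP as the user name.
                            username = c.text or ""
                elif name == "text":
                    text_id = child.get("id", "")
                    # Assumption: length is in a 'bytes' attribute, if present.
                    length = child.get("bytes", "")
            writer.writerow([text_id, rev_id, username, user_id,
                             length, timestamp])
            count += 1
            elem.clear()
        elif tag == "page":
            root.clear()  # drop finished pages so memory use stays flat
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract.py enwiki-latest-stub-meta-history.xml.gz out.csv
    with gzip.open(sys.argv[1], "rb") as src, \
            open(sys.argv[2], "w", newline="", encoding="utf-8") as dst:
        print(extract_revisions(src, dst), "revisions written")
```

The resulting CSV should be far smaller than the original XML, and it can be split into pieces with ordinary tools if a single file is still too big to load.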
Regards
Pietrodn powerpdn@gmail.com