Hi, all,
I have downloaded enwiki-latest-stub-meta-history.xml.gz to import on my own machine, but the file is too large for me to run any queries on my computer, because the Oracle installation I use can't accept a file larger than 50 GB.
Could anyone help me narrow this dump down? All I need is the text id, revision id, username, user id, length of content and timestamp. I am not good with technical tools, so thanks a million for your help!
zeyi
zeyi wrote:
Hi, all,
I have downloaded enwiki-latest-stub-meta-history.xml.gz to import on my own machine, but the file is too large for me to run any queries on my computer, because the Oracle installation I use can't accept a file larger than 50 GB.
Do you mean that Oracle can't have tables larger than 50 GB? Or are you trying to import enwiki-latest-stub-meta-history.xml.gz directly? Bear in mind that it is not an SQL file: you need to process it with some tool, such as mwdumper. Which one are you using? The easiest way to reduce the imported data is probably to modify that tool.
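If modifying an existing tool is too much work, a small standalone filter might be enough. The following is only a rough sketch in Python, not what mwdumper does internally. It assumes the usual stub layout (a <revision> element containing <id>, <timestamp>, <contributor> and an empty <text id="..."/> element) and assumes the content length is exposed as a "bytes" attribute on <text>, which may not be present in every dump version. The names iter_revisions, convert and FIELDS are made up for this example.

```python
import csv
import gzip
import xml.etree.ElementTree as ET

# Column names are this example's own choice, not a MediaWiki schema.
FIELDS = ["text_id", "rev_id", "username", "user_id", "length", "timestamp"]


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield one dict per <revision>, reading the XML incrementally."""
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root
    for event, elem in context:
        if event != "end":
            continue
        name = local(elem.tag)
        if name == "revision":
            row = dict.fromkeys(FIELDS, "")
            for child in elem:
                tag = local(child.tag)
                if tag == "id":
                    row["rev_id"] = child.text or ""
                elif tag == "timestamp":
                    row["timestamp"] = child.text or ""
                elif tag == "contributor":
                    for sub in child:
                        sub_tag = local(sub.tag)
                        # Assumption: for anonymous edits, store the IP
                        # as the username and leave user_id empty.
                        if sub_tag in ("username", "ip"):
                            row["username"] = sub.text or ""
                        elif sub_tag == "id":
                            row["user_id"] = sub.text or ""
                elif tag == "text":
                    row["text_id"] = child.get("id", "")
                    # Assumption: length is in a "bytes" attribute, if any.
                    row["length"] = child.get("bytes", "")
            yield row
            elem.clear()  # drop this revision's children
        elif name == "page":
            root.clear()  # drop finished pages so memory stays bounded


def convert(dump_path, csv_path):
    """Stream a gzipped stub dump into a CSV with only the FIELDS columns."""
    with gzip.open(dump_path, "rb") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(iter_revisions(src))
```

Calling convert("enwiki-latest-stub-meta-history.xml.gz", "revisions.csv") should then produce a much smaller CSV file that can be loaded with whatever bulk loader your database provides. The file is read as a stream, so it never has to fit in memory or be unpacked on disk.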
On 13 Aug 2008, at 16:14, zh509@york.ac.uk wrote:
Hi, all,
I have downloaded enwiki-latest-stub-meta-history.xml.gz to import on my own machine, but the file is too large for me to run any queries on my computer, because the Oracle installation I use can't accept a file larger than 50 GB.
Could anyone help me narrow this dump down? All I need is the text id, revision id, username, user id, length of content and timestamp. I am not good with technical tools, so thanks a million for your help!
zeyi
Toolserver-l mailing list Toolserver-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/toolserver-l
You can use xml2sql [1] to split the file into three smaller files: page.sql, revision.sql and text.sql. Alternatively, you can use MWDumper [2] to import the dump. After you've imported the dump into the database, you can drop the unnecessary columns in Oracle (I don't know exactly how to do this, but it is probably possible).
[1] http://meta.wikimedia.org/wiki/Xml2sql
[2] http://www.mediawiki.org/wiki/MWDumper
Regards
Pietrodn powerpdn@gmail.com