On 13 Aug 2008, at 16:14, zh509(a)york.ac.uk wrote:
Hi all,
I have downloaded enwiki-latest-stub-meta-history.xml.gz to import on my
own machine, but the file is too large to run any queries against on my
computer, because the Oracle setup I use can't accept files larger than
50 GB. Could anyone help me narrow this dump down? All I need are the
text id, revision id, username, user id, length of content and
timestamp. I am not good with technical tools, so thanks a million for
your help!
zeyi
You can use xml2sql [1] to split the file into three smaller files:
page.sql, revision.sql and text.sql. Alternatively you can use
MWDumper [2] to import the dump.
After you've imported the dump into the database, you can drop the
unnecessary columns in Oracle (I don't know how to do this, but you
probably can).
[1] http://meta.wikimedia.org/wiki/Xml2sql
[2] http://www.mediawiki.org/wiki/MWDumper
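
If Oracle still chokes on the full tables, another option is to pull out
just those six fields yourself before importing anything. Below is a rough,
untested Python sketch that streams the compressed stub dump with the
standard library and writes one tab-separated line per revision. The element
and attribute names (page/id, revision/id, timestamp, contributor/username,
contributor/id, and the "id" and "bytes" attributes on the text element) are
assumptions based on how I remember the stub-meta-history schema, so please
check them against the first page in your dump before trusting the output.

import gzip
import xml.etree.ElementTree as ET

def local(tag):
    # Drop the XML namespace: "{http://...}revision" -> "revision"
    return tag.rsplit('}', 1)[-1]

def child(elem, name):
    # Return the first direct child with the given local name, or None.
    for c in elem:
        if local(c.tag) == name:
            return c
    return None

def text_of(elem, name):
    c = child(elem, name)
    return (c.text or '').strip() if c is not None else ''

with gzip.open('enwiki-latest-stub-meta-history.xml.gz', 'rb') as src, \
     open('revisions.tsv', 'w', encoding='utf-8') as out:
    context = ET.iterparse(src, events=('start', 'end'))
    _, root = next(context)                  # root element of the dump
    for event, elem in context:
        if event == 'end' and local(elem.tag) == 'page':
            for rev in elem:
                if local(rev.tag) != 'revision':
                    continue
                contrib = child(rev, 'contributor')
                txt = child(rev, 'text')
                # One line per revision: text id, revision id, username,
                # user id, length of content, timestamp.
                out.write('\t'.join([
                    txt.get('id', '') if txt is not None else '',
                    text_of(rev, 'id'),
                    text_of(contrib, 'username') if contrib is not None else '',
                    text_of(contrib, 'id') if contrib is not None else '',
                    txt.get('bytes', '') if txt is not None else '',
                    text_of(rev, 'timestamp'),
                ]) + '\n')
            root.clear()                     # keep memory flat while streaming

The resulting revisions.tsv should be far smaller than the 50 GB limit and
can then be loaded into Oracle however you prefer.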
Regards
Pietrodn
powerpdn(a)gmail.com