On 08/09/2011 09:22, Ariel T. Glenn wrote:
> I expect people will need a script to download these files easily;
> didn't someone on this list have a tool in the works?
I wrote this simple bash script:
https://github.com/SoNetFBK/wiki-network/blob/master/download_dumps.sh
It's really simple to use.
Usage: download_dumps.sh LANG [OUTPUT_DIR] [MATCHING_STRING]
Examples:
- download_dumps.sh en -> downloads every latest file from enwiki
- download_dumps.sh en /mydata/dumps -> the same, but saves everything in
  /mydata/dumps
- download_dumps.sh en /mydata/dumps history -> the same, but downloads
  only the files whose names contain the word "history" (you can use a
  regex too!)
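
For anyone curious how the three arguments could fit together, here is a rough sketch. It is not the content of download_dumps.sh, only an illustration written for this message. It assumes the dumps are listed at https://dumps.wikimedia.org/<lang>wiki/latest/ and that curl is installed; the function names (latest_dumps_url, filter_dump_files, fetch_latest_dumps) are made up for the example.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: NOT the actual download_dumps.sh.
# Assumption: the latest dumps for a language are listed at
#   https://dumps.wikimedia.org/<lang>wiki/latest/

# Build the URL of the "latest" dump directory for a language code.
latest_dumps_url() {
    printf 'https://dumps.wikimedia.org/%swiki/latest/\n' "$1"
}

# Read an HTML directory listing on stdin and print the linked file
# names that match the extended regex in $1 (default: match everything).
filter_dump_files() {
    local pattern="${1:-.}"
    grep -o 'href="[^"]*"' \
        | sed -e 's/^href="//' -e 's/"$//' \
        | grep -v '/$' \
        | grep -E -- "$pattern"
}

# Download every matching file into the output directory (default: ".").
fetch_latest_dumps() {
    local lang="$1" outdir="${2:-.}" pattern="${3:-.}"
    local base
    base="$(latest_dumps_url "$lang")"
    mkdir -p "$outdir"
    curl -fsSL "$base" \
        | filter_dump_files "$pattern" \
        | while IFS= read -r name; do
              # -C - resumes a partial download; -O keeps the remote name.
              (cd "$outdir" && curl -fL -C - -O "${base}${name}" </dev/null)
          done
}
```

With these definitions loaded, `fetch_latest_dumps en /mydata/dumps history` would mirror the third example above: it lists the enwiki "latest" directory, keeps only the file names matching "history", and saves them under /mydata/dumps.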
P.S.: in the same repo you'll find other interesting tools for analyzing
the dumps (extracting a social network from user talk pages, content
analysis, etc.). If you need more info, write to me ;)
--
f.
"I didn't try, I succeeded"
(Dr. Sheldon Cooper, PhD)
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments
http://about.me/fox91