Tim Starling wrote:
>Jakob, this ties in with the earlier request for wikipedia-by-mail. I
>was thinking of doing a Fundable.org drive for an array so that I
>could serve those requests, but perhaps using the Tool server makes
>more sense...
>The current total size of all the pages_full.xml.bz2 files from the
>latest dump is 14 GB. In total, the wikipedia directory on the download
>server is using 236 GB, thanks mostly to image tarballs and poorly
>compressed copies of the text.
Thanks! I think you calculated this number with the current, failed en
dump of 1 GB. Because 20050924_pages_full.xml.bz2 is 11.3 GB, the total
for all pages_full.xml.bz2 files should be around 25 GB (~5 GB with
7zip), or ~400 GB decompressed. It seems that, thanks to improving
compression algorithms, the terabyte disk array can wait until the end
of next year. Spending more server time on better compression is better
than making people spend more time on downloading.
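To make the tradeoff concrete, here is a rough back-of-the-envelope
sketch of the download time the better compression saves, using the size
estimates above. The 2 MB/s bandwidth figure is a hypothetical
assumption for illustration, not a number from this thread:

```python
# Back-of-the-envelope comparison of download times for the ~25 GB bz2
# dumps vs. the ~5 GB 7zip version. Sizes are the estimates from the
# mail above; the bandwidth is an assumed value.

GB = 1024 ** 3           # bytes per gigabyte
MB = 1024 ** 2           # bytes per megabyte

bz2_size = 25 * GB       # all pages_full.xml.bz2 files (estimate)
sevenzip_size = 5 * GB   # same data compressed with 7zip (estimate)
bandwidth = 2 * MB       # assumed 2 MB/s download rate (hypothetical)

def hours(size_bytes, rate_bytes_per_s):
    """Download time in hours at a constant rate."""
    return size_bytes / rate_bytes_per_s / 3600

saved = hours(bz2_size, bandwidth) - hours(sevenzip_size, bandwidth)
print(f"bz2: {hours(bz2_size, bandwidth):.1f} h, "
      f"7zip: {hours(sevenzip_size, bandwidth):.1f} h, "
      f"saved per download: {saved:.1f} h")
```

At the assumed rate this is roughly 3.6 hours for the bz2 files versus
0.7 hours for 7zip, so each full download saves close to three hours of
transfer time, multiplied across every mirror and user.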
Greetings,
Jakob
P.S.: You could additionally provide only parts of the dumps, like
current, articles, full, titles ... Maybe articles_full (version
history of articles only) could be of use, but I don't know.