Tim Starling wrote:
Jakob, this ties in with the earlier request for wikipedia-by-mail. I was thinking of doing a Fundable.org drive for an array so that I could serve those requests, but perhaps using the Toolserver makes more sense...
The current total size of all the pages_full.xml.bz2 files from the latest dump is 14 GB. In total, the wikipedia directory on the download server is using 236 GB, thanks mostly to image tarballs and poorly compressed copies of the text.
Thanks! I think you calculated this number using the current, failed en dump (only 1 GB). 20050924_pages_full.xml.bz2 is 11.3 GB, so the total for all pages_full.xml.bz2 files should be around 25 GB (~5 GB with 7zip), or roughly 400 GB decompressed. It seems that, thanks to improving compression algorithms, the terabyte disk array can wait until the end of next year. Spending more server time on better compression is better than making people spend more time on downloading.
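Roughly, the back-of-envelope arithmetic looks like this (a small Python sketch, purely illustrative; the 7zip and decompression ratios are rough assumptions, not measured values):

    # Replace the failed 1 GB en dump with the real 11.3 GB
    # 20050924_pages_full.xml.bz2 and see what the totals become.

    failed_total_bz2_gb = 14.0   # total of all pages_full.xml.bz2 in the current dump
    failed_en_bz2_gb = 1.0       # the broken en dump included in that total
    real_en_bz2_gb = 11.3        # size of 20050924_pages_full.xml.bz2

    corrected_total_bz2_gb = failed_total_bz2_gb - failed_en_bz2_gb + real_en_bz2_gb

    # Assumed ratios: 7zip ~5x smaller than bz2, plain XML ~16x larger than bz2.
    estimated_7zip_gb = corrected_total_bz2_gb / 5
    estimated_plain_gb = corrected_total_bz2_gb * 16

    print(f"corrected bz2 total: ~{corrected_total_bz2_gb:.0f} GB")  # ~24 GB
    print(f"estimated 7zip size: ~{estimated_7zip_gb:.0f} GB")       # ~5 GB
    print(f"estimated decompressed: ~{estimated_plain_gb:.0f} GB")   # ~389 GB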
Greetings, Jakob
P.S.: You could additionally provide just parts of the dumps, such as current, articles, full, titles ... Maybe articles_full (version history of articles only) would be of use, but I don't know.