On 7/26/06, Domas Mituzas midom.lists@gmail.com wrote: [snip]
> With all revision pages it's around 3 TB total.
> That really requires advanced tech. At Wikipedia, revision pages are compressed, and a proper compression run shrinks the whole dataset to 0.5 TB or so (or less).
A minor nit: with braindead stupid compression (TOASTed columns in PostgreSQL, which use a modified LZ algorithm that gets less compression than gzip -3 but is much faster and compresses a single row at a time), you can get the whole of English Wikipedia into 0.4 TB, including the needed indexes and the (not insubstantial) DB overhead.
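Just to illustrate the trade-off (a rough Python sketch; zlib stands in for pglz, which isn't callable outside the server, and the revision texts are made up): compressing each row on its own keeps per-row random access but can't exploit the redundancy between near-identical revisions, while compressing the whole stream can.

    # Rough sketch of per-row vs. whole-stream compression; zlib stands in
    # for PostgreSQL's pglz, and the revision texts are made up.
    import zlib

    base = "".join("Line %d: some article text that varies per line.\n" % i
                   for i in range(200)).encode()
    revisions = [
        base,
        base + b"A small edit appended in revision 2.\n",
        base + b"A different small edit appended in revision 3.\n",
    ]

    # TOAST-style: each row compressed on its own -> per-row random access,
    # but no sharing of redundancy between near-identical revisions.
    per_row = sum(len(zlib.compress(r, 3)) for r in revisions)  # roughly gzip -3

    # Stream-style: compress everything together -> much smaller, because
    # consecutive revisions are almost identical, but reading revision N
    # means decompressing everything before it.
    whole = len(zlib.compress(b"".join(revisions), 9))

    print("per-row total:", per_row, "bytes; whole stream:", whole, "bytes")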
With state-of-the-art compression (LZMA) you can get all the revisions into 6 GB, but you lose random access. At the Wikimania tech days I'll be presenting a system which achieves similar compression performance but preserves random access... which is at least a mildly interesting subject, although perhaps without practical implications for Wikimedia until the disk/CPU performance gap widens a bit further. :)
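For the curious, the generic version of the trick (not necessarily what I'll be presenting; names and the block size below are made up) is to group revisions into blocks, compress each block solidly with LZMA, and keep a small index so a read only has to decompress one block:

    # Generic block-compression sketch: group revisions into blocks,
    # compress each block solidly, and keep an index so reading one
    # revision only decompresses its block.
    import lzma

    BLOCK_REVS = 64  # revisions per block: bigger -> better ratio, slower reads

    def build_archive(revisions):
        """Return (compressed_blocks, index); index[i] locates revision i."""
        blocks, index = [], []
        for start in range(0, len(revisions), BLOCK_REVS):
            group = revisions[start:start + BLOCK_REVS]
            offset = 0
            for rev in group:
                index.append((len(blocks), offset, len(rev)))
                offset += len(rev)
            blocks.append(lzma.compress(b"".join(group)))
        return blocks, index

    def read_revision(blocks, index, i):
        """Random access: decompress only the block holding revision i."""
        block_no, offset, length = index[i]
        data = lzma.decompress(blocks[block_no])
        return data[offset:offset + length]

With consecutive (and therefore similar) revisions grouped together, most of the cross-revision redundancy still gets exploited, and a read costs one block's worth of decompression instead of the whole archive.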