[Wikipedia-l] New static HTML dumps available

Tim Starling tstarling at wikimedia.org
Wed Jul 2 02:18:43 UTC 2008

New static HTML dumps of all Wikipedia editions are now available:


Altogether, the dumps total 650 GB uncompressed and 40 GB compressed.

I think a reasonable next step for this project would be to write filter
scripts that take a compressed dump, reduce the article count in some way,
and then recompress it, possibly in a different format. For instance, we
could have a "most popular 4GB" of the English Wikipedia, based on page
view statistics, recompressed as an SQLite database.
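
As a rough illustration of such a filter, here is a minimal Python sketch. It is
not an existing tool: the function names, the (title, html) input format, the
page-view dictionary, and the table layout are all assumptions made for the
example. It ranks articles by view count, keeps those that fit in a byte budget,
and stores each one zlib-compressed in an SQLite table. The budget counts only
the compressed article bytes, not SQLite's own file overhead.

```python
import sqlite3
import zlib


def build_popular_subset(articles, page_views, db_path, budget_bytes):
    """Store the most-viewed articles in SQLite until the budget is used up.

    articles:   iterable of (title, html) pairs, titles assumed unique
                (hypothetical input format, not the real dump layout)
    page_views: dict mapping title -> view count (hypothetical)
    Returns the list of titles that were kept, most popular first.
    """
    # Rank by popularity; titles missing from the statistics count as zero.
    ranked = sorted(articles, key=lambda a: page_views.get(a[0], 0), reverse=True)

    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "title TEXT PRIMARY KEY, html BLOB NOT NULL)"
    )
    used = 0
    kept = []
    for title, html in ranked:
        blob = zlib.compress(html.encode("utf-8"))
        if used + len(blob) > budget_bytes:
            # Too large for the remaining budget; a smaller, less popular
            # article further down the ranking may still fit.
            continue
        conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (title, blob))
        used += len(blob)
        kept.append(title)
    conn.commit()
    conn.close()
    return kept


def read_page(db_path, title):
    """Return the decompressed HTML for a title, or None if it was filtered out."""
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT html FROM pages WHERE title = ?", (title,)
    ).fetchone()
    conn.close()
    return None if row is None else zlib.decompress(row[0]).decode("utf-8")
```

Compressing each page separately keeps single-article lookups cheap, at the
cost of a worse overall ratio than compressing the whole dump as one stream. A
real filter would also need to stream articles out of the compressed dump
rather than hold them in memory.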

-- Tim Starling
