This is a frequently asked question on this list...
---------- Forwarded message ----------
From: Tim Starling <tstarling@wikimedia.org>
Date: Tue, Jul 1, 2008 at 10:18 PM
Subject: [Wikipedia-l] New static HTML dumps available
To: wikipedia-l@lists.wikimedia.org
Cc: wikitech-l@lists.wikimedia.org
New static HTML dumps of all Wikipedia editions are now available:
Altogether, the dumps are 650 GB uncompressed and 40 GB compressed.
I think a reasonable next step for this project would be to write filter scripts that take a compressed dump, reduce the article count in some way, and then recompress it, possibly in a different format. For instance, we could have a "most popular 4 GB" of the English Wikipedia, based on page view statistics, recompressed as an SQLite database.
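As a rough illustration of what such a filter script could look like, here is a minimal Python sketch. Everything specific in it is an assumption for illustration only, not part of the announcement: the dump layout (one extracted HTML file per article), the page-view file format, and all file and table names are hypothetical.

#!/usr/bin/env python3
"""Sketch: keep only the most-viewed articles from an extracted static
HTML dump and store them, compressed, in an SQLite database until a
size budget is reached. File names and formats are assumptions."""

import sqlite3
import zlib
from pathlib import Path

DUMP_DIR = Path("enwiki-html")     # hypothetical: extracted static dump
PAGEVIEWS = Path("pageviews.txt")  # hypothetical: "<title>\t<views>" lines
SIZE_BUDGET = 4 * 1024 ** 3        # target: roughly 4 GB of compressed HTML

def top_titles(path):
    """Yield article titles in descending order of view count."""
    rows = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2:
                continue           # skip malformed lines
            title, views = parts
            rows.append((int(views), title))
    rows.sort(reverse=True)
    for _, title in rows:
        yield title

def build(db_path="popular.sqlite"):
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS page (title TEXT PRIMARY KEY, html BLOB)"
    )
    used = 0
    for title in top_titles(PAGEVIEWS):
        src = DUMP_DIR / f"{title}.html"  # assumed one-file-per-article layout
        if not src.exists():
            continue
        blob = zlib.compress(src.read_bytes())
        used += len(blob)
        if used > SIZE_BUDGET:
            break                  # stop once the size budget is spent
        db.execute("INSERT OR REPLACE INTO page VALUES (?, ?)", (title, blob))
    db.commit()
    db.close()

if __name__ == "__main__":
    build()

The single-table schema is just one possible choice; a real reader application would likely want an index on titles (the PRIMARY KEY provides one here) and per-page decompression with zlib.decompress on lookup.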
-- Tim Starling
_______________________________________________
Wikipedia-l mailing list
Wikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikipedia-l