On Sat, Jan 10, 2009 at 9:14 AM, Keisial keisial@gmail.com wrote:
Bzipping the pages by blocks, as I did for my offline reader, produces a file size similar to the original.* There may be ways to get similar results without having to rebuild the revisions. Also note that in both cases you still need an intermediate app to provide input dumps for those tools.
* 112% of the original size when measuring enwiki-20081008-pages-meta-current. For ruwiki-20081228-history, both the original bz2 and my faster-access version are 8.2G.
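The block-compression idea above can be sketched roughly as follows. This is not Keisial's actual offline-reader format, just a minimal illustration of the technique: group pages into fixed-size blocks, compress each block as an independent bz2 stream, and keep an index so a reader decompresses only the one block holding the page it wants. The block size and separator are assumptions for the sketch.

```python
import bz2

PAGES_PER_BLOCK = 100  # assumed tuning knob; larger blocks compress better but cost more per lookup

def compress_blocks(pages, pages_per_block=PAGES_PER_BLOCK):
    """pages: list of (page_id, text) pairs. Returns (blob, index) where
    index[page_id] = (byte offset of the block's bz2 stream, slot within the block)."""
    blob = bytearray()
    index = {}
    for start in range(0, len(pages), pages_per_block):
        block = pages[start:start + pages_per_block]
        offset = len(blob)
        for slot, (page_id, _text) in enumerate(block):
            index[page_id] = (offset, slot)
        # Pages inside a block are joined with NUL, which should not occur
        # in wikitext; each block becomes its own independent bz2 stream.
        raw = "\x00".join(text for _pid, text in block).encode("utf-8")
        blob += bz2.compress(raw)
    return bytes(blob), index

def read_page(blob, index, page_id):
    """Seek to the page's block and decompress only that one stream."""
    offset, slot = index[page_id]
    # BZ2Decompressor stops at the end of the first stream; the bytes of
    # later blocks end up in its unused_data and are simply ignored here.
    d = bz2.BZ2Decompressor()
    raw = d.decompress(blob[offset:])
    return raw.decode("utf-8").split("\x00")[slot]
```

Since every block is a self-delimiting bz2 stream, the concatenated blob is itself a valid multi-stream bz2 file, which is why the file size stays close to a plain bzip of the whole dump.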
The -history dumps and one-off page dumps are pretty distinct cases: the history dumps have a lot more exploitable redundancy.
For fast article access you might want to consider compressing articles individually with a dictionary-based pre-pass such as http://xwrt.sourceforge.net/
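For a rough feel of what a word-replacing pre-pass does (this is only a toy in the spirit of XWRT, not its actual transform or container format): frequent words get swapped for short byte codes before the ordinary bz2 pass, so the entropy coder sees shorter, more uniform tokens. The word-length cutoff, code range, and dictionary size below are all assumptions; a real tool also escapes any code bytes already present in the input.

```python
import bz2
import re
from collections import Counter

def build_dictionary(text, max_words=64):
    """Collect the most frequent words of length >= 4 (assumed cutoff)."""
    words = re.findall(r"[A-Za-z]{4,}", text)
    return [w for w, _ in Counter(words).most_common(max_words)]

def transform(text, dictionary):
    """Replace each dictionary word with a one-byte code (0x01..0x40).
    Assumes the input contains none of those control bytes."""
    for code, word in enumerate(dictionary, start=1):
        text = text.replace(word, chr(code))
    return text

def untransform(text, dictionary):
    """Inverse pass: expand each code byte back to its word."""
    for code, word in enumerate(dictionary, start=1):
        text = text.replace(chr(code), word)
    return text

def compress(text):
    d = build_dictionary(text)
    return bz2.compress(transform(text, d).encode("utf-8")), d
```

The dictionary has to ship alongside the compressed article (or be fixed ahead of time, as XWRT can do) so the reader can run the inverse pass after decompression.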