On Sat, Jan 10, 2009 at 9:14 AM, Keisial <keisial(a)gmail.com> wrote:
bzipping the pages by blocks, as I did for my offline reader, produces a
file size similar to the original*
There may be ways to get similar results without having to rebuild the
Also note that in both cases you still need an intermediate app to
provide input dumps for those tools.
*112% measuring enwiki-20081008-pages-meta-current. Looking at
ruwiki-20081228-history, both the original bz2 and my faster-access one
-history dumps and one-off page dumps are pretty distinct cases: the
history dumps have a lot more available redundancy.
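
To make the redundancy point concrete, here is a rough sketch (not taken from either dump tool) that compares bzip2 on a set of near-identical revisions compressed one at a time versus compressed as a single stream. The page text, the revision generator and the function names are all made up for illustration; real dumps will behave differently in detail, and bzip2 only finds repeats that fall within one of its blocks.

```python
import bz2


def make_revisions(n=50):
    """Build n hypothetical revisions of one page; each differs from the
    previous one by a single rewritten sentence (an assumption, chosen to
    mimic small edits in a page history)."""
    sentences = ["Sentence number %d of a made-up article." % i for i in range(200)]
    revisions = []
    for r in range(n):
        sentences[r % len(sentences)] = "Revision %d changed this sentence." % r
        revisions.append("\n".join(sentences).encode("utf-8"))
    return revisions


def separate_size(revisions):
    """Total bytes when every revision is compressed on its own (one-off case)."""
    return sum(len(bz2.compress(rev)) for rev in revisions)


def combined_size(revisions):
    """Bytes when all revisions are compressed together (history-dump case)."""
    return len(bz2.compress(b"".join(revisions)))


if __name__ == "__main__":
    revs = make_revisions()
    print("raw bytes:          ", sum(len(r) for r in revs))
    print("compressed one-off: ", separate_size(revs))
    print("compressed together:", combined_size(revs))
```

Compressing each revision alone gives the compressor nothing to share between revisions, which is why the one-off case needs outside help to get comparable ratios.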
For fast-access articles you might want to consider compressing
articles one-off with a dictionary-based pre-pass such as