> How about using another algorithm that does this already?
Thanks to those who answered my mail. The thing is that I really
really want my program to work natively with the dump files from
Wikipedia, both because it would take me days to recompress 1.5 GB
files and because I would have no capacity to host the results. I am
not just making this program to generate one offline CD for one
language, but for it to be a tool that works with all dump files
(right now I have the dump files of 9 smaller languages in one
directory, all the interwiki links work, etc. - it's just slow)...
Therefore, if the people running the dumps would consider changing the
format to something that is easier to access randomly, I would be all
for it. Admittedly, I don't know the pros and cons that made them
choose 7zip in the first place. However, I don't even know where to
start a discussion with them, and I imagine such a decision would take
a long time to implement. Thus I figured trying to tweak 7zip would
probably be a much faster way. :)
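To illustrate the kind of format I have in mind (this is just a rough
sketch of a hypothetical block-compressed layout, not something the
current dumps support): if each article, or small group of articles,
were compressed as its own independent bzip2 block, and a separate
index mapped titles to byte offsets, a reader could seek straight to
the right block and decompress only that, instead of the whole 1.5 GB
file. Something along these lines:

import bz2

def write_blocks(articles, dump_path, index_path):
    # articles: iterable of (title, text) pairs. Each article becomes its
    # own bzip2 stream; the index records where each stream starts.
    # (Titles with tabs or newlines are ignored for this sketch.)
    index = {}
    with open(dump_path, "wb") as dump:
        for title, text in articles:
            index[title] = dump.tell()
            dump.write(bz2.compress(text.encode("utf-8")))
    with open(index_path, "w", encoding="utf-8") as idx:
        for title, offset in index.items():
            idx.write("%s\t%d\n" % (title, offset))

def read_article(title, dump_path, index_path):
    # Look up the offset, seek to it, and decompress just that one
    # bzip2 stream; BZ2Decompressor stops at the end of the stream.
    offsets = {}
    with open(index_path, encoding="utf-8") as idx:
        for line in idx:
            name, off = line.rstrip("\n").split("\t")
            offsets[name] = int(off)
    decomp = bz2.BZ2Decompressor()
    chunks = []
    with open(dump_path, "rb") as dump:
        dump.seek(offsets[title])
        while not decomp.eof:
            data = dump.read(64 * 1024)
            if not data:
                break
            chunks.append(decomp.decompress(data))
    return b"".join(chunks).decode("utf-8")

Grouping, say, a hundred articles per block would keep the index small
while still making lookups cheap; the point is just that the
compression boundaries have to be known up front.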
If you have any pointers on getting in touch with the dump people, or
if some of them are hanging around here, I'd love to have a discussion
with them, both about the dump format itself and about some other
technical details of how they prepare the material. (I also tried
pointing out on a talk page about a week ago that the static download
page says the dumps for December are currently in progress and links
to November, even though according to the log all the December dumps
are done, and you can download them if you type in the URL manually.
That was a week ago, so apparently that talk page wasn't a good way of
getting in touch with them.)
(I would also love to have several HTML dumps - one with only the
article pages. Currently there is only one, and it includes all pages,
even the image detail pages.) Let me say, though, that they've done an
awesome job, and there are some really neat decisions in how the
static dumps are put together.
Thanks a lot
Stian