* Randall Farmer wrote:
As I understand, compressing full-history dumps for English Wikipedia and
other big wikis takes a lot of resources: enwiki history is about 10 TB
unpacked, and 7zip only packs a few MB/s per core. Even with 32 cores,
that's over a day of server time. There's been talk about ways to speed
that up in the past.[1]
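For the record, the arithmetic behind that "over a day" figure works out as sketched below. The 3 MB/s/core rate is an assumption standing in for "a few MB/s/core"; the other numbers are from the message above.

```python
# Back-of-envelope estimate of 7zip wall-clock time for the enwiki
# full-history dump, assuming the throughput scales across cores.
DUMP_SIZE_MB = 10 * 1000 * 1000  # ~10 TB unpacked, in MB
MB_PER_SEC_PER_CORE = 3          # assumed value for "a few MB/s/core"
CORES = 32

seconds = DUMP_SIZE_MB / (MB_PER_SEC_PER_CORE * CORES)
hours = seconds / 3600
print(f"~{hours:.0f} hours")     # roughly 29 hours, i.e. over a day
```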
Economically, that does not sound like much. Do keep in mind the cost of
obtaining, porting, deploying, maintaining, and so on, new tools. There
might be hundreds of downstream users, and if every one of them has to
spend even a couple of minutes adapting to a new format, that can quickly
outweigh any savings, as a simple example.
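To make that trade-off concrete, here is an illustrative comparison. Every figure below is an assumption chosen only to match the rough magnitudes in this message ("hundreds of users", "a couple of minutes"); none comes from actual measurements.

```python
# Illustrative cost comparison: downstream adaptation time vs. server
# time saved. All numbers are assumptions, not measurements.
users = 300            # assumed count for "hundreds of downstream users"
minutes_each = 2       # "a couple of minutes" adapting per user
hours_downstream = users * minutes_each / 60

server_hours_saved = 15  # assumed: cutting the >1 day run roughly in half

print(f"downstream cost: ~{hours_downstream:.0f} person-hours")
print(f"server time saved per run: ~{server_hours_saved} hours")
```

With assumptions like these, one-off human costs are already the same order of magnitude as the per-run server savings, before counting porting and maintenance.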
Technical data dump aside: *How could I get this more thoroughly tested,
then maybe added to the dump process, perhaps with an eye to eventually
replacing 7zip as the alternate, non-bzip2 compressor?* Whom do I talk
to to get started? (I'd dealt with Ariel Glenn before, but haven't seen
activity from Ariel lately, and in any case maybe playing with a new tool
falls under Labs or some heading other than dumps devops.) Am I nuts to
even be asking about this? Are there things that would definitely need to
change for integration to be possible? Basically, I'm trying to get this
from a tech demo to something with real-world utility.
I would definitely recommend talking to Igor Pavlov (7-Zip) about this;
he might be interested in having this as part of 7-Zip as some kind of
"fast" option. The same goes for the developers of the `xz` tools. There
might even be ways this could fit within the formats' existing
extensibility mechanisms. Igor Pavlov tends to be quite responsive
through the SF.net bug tracker. In any case, they might be able to give
direction on how this might, or might not, become part of the standard
tools.
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/