On Wed, Jan 7, 2009 at 9:53 PM, Brion Vibber <brion(a)wikimedia.org> wrote:
version of my compressor averaged a little better than 250
revisions per second on ruwiki (about 12 hours total) on an
18-month-old desktop. However, as the CPU utilization was only 50-70%
of a full processing core most of the time, I suspect that my choice
to read and write from an external hard drive may have been the
limiting factor. On a good machine, 400+ rev/s might be a plausible
number for the current code.
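
For reference, the arithmetic behind those figures can be sketched as below. The revision total is only what 250 rev/s sustained for about 12 hours implies, not a measured count, and the function names are made up for illustration.

```python
def implied_revisions(rate_per_s, hours):
    """Number of revisions processed at a steady rate over the given hours."""
    return rate_per_s * hours * 3600


def hours_needed(revisions, rate_per_s):
    """Wall-clock hours to process `revisions` at a steady rate."""
    return revisions / rate_per_s / 3600


# Assumption: ~250 rev/s held for ~12 hours, as quoted in the text.
total = implied_revisions(250, 12)
print(total)                     # 10800000
print(hours_needed(total, 400))  # 7.5
```

So if 400 rev/s were reachable on faster storage, the same run would take roughly 7.5 hours instead of 12.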
> It'd be good to compare this against the general-purpose bzip2 and 7zip
I started a process to recompress the ruwiki dump using the default
settings on 7-Zip. After 5 minutes, it told me I had 16 hours
remaining. So I would estimate that my revision compressor is on the
same timescale as, and perhaps somewhat faster than, 7-Zip. Again, I was
reading from and writing to an external drive, so there could be an I/O
effect in there as well.
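
A more controlled comparison could time the general-purpose compressors on the same in-memory sample, which takes the disk out of the picture. The following is only a rough sketch using Python's standard bz2 and lzma modules. The lzma module writes an .xz stream rather than a .7z archive, so it stands in for 7-Zip's LZMA-family default rather than reproducing it, and the synthetic "revisions" are an invented stand-in for real dump text. Numbers from it are not comparable to the figures above.

```python
import bz2
import lzma
import time


def make_fake_revisions(n=200):
    """Build n successive 'revisions', each a small edit of the previous one.

    Purely synthetic data, assumed here only to mimic the redundancy
    between neighbouring revisions of a page.
    """
    text = "The quick brown fox jumps over the lazy dog. " * 50
    revisions = []
    for i in range(n):
        text += f"Edit number {i}. "
        revisions.append(text)
    return "\n".join(revisions).encode("utf-8")


def benchmark(name, compress, data):
    """Time one in-memory compression call and report ratio and speed."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    return {
        "name": name,
        "ratio": len(data) / len(compressed),
        "seconds": elapsed,
    }


data = make_fake_revisions()
for name, compress in (("bzip2", bz2.compress), ("lzma", lzma.compress)):
    result = benchmark(name, compress, data)
    print(f"{result['name']}: ratio {result['ratio']:.1f}, "
          f"{result['seconds']:.3f} s for {len(data)} bytes")
```

Running the same harness over a slice of the real dump, read into memory first, would give a fairer picture of how much of the 16-hour estimate is compression and how much is the external drive.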
> it'd be great if we
> can host the dev code in source control, under extensions or tools for
> now, until we can integrate something directly into the export code.
Could someone walk me through how I would do that?