On Wed, Jan 7, 2009 at 9:53 PM, Brion Vibber <brion@wikimedia.org> wrote:
The current version of my compressor averaged a little better than 250 revisions per second on ruwiki (about 12 hours total) on an 18-month-old desktop. However, CPU utilization was only 50-70% of a single core most of the time, so I suspect that reading from and writing to an external hard drive was the limiting factor. On a good machine, 400+ rev/s might be a plausible number for the current code.
> It'd be good to compare this against the general-purpose bzip2 and 7zip LZMA compression...
I started a process to recompress the ruwiki dump using the default settings in 7-Zip. After 5 minutes, it told me I had 16 hours remaining. So I would estimate that my revision compressor is on the same timescale as 7-Zip, and perhaps somewhat faster. Again, I was reading from and writing to an external drive, so there could be an I/O effect in there as well.
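As a rough sanity check on the figures above, here is a minimal sketch of the arithmetic. It assumes "about 12 hours" and "250 rev/s" can be treated as exact, and that 7-Zip's "16 hours remaining" after 5 minutes is a reliable total, which early progress estimates often are not. The function names are made up for illustration.

```python
# Back-of-the-envelope throughput arithmetic from the quoted figures.
# Assumptions: 250 rev/s for 12 h is exact; 7-Zip's ETA is trustworthy.

SECONDS_PER_HOUR = 3600


def implied_revisions(rate_per_s, hours):
    """Number of revisions processed at a given rate over a given time."""
    return rate_per_s * hours * SECONDS_PER_HOUR


def hours_needed(revisions, rate_per_s):
    """Wall-clock hours to process `revisions` at `rate_per_s`."""
    return revisions / rate_per_s / SECONDS_PER_HOUR


# Revision count implied by the compressor run (an estimate, not a measured count).
revisions = implied_revisions(250, 12)

# Projected run time if a faster machine reached 400 rev/s.
projected_hours = hours_needed(revisions, 400)

# 7-Zip: 5 minutes elapsed plus 16 hours reported as remaining.
sevenzip_hours = 16 + 5 / 60
sevenzip_rate = revisions / (sevenzip_hours * SECONDS_PER_HOUR)

print(f"implied revisions:        {revisions:,}")
print(f"projected at 400 rev/s:   {projected_hours:.1f} h")
print(f"7-Zip equivalent rate:    {sevenzip_rate:.0f} rev/s")
```

Under those assumptions the run covers roughly 10.8 million revisions, 400 rev/s would bring it down to about 7.5 hours, and the 7-Zip estimate corresponds to somewhat under 200 rev/s.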
> it'd be great if we can host the dev code in source control, under extensions or tools for now, until we can integrate something directly into the export code.
Could someone walk me through how I would do that?
-Robert Rohde