Anthony wrote:
Using skip-deltas I think you could make a system fast
enough to run live.
At the very least it could be used as part of an incremental dump system.
Using *smart* skip-deltas, you'd resolve the inefficiencies caused by
page-blanking vandalism.
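For the skip-delta idea, here's a minimal sketch of the usual scheme (as in Subversion): revision n is stored as a delta against n with its lowest set bit cleared, so reconstructing any revision touches only O(log n) deltas. The function names are mine, just for illustration.

```python
def skip_delta_base(n):
    """Base revision for rev n: clear the lowest set bit of n.
    Rev 13 (binary 1101) deltas against 12 (1100), which deltas
    against 8 (1000), which deltas against 0."""
    return n & (n - 1)

def delta_chain(n):
    """Revisions that must be read to reconstruct rev n."""
    chain = [n]
    while n:
        n = skip_delta_base(n)
        chain.append(n)
    return chain
```

So even for a page with a million revisions, reconstruction never applies more than about twenty deltas.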
One more possibility is to compute an MD5 of every revision, then diff only
between revisions with unique MD5s.
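The MD5 dedup might look like this: identical revisions (e.g. reverts) collapse to one stored text, and diffs are only ever computed between the unique texts. This is a sketch, not MediaWiki's actual schema; the function name is made up.

```python
import hashlib

def dedup_revisions(revisions):
    """Collapse byte-identical revisions via MD5.
    Returns (unique_texts, index): index[i] says which unique
    text revision i maps to, so a revert costs no new storage."""
    seen = {}      # md5 hex digest -> position in unique_texts
    unique_texts = []
    index = []
    for text in revisions:
        h = hashlib.md5(text.encode("utf-8")).hexdigest()
        if h not in seen:
            seen[h] = len(unique_texts)
            unique_texts.append(text)
        index.append(seen[h])
    return unique_texts, index
```

For a history like ["v1", "v2", "v1"] (an edit then a revert), only two texts are stored and the revert is just a back-reference.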
One improvement over the diff format used by RCS would
be to use smarter
breakpoints, since wikitext tends to have a lot of really long lines with no
line breaks. Using some simple heuristics to guess at sentence breaks would
probably be useful there. It wouldn't have to be perfect, since an imperfect
breakpoint only costs some compactness, never correctness.
I suggest looking into wdiff
(http://www.gnu.org/software/wdiff/).
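A rough sketch of the sentence-breakpoint idea, using Python's difflib rather than the RCS format: split the wikitext at guessed sentence boundaries and diff those tokens instead of whole lines. The regex is a deliberately crude heuristic of my own; as noted above, a bad break only makes the delta larger, never wrong.

```python
import re
import difflib

def sentence_tokens(text):
    """Crude heuristic: break after sentence-ending punctuation
    followed by whitespace, or at explicit newlines."""
    return re.split(r'(?<=[.!?])\s+|\n', text)

def sentence_diff(old, new):
    """Diff at sentence granularity; return only the changed spans."""
    sm = difflib.SequenceMatcher(None,
                                 sentence_tokens(old),
                                 sentence_tokens(new))
    return [op for op in sm.get_opcodes() if op[0] != "equal"]
```

On a long wikitext line where one sentence changed, this yields a single small replace span, where a line-based diff would rewrite the entire line.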