Anthony wrote:
Using skip-deltas I think you could make a system fast enough to run live. At the very least it could be used as part of an incremental dump system. Using *smart* skip-deltas, you'd resolve the inefficiencies due to page-blanking vandalism.
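For readers unfamiliar with the term, here is a rough sketch of how a skip-delta base can be chosen. It assumes the scheme popularised by Subversion, in which revision n is stored as a delta against the revision whose number is n with its lowest set bit cleared. The function names are hypothetical, and the "smart" variant mentioned above is not modelled.

```python
def skip_delta_base(n: int) -> int:
    """Return the revision that revision n would be stored as a delta against.

    Assumption: the base is n with its lowest set bit cleared, so
    revision 0 is the only one stored in full.
    """
    if n <= 0:
        raise ValueError("revision 0 is stored in full and has no base")
    return n & (n - 1)


def reconstruction_chain(n: int) -> list[int]:
    """List the revisions that must be read to rebuild revision n."""
    chain = [n]
    while n > 0:
        n = n & (n - 1)  # hop to the delta base
        chain.append(n)
    return chain


if __name__ == "__main__":
    # Rebuilding revision 54 touches only a handful of stored deltas.
    print(reconstruction_chain(54))
```

The point of the scheme is that the chain length grows with the number of set bits in n (at most about log2 n) rather than with n itself, which is what could make live reconstruction plausible.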
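Another possibility is to compute an MD5 hash of every revision, then diff only between revisions whose hashes are unique. A revision whose hash has been seen before (for example, a revert after page blanking) can simply point at the earlier identical revision.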
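A minimal sketch of that idea, assuming revisions arrive as a list of strings in chronological order. The name `plan_storage` and the tuple format are made up for illustration; a real system might also compare the text itself before trusting a hash match.

```python
import hashlib


def plan_storage(revisions):
    """Decide, per revision, whether to store a diff or a back-reference.

    Returns a list of tuples, one per revision:
      ("same_as", j)      content is identical to earlier revision j
      ("diff_against", j) new content; diff it against revision j
                          (j is None for the very first revision)
    """
    first_seen = {}    # MD5 digest -> index of first revision with that text
    canon_prev = None  # canonical index of the previous revision's content
    plan = []
    for i, text in enumerate(revisions):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in first_seen:
            canon = first_seen[digest]
            plan.append(("same_as", canon))
        else:
            first_seen[digest] = i
            canon = i
            plan.append(("diff_against", canon_prev))
        canon_prev = canon
    return plan


if __name__ == "__main__":
    history = ["Some article text.", "", "Some article text.",
               "Some article text. More."]
    for index, step in enumerate(plan_storage(history)):
        print(index, step)
```

In this example the revert (revision 2) costs only a back-reference, and the edit that follows it is diffed against the original text rather than against the blanked page.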
One improvement over the diff format used by RCS would be to use smarter breakpoints, since wikitext tends to have a lot of really long lines with no line breaks. Using some simple heuristics to guess at sentence breaks would probably be useful there. It wouldn't have to be perfect, since
I suggest looking into wdiff ( http://www.gnu.org/software/wdiff/ ).
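To illustrate the sentence-breakpoint idea, here is a rough sketch using Python's standard difflib rather than wdiff itself. The splitting regex is a deliberately crude heuristic and the function names are hypothetical.

```python
import difflib
import re

# Crude heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentence-like chunks; imperfect by design."""
    return [s for s in _SENTENCE_END.split(text) if s]


def sentence_diff(old, new):
    """Return the non-equal opcodes between two texts, chunked by sentence."""
    a = split_sentences(old)
    b = split_sentences(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, a[i1:i2], b[j1:j2]))
    return changes


if __name__ == "__main__":
    old = "The cat sat. It was warm. The end."
    new = "The cat sat. It was cold. The end."
    print(sentence_diff(old, new))
```

With a long single-line paragraph, a line-based diff would replace the whole line, whereas this records only the one changed sentence. Splitting on whitespace instead of sentence ends would give word-level granularity closer to what wdiff reports.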