[Foundation-l] Wikipedia meets git
John Vandenberg
jayvdb at gmail.com
Sat Oct 17 15:04:12 UTC 2009
On Sun, Oct 18, 2009 at 1:05 AM, Anthony <wikimail at inbox.org> wrote:
> On Sat, Oct 17, 2009 at 4:40 AM, jamesmikedupont at googlemail.com
> <jamesmikedupont at googlemail.com> wrote:
>>> It would be nice if the git archival format was more efficient for the
>>> kinds of changes made in Wikipedia articles: Source code changes tend
>>> to have short lines and changes tend to change a significant portion
>>> of the lines, while edits on Wikipedia are far more likely to change
>>> only part of a very long line (really, a paragraph).... so working
>>> with line level deltas is efficient for source code while inefficient
>>> for Wikipedia data.
>>
>> I have started to work on the blame code
>> to bring it down to the char level and learn about it.
>
> Char level would probably make it too inefficient to merge deltas.
> Treating a period followed by a space as a line separator would
> probably be more efficient.
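To make the granularity point concrete, here is a rough sketch that diffs the same edit twice: once with the whole paragraph as a single line, and once with ". " treated as the separator. The function names (split_sentences, changed_chars) are made up for illustration, and difflib is only standing in for a real delta format; this is not what git does internally.

```python
import difflib


def split_sentences(paragraph):
    """Split on '. ' but keep the separator, so joining the parts restores the text."""
    parts = paragraph.split(". ")
    units = [p + ". " for p in parts[:-1]]
    if parts[-1]:
        units.append(parts[-1])
    return units


def split_lines(paragraph):
    """Line-level view: a paragraph with no newlines is one long line."""
    return paragraph.splitlines(keepends=True)


def changed_chars(old, new, splitter):
    """Count the characters of new text a delta at this granularity must carry."""
    a, b = splitter(old), splitter(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += sum(len(unit) for unit in b[j1:j2])
    return total


old = "Git is a VCS. It was written in 2005. It is fast."
new = "Git is a VCS. It was first released in 2005. It is fast."

line_cost = changed_chars(old, new, split_lines)
sentence_cost = changed_chars(old, new, split_sentences)
print(line_cost, sentence_cost)
```

With line units the delta has to carry the whole paragraph again; with sentence units it carries only the one sentence that changed.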
>
> The key to efficiency is to use skip deltas, though. You build a
> binary tree so accessing any revision requires the application of only
> log(n) deltas.
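For anyone following along, a toy sketch of the skip-delta idea. It assumes one particular rule for picking the base (revision n is stored as a delta against n with its lowest set bit cleared), and the names (skip_base, delta_chain, SkipDeltaStore) and the difflib-based delta format are invented for the example, not taken from svn or git.

```python
from difflib import SequenceMatcher


def skip_base(rev):
    """Assumed rule: the base of rev is rev with its lowest set bit cleared."""
    return rev & (rev - 1)


def delta_chain(rev):
    """Revisions whose deltas must be applied, in order, to rebuild rev from revision 0."""
    chain = []
    while rev > 0:
        chain.append(rev)
        rev = skip_base(rev)
    return chain[::-1]


def make_delta(old, new):
    """Encode new as copy-from-old ranges plus literal inserts."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        else:
            ops.append(("insert", new[j1:j2]))
    return ops


def apply_delta(old, ops):
    out = []
    for op in ops:
        if op[0] == "copy":
            out.append(old[op[1]:op[2]])
        else:
            out.append(op[1])
    return "".join(out)


class SkipDeltaStore:
    """Revision 0 is stored whole; every later revision is a delta against skip_base(rev)."""

    def __init__(self, first_text):
        self.full = first_text
        self.deltas = {}
        self.count = 1

    def commit(self, text):
        rev = self.count
        base_text = self.checkout(skip_base(rev))
        self.deltas[rev] = make_delta(base_text, text)
        self.count += 1
        return rev

    def checkout(self, rev):
        text = self.full
        for r in delta_chain(rev):
            text = apply_delta(text, self.deltas[r])
        return text


store = SkipDeltaStore("First sentence.")
texts = ["First sentence."]
for i in range(1, 40):
    texts.append(texts[-1] + " Edit number %d." % i)
    store.commit(texts[-1])

print(delta_chain(13))  # [8, 12, 13]
```

The chain length is the number of set bits in the revision number, so it never exceeds about log2(n) + 1. The cost is that a delta against a distant base is larger than a delta against the immediate predecessor.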
>
> I asked whether or not you tried svn, because svn already uses skip deltas.
svn would be daft, for so many reasons.
> Is the idea that the entire file would need to be transferred over the
> Internet, though? If so, I guess you wouldn't want to use skip deltas
> - they greatly reduce access time to early revisions, but at a
> slight space penalty.
With git, parts of the checkout can be shallow clones.
--
John Vandenberg