[Foundation-l] Re: A license for the Ultimate Wiktionary

Rowan Collins rowan.collins at gmail.com
Thu May 26 16:43:38 UTC 2005


On 26/05/05, David Gerard <fun at thingy.apana.org.au> wrote:
> Rowan Collins (rowan.collins at gmail.com) [050526 09:03]:
> > Put it together with the thorny question of "when is a rewrite a
> > rewrite", and it makes you wish for a meaningful "blame"/"who added
> > this line" tool - though I tend to agree with the opinion that this
> > would be an order of magnitude harder for the free text of
> > encyclopedia articles than it is for source code. Although, it has to
> > be noted that those IBM researchers managed to get meaningful data in
> > their "history flow" system...
> 
> As I noted, Linus Torvalds' git treats the unit it cares about as the line,
> not the file. So blame is carried between filenames. Someone may find this
> worth experimenting with for the back end.

Yes, but in free natural-language text like a Wikipedia article, there
isn't a meaningful definition of a "line" as there is with source code -
you've got to either look at a paragraph (probably too big), or maybe
a sentence (which requires somewhat more complex parsing - how many
sentences are there in "e.g. What's the magic no.? 1.23!"). Plus, the
database needed to carry that granularity of blame between articles in
the whole of Wikipedia would surely be humongous - presumably
involving some index of every line of every revision of every article
in the entire encyclopedia, ready for comparison.
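
To make the sentence-counting problem concrete, here is a minimal Python
sketch. Both splitting rules (naive_sentences and punctuation_pieces) are
hypothetical illustrations, not anything MediaWiki or git actually uses;
they only show that two plausible rules disagree on the example above.

```python
import re

def naive_sentences(text):
    # Hypothetical rule 1: a sentence ends at '.', '!' or '?'
    # when that character is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def punctuation_pieces(text):
    # Hypothetical rule 2 (cruder): every run of '.', '!' or '?'
    # is a boundary, wherever it appears.
    return [p.strip() for p in re.split(r"[.!?]+", text) if p.strip()]

sample = "e.g. What's the magic no.? 1.23!"

print(naive_sentences(sample))     # 3 pieces; "e.g." counts as a sentence
print(punctuation_pieces(sample))  # 5 pieces; "1.23" is cut in two
```

Neither answer matches what a human reader would say, and a rule that got
it right would need to know about abbreviations and decimal numbers.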

But like I say, I don't know how the IBM folks did it, nor even how
"normal" source-analysis tools work, so maybe it is possible after
all.
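
For what it's worth, a line-level "blame" over the revisions of a single
page can be sketched with Python's standard difflib module. This is only an
illustration under stated assumptions: the blame function, the revision
format and the sample authors are all made up here, it is not how git or
the IBM "history flow" system works, and it does not attempt to carry
blame between articles.

```python
from difflib import SequenceMatcher

def blame(revisions):
    """Attribute each line of the newest revision to an author.

    revisions: list of (author, list_of_lines) tuples, oldest first
    (a hypothetical format chosen for this sketch).
    Returns a list of (author, line) tuples for the newest revision.
    """
    annotated = []
    for author, lines in revisions:
        old_lines = [line for _, line in annotated]
        matcher = SequenceMatcher(a=old_lines, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their earlier attribution.
                updated.extend(annotated[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten lines are credited to this revision's author.
                updated.extend((author, line) for line in lines[j1:j2])
            # "delete": the removed lines are simply not carried forward.
        annotated = updated
    return annotated

history = [
    ("alice", ["Cats are mammals.", "They purr."]),
    ("bob", ["Cats are mammals.", "They purr loudly.", "Dogs bark."]),
]

for author, line in blame(history):
    print(f"{author:>5}: {line}")
```

Note that a one-word change ("They purr loudly.") hands the whole line to
the later editor, which is exactly the "when is a rewrite a rewrite"
problem in miniature.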

-- 
Rowan Collins BSc
[IMSoP]


