On 26/05/05, David Gerard <fun(a)thingy.apana.org.au> wrote:
Rowan Collins (rowan.collins(a)gmail.com) [050526 09:03]:
Put it together with the thorny question of "when is a rewrite a
rewrite", and it makes you wish for a meaningful "blame"/"who added
this line" tool - though I tend to agree with the opinion that this
would be an order of magnitude harder for the free text of
encyclopedia articles than it is for source code. Although, it has to
be noted that those IBM researchers managed to get meaningful data in
their "history flow" system...
As I noted, Linus Torvalds' git treats the unit it cares about as the line,
not the file. So blame is carried between filenames. Someone may find this
worth experimenting with for the back end.
Yes, but in free natural language text like a Wikipedia article, there
isn't a meaningful definition of a "line" like with source code -
you've got to either look at a paragraph (probably too big), or maybe
a sentence (which requires somewhat more complex parsing - how many
sentences are there in "e.g. What's the magic no.? 1.23!"). Plus, the
database needed to carry that granularity of blame between articles in
the whole of Wikipedia would surely be humongous - presumably
involving some index of every line of every revision of every article
in the entire encyclopedia, ready for comparison.
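
To make the sentence-counting problem concrete, here is a minimal sketch in Python. The splitting rule (break after ".", "!" or "?" when whitespace follows) and the function name naive_sentences are assumptions chosen for illustration, not how any real tool is known to do it.

```python
import re

def naive_sentences(text):
    """Split text after '.', '!' or '?' when whitespace follows.

    Deliberately naive: abbreviations such as "e.g." and "no." are
    treated as sentence endings, which is exactly the problem.
    """
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

sample = "e.g. What's the magic no.? 1.23!"
for piece in naive_sentences(sample):
    print(repr(piece))
```

This rule cuts the sample into three pieces, with "e.g." standing alone as a "sentence", while the decimal point in "1.23" survives only because no whitespace follows it. A better splitter would need a list of abbreviations or some real parsing, which is the extra complexity mentioned above.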
But like I say, I don't know how the IBM folks did it, nor even how
"normal" source-analysis tools work, so maybe it is possible after
all.
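
For what it's worth, a rough sketch of unit-level blame within a single article might look like the following. This is an assumption-laden illustration built on Python's standard difflib module; it is not git's algorithm and not the IBM "history flow" method, and the function name blame and the (author, units) input format are made up for the example. The "units" could be lines, paragraphs, or sentences, whichever granularity is chosen.

```python
from difflib import SequenceMatcher

def blame(revisions):
    """Attribute each unit of the latest revision to an author.

    revisions: chronological list of (author, units), where units is a
    list of strings (hypothetical input format for this sketch).
    A unit keeps its earlier author only if it is unchanged between two
    consecutive revisions; anything else is credited to the new author.
    """
    prev_units, prev_blame = [], []
    for author, units in revisions:
        # Start by crediting everything to the current author...
        new_blame = [author] * len(units)
        matcher = SequenceMatcher(a=prev_units, b=units, autojunk=False)
        # ...then carry the old attribution across unchanged blocks.
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                new_blame[j1:j2] = prev_blame[i1:i2]
        prev_units, prev_blame = units, new_blame
    return list(zip(prev_blame, prev_units))

history = [
    ("alice", ["A.", "B."]),
    ("bob", ["A.", "X.", "B."]),
]
for author, unit in blame(history):
    print(author, unit)
```

Note the limits: this only compares consecutive revisions of one article, so it carries no blame between articles, and a unit that is reworded even slightly is credited entirely to whoever reworded it, which is the "when is a rewrite a rewrite" problem again.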
--
Rowan Collins BSc
[IMSoP]