On Mon, Jan 26, 2009 at 3:47 AM, Petr Kadlec <petr.kadlec(a)gmail.com> wrote:
Is the code available and I have missed it? Do we have any other
implementation?
I tried to do something similar (two examples are at
http://mormegil.info/wp/blame/AIM.htm and
http://mormegil.info/wp/blame/AFC_Ajax.htm). The code is no
secret, though it is not very clean and contains no rocket
science (you have been warned):
https://opensvn.csie.org/traccgi/MWTools/browser/MWTools/trunk/PageBlame
I also have a blame engine of my own design. It is new and I haven't
released the source.
The biggest problem I see with such tools is that they are, IMHO,
unusable for any copyright-related purposes. My tool works by diffing
the article revisions and tracking who was the last author of every
word. Even though one can be much smarter than that, I don't believe
any such approach would be able to track all copyright-relevant
contributions.
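A minimal sketch of the word-level approach described above, assuming each revision is a plain-text string paired with its author's name and split on whitespace (the `blame` function and this revision format are hypothetical, not the actual PageBlame code). It uses Python's `difflib.SequenceMatcher` to diff consecutive revisions: unchanged words keep their previous author, while inserted or replaced words are credited to the author of the revision that introduced them.

```python
from difflib import SequenceMatcher


def blame(revisions):
    """Hypothetical word-level blame.

    revisions: list of (author, text) pairs in chronological order.
    Returns a list of (word, author) pairs for the final revision,
    where author is the last person to introduce or change that word.
    """
    attributed = []  # (word, author) pairs for the current revision
    for author, text in revisions:
        new_words = text.split()
        old_words = [word for word, _ in attributed]
        matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged words keep whoever wrote them before.
                updated.extend(attributed[i1:i2])
            else:
                # "replace" and "insert": new words go to this revision's author.
                # "delete" has j1 == j2, so it adds nothing.
                updated.extend((word, author) for word in new_words[j1:j2])
        attributed = updated
    return attributed


history = [
    ("Alice", "the quick brown fox"),
    ("Bob", "the quick red fox jumps"),
    ("Carol", "the quick red fox jumps high"),
]
print(blame(history))
```

Because the matching is purely lexical, a revision that rewrites every word (such as a full translation) credits the entire text to that revision's author, which is exactly the limitation raised in this thread.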
As an example, consider using that tool on an article that was created
by:
1. Importing an article with all its history from the English
Wikipedia to some other-language wiki.
2. Translating it into the local language (for extra fun, imagine a
language using a different script, e.g. Russian, or even Chinese).
There is IMHO no way the blame tool could track copyright properly
through the translation (which it has to, copyright-wise). And even in
the general case, I believe such tracking would be an AI-hard task
(often, even a human is unable to do it properly…). Of course, such
blame tools are great for many reasons (which is why I wrote them),
but I don't think they fit the current context (license change,
attribution, etc.) at all.
I think I have a more positive view than you do. Blame engines as a
tool can certainly inform copyright discussions and provide relevant
information, even though I agree they aren't by themselves a complete
solution.
For example, in situations where one must list a fixed number of
"major authors" (as the GFDL provides), blaming tools can make a
reasonable guess at which authors are relevant. They also help
estimate the answers to important meta-questions, such as "How many
authors does a typical Wikipedia article really have?"
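Given word-level blame output, both the "major authors" guess and the author count can be sketched as simple tallies. This assumes the input is a list of `(word, author)` pairs as produced by some word-level blame pass; the function names, the cutoff `n` and the `min_words` threshold are all hypothetical, and ranking by surviving word count is only one plausible definition of "major".

```python
from collections import Counter


def author_shares(blamed):
    """Map each author to the number of words they are credited with."""
    return Counter(author for _, author in blamed)


def major_authors(blamed, n=5):
    """Hypothetical heuristic: the n authors with the most surviving words.

    Counter.most_common orders equal counts by first appearance,
    so ties favour authors whose words appear earlier in the text.
    """
    return [author for author, _ in author_shares(blamed).most_common(n)]


def author_count(blamed, min_words=1):
    """Count authors credited with at least min_words surviving words."""
    return sum(1 for count in author_shares(blamed).values() if count >= min_words)


blamed = [
    ("the", "Alice"), ("quick", "Alice"), ("red", "Bob"),
    ("fox", "Alice"), ("jumps", "Bob"), ("high", "Carol"),
]
print(major_authors(blamed, n=2))  # ['Alice', 'Bob']
print(author_count(blamed))        # 3
```

Raising `min_words` filters out authors whose only surviving contribution is a typo fix, which changes the answer to "how many authors" considerably; the right threshold is a policy question rather than a technical one.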
When the license calls for attribution to be handled in a "reasonable"
way, I suspect one could make a good case that relying on a good
blame engine would often produce a reasonable attempt at attribution,
even though there are cases (like translation) where it will fail.
Attribution generated by blaming can be a good starting point, though
not necessarily the final answer.
-Robert Rohde