On Mon, Jan 26, 2009 at 3:47 AM, Petr Kadlec <petr.kadlec@gmail.com> wrote:
Is the code available and I have missed it? Do we have any other implementation?
I tried to do something similar (two examples are at http://mormegil.info/wp/blame/AIM.htm and http://mormegil.info/wp/blame/AFC_Ajax.htm); the code is nothing secret, even though it is not too clean, and there is also no rocket science in it [you have been warned: https://opensvn.csie.org/traccgi/MWTools/browser/MWTools/trunk/PageBlame].
I also have a blame engine of my own design. It is new and I haven't released the source.
The biggest problem I see with such tools is that they are IMHO unusable for any copyright-related purposes. My tool works by diffing the article revisions and tracking who was the last author of every word. Even though you can be much smarter than that, I don't believe you would be able to track all copyright-relevant contributions that way. As an example, consider using that tool on an article that was created by:
1. Importing an article with all its history from the English Wikipedia to some other-language wiki.
2. Translating it into the local language (for more fun, imagine a language using a different script, e.g. Russian, or even Chinese).
There is IMHO no way the blame tool could track copyright properly through the translation (which it has to, copyright-wise). And even in the general case, I believe such tracking would be an AI-hard task (often, even a human is unable to do it properly…). Of course, such Blame tools are great for many reasons (which is why I wrote them), but I think the current context (license change, attribution etc.) does not fit them at all.
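For readers who have not seen one, the word-level approach described above (diff successive revisions, remember the last author of every word) can be sketched roughly as follows. This is only an illustration, not the PageBlame code: the function name blame, the (author, text) input format, whitespace tokenization, and the use of Python's difflib are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def blame(revisions):
    """Attribute each word of the latest revision to its last author.

    `revisions` is a hypothetical input format: a list of (author, text)
    pairs, oldest first. Returns a list of (word, author) pairs.
    """
    words, authors = [], []
    for author, text in revisions:
        new_words = text.split()  # naive whitespace tokenization (assumption)
        matcher = SequenceMatcher(a=words, b=new_words, autojunk=False)
        new_authors = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged words keep whoever was credited before.
                new_authors.extend(authors[i1:i2])
            else:
                # Inserted or replaced words go to the current editor;
                # for a pure deletion j1 == j2, so nothing is added.
                new_authors.extend([author] * (j2 - j1))
        words, authors = new_words, new_authors
    return list(zip(words, authors))


history = [
    ("alice", "the quick brown fox"),
    ("bob", "the quick red fox jumps"),
]
for word, author in blame(history):
    print(word, author)
```

If a third revision replaced the whole text with a translation, no words would match the previous revision, so this sketch would credit every word to the translator, which is exactly the failure mode described above.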
I think I have a more positive view than you do. Blame engines as a tool can certainly inform copyright discussions and provide relevant information, even though I agree they aren't by themselves a complete solution.
For example, with situations where one is trying to list a fixed number of "major authors" (as provided in the GFDL, for example), blaming tools can make a reasonable guess at which authors are relevant. They also help estimate the answer to important meta questions, such as "How many authors does a typical Wikipedia article really have?"
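As a rough illustration of that kind of guess, the sketch below ranks authors by how many words of the current text a blame pass credits to them and keeps the top few. The function name major_authors, the (word, author) input format, and the choice of surviving word count as the ranking measure are assumptions for the example; none of them is a legal definition of a principal author.

```python
from collections import Counter


def major_authors(blamed_words, limit):
    """Return up to `limit` authors, ordered by surviving word count.

    `blamed_words` is a hypothetical list of (word, author) pairs, such as
    a word-level blame engine might produce for the current revision.
    """
    counts = Counter(author for _word, author in blamed_words)
    return [author for author, _count in counts.most_common(limit)]


blamed = [
    ("the", "alice"), ("quick", "alice"), ("red", "bob"),
    ("fox", "alice"), ("jumps", "bob"), ("today", "carol"),
]
print(major_authors(blamed, 2))
```

Counting distinct authors in the same output (for instance len(set(author for _word, author in blamed))) gives one crude per-article answer to the "how many authors" question.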
When the license calls for attribution to be treated in a "reasonable" way, I suspect that one could make a good case that relying on a good blame engine would often generate a reasonable attempt at attribution, even though there are cases (like translation) where it will fail. Attribution generated by blaming can be a good starting point, though it may not necessarily be the final answer.
-Robert Rohde