On 25/01/2008, Shmuel Weidberg <ezrawax(a)gmail.com> wrote:
> Thinking about what would be involved, it seems that it would be very
> difficult to sift through hundreds of edits to determine who wrote
> what. Anybody have any ideas about how to make it a manageable task?
It's not difficult, but because of vandalism and page blanking you need, in
principle, to go through the entire history of the article to find the
earliest time that a particular edit/paragraph/sentence/string of characters
appeared, which you could probably best check using hashing. It would be
time consuming: it's an O(N*M) problem per article, where N is the number of
history items and M is the size of the final article.
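
A minimal sketch of the hashing idea, assuming the history is already
available as a list of (author, text) pairs, oldest first, and that a line is
the unit of attribution. The names chunks, digest and earliest_appearance are
made up for illustration; nothing here talks to a real wiki.

```python
import hashlib

def chunks(text):
    """Split text into non-empty, stripped lines (the unit credited here)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def digest(chunk):
    """Hash a chunk so comparisons are on fixed-size keys."""
    return hashlib.sha1(chunk.encode("utf-8")).hexdigest()

def earliest_appearance(revisions):
    """revisions: list of (author, text), oldest first.

    Returns {chunk: author} for every chunk of the final revision, crediting
    the author of the earliest revision in which that chunk appears.
    """
    wanted = {digest(c): c for c in chunks(revisions[-1][1])}
    credit = {}
    # Scan oldest to newest, so a later blanking cannot hide the first author.
    for author, text in revisions:
        for c in chunks(text):
            h = digest(c)
            if h in wanted and wanted[h] not in credit:
                credit[wanted[h]] = author
        if len(credit) == len(wanted):
            break  # every chunk of the final article has been credited
    return credit
```

Every revision is read in full, which is where the O(N*M) cost comes from.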
But you could do it a different way, where you go back through the history in
large jumps (binary search, if you know what that is) until a particular
contribution disappears, and then sniff around to check that the
disappearance at that point in the history wasn't just a temporary blanking.
That would be O(log(N)*M) or better.
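
Roughly, as a sketch: binary-search the history for the earliest revision
containing a snippet, then look a few revisions further back in case the gap
was only a blanking. The function name first_revision_with and the window
parameter are assumptions for illustration, and a blanking that lasts longer
than the window would still fool it.

```python
def first_revision_with(texts, snippet, window=5):
    """texts: revision texts, oldest first; snippet must be in the last one.

    Returns the index of the earliest revision found to contain snippet.
    """
    hi = len(texts) - 1
    if snippet not in texts[hi]:
        raise ValueError("snippet is not in the final revision")
    while True:
        lo = 0
        # Binary search, assuming the snippet stays once it has appeared.
        while lo < hi:
            mid = (lo + hi) // 2
            if snippet in texts[mid]:
                hi = mid
            else:
                lo = mid + 1
        # Sniff around: was the snippet present shortly before this point?
        start = max(0, hi - window)
        earlier = [i for i in range(start, hi) if snippet in texts[i]]
        if not earlier:
            return hi
        # The gap was a temporary blanking; search again from further back.
        hi = earlier[0]
```

For example, with a history where the text was blanked for two revisions and
then restored, the search first lands on the restoration, notices the snippet
just before the blanking, and carries on from there.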
There would still be issues though: if somebody corrects the spelling, then
to the dumb program it would look like they wrote that bit entirely, whereas
a human would probably still credit the original guy mostly. There might be
ways around that too, by checking the percentage change or something.
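
One possible way to check the percentage change, sketched with Python's
standard difflib. The 0.8 threshold and the names similarity and credit_for
are arbitrary choices for illustration, not anything tested on real articles.

```python
from difflib import SequenceMatcher

def similarity(old, new):
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, old, new).ratio()

def credit_for(old_text, old_author, new_text, new_author, threshold=0.8):
    """Credit the original author when the new text is only a small change."""
    if similarity(old_text, new_text) >= threshold:
        return old_author  # e.g. a spelling fix: mostly the original's work
    return new_author      # a substantial rewrite: credit the new editor
```

A spelling fix in a sentence leaves the ratio close to 1, so the original
author keeps the credit; replacing the sentence with something unrelated
drops it well below the threshold.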
It seems doable.
> Regards,
> Ezra
--
-Ian Woollard
We live in an imperfectly imperfect world. If we lived in a perfectly
imperfect world things would be a lot better.