On Tuesday 21 October 2008 08:59:06 Tei wrote:
On Tue, Oct 21, 2008 at 12:33 AM, Nikola Smolenski <smolensk@eunet.yu> wrote:
On Saturday 18 October 2008 14:57:59 Daniel Kinzler wrote:
So, what would it take? Where could we try it? What are the concerns?
To measure the difference between two edits, as I mentioned to you, wdiff (http://www.gnu.org/software/wdiff/) could be used: simply count the number of changed words in the article. Wdiff can give false positives (an author who merely swaps two paragraphs will appear to be a major author), but it cannot give false negatives (an author who changes a single word really did change just a single word; such a change may of course be very important, but it isn't major or, IMO, copyrightable).
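A minimal Python sketch of the word-count idea, assuming words are separated by whitespace. It uses the standard library's difflib instead of wdiff itself, so its counts may not match wdiff's own statistics, and changed_words is a hypothetical helper name. The second call shows the false positive described above: swapping two paragraphs with no other edit still counts one whole paragraph as deleted and re-inserted.

```python
import difflib


def changed_words(old: str, new: str) -> int:
    """Count words deleted from `old` plus words inserted into `new`.

    A rough stand-in for the wdiff idea: split on whitespace and diff the
    two word sequences. A replaced word counts once as deleted and once
    as inserted.
    """
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            total += (i2 - i1) + (j2 - j1)
    return total


print(changed_words("the cat sat", "the dog sat"))  # 2
print(changed_words("alpha beta gamma\n\ndelta epsilon zeta",
                    "delta epsilon zeta\n\nalpha beta gamma"))  # 6
```

Counting a replacement as two changed words is a design choice; counting max(deleted, inserted) per hunk would be an equally reasonable alternative.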
More sophisticated diffs could also be introduced. For example, it would be relatively simple to write a program that detects whether an author has swapped two (or more) paragraphs and then applies a diff as if they had not been swapped.
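One way to approximate the paragraph-swap idea, sketched in Python under simplifying assumptions: paragraphs are separated by blank lines, and a paragraph counts as "moved" only if it appears verbatim in both revisions. Such paragraphs are set aside and the remaining text is word-diffed. All function names are hypothetical, and a paragraph that is both moved and edited is not recognized as moved.

```python
import difflib
from collections import Counter


def changed_words(old: str, new: str) -> int:
    """Words deleted plus words inserted, from a word-level diff."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")


def paragraphs(text: str) -> list[str]:
    """Split on blank lines (assumed separator) and normalize whitespace."""
    return [" ".join(p.split()) for p in text.split("\n\n") if p.strip()]


def changed_words_ignoring_moves(old: str, new: str) -> int:
    """Word diff after setting aside paragraphs present verbatim in both."""
    old_paras, new_paras = paragraphs(old), paragraphs(new)
    common = Counter(old_paras) & Counter(new_paras)  # multiset intersection

    def drop_common(paras: list[str]) -> list[str]:
        remaining = common.copy()
        kept = []
        for p in paras:
            if remaining[p] > 0:
                remaining[p] -= 1  # matched in both: treat as merely moved
            else:
                kept.append(p)
        return kept

    return changed_words("\n\n".join(drop_common(old_paras)),
                         "\n\n".join(drop_common(new_paras)))


old = "alpha beta gamma\n\ndelta epsilon zeta"
print(changed_words_ignoring_moves(old, "delta epsilon zeta\n\nalpha beta gamma"))  # 0
print(changed_words_ignoring_moves(old, "delta epsilon eta\n\nalpha beta gamma"))   # 2
```

The pure swap now scores 0, and a swap combined with a one-word edit scores only that edit.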
Or disregard word order entirely, putting one word per line and sorting:

  tr -s '[:space:]' '\n' < article | sort
That's an excellent idea! It loses some information, such as word order, but for measuring the size of a change it's simple and it works.
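The order-insensitive variant can be sketched in Python with collections.Counter, which treats each revision as a multiset of words, roughly what comparing two sorted word lists does. This is a sketch, and bag_of_words_change is a hypothetical name. The last example shows what is lost: a reordering that reverses the meaning still scores zero.

```python
from collections import Counter


def bag_of_words_change(old: str, new: str) -> int:
    """Order-insensitive change size: words removed plus words added."""
    old_words, new_words = Counter(old.split()), Counter(new.split())
    removed = old_words - new_words  # Counter subtraction keeps only positive counts
    added = new_words - old_words
    return sum(removed.values()) + sum(added.values())


print(bag_of_words_change("the cat sat", "the dog sat"))  # 2
print(bag_of_words_change("alpha beta gamma\n\ndelta epsilon zeta",
                          "delta epsilon zeta\n\nalpha beta gamma"))  # 0
print(bag_of_words_change("the dog bit the man", "the man bit the dog"))  # 0
```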