On Thu, Oct 18, 2012 at 6:26 AM, James Heilman <jmh649(a)gmail.com> wrote:
We really need a plagiarism detection tool so that we can make sure our
sources are not simply "copy and pastes" of older versions of Wikipedia.
Today I was happily improving our article on pneumonia as I have a day off.
I came across a recommendation, tagged {{cn}}, that babies should be
suctioned at birth to decrease their risk of pneumonia. So I went to Google
Books and up came a book that supported it perfectly. And then I noticed
that this book supported the previous and next few sentences as well. It
also supported a number of other sections we had in the article but was
missing our references. The book was selling for $340 a copy. Our articles
have improved a great deal since 2007, and yet schools are buying copy-edited
versions of Wikipedia from five years ago. The bit about suctioning babies at
birth was wrong and I have corrected it. I think we need to get this
news out. Support Wikipedia and use the latest version online!
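
To make that kind of check concrete, here is a minimal sketch of comparing a
suspect passage against what the article said back in 2007, using the standard
MediaWiki revisions API. The title, cutoff date, 5-gram window, and User-Agent
string are all placeholders, not anything a real tool would be tied to:

# Rough sketch, not a finished tool: pull the last revision of "Pneumonia"
# from before 2007 via the standard MediaWiki API and count how many
# word 5-grams a suspect passage shares with it. The title, date, and
# 5-gram window size are placeholders.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def revision_as_of(title, timestamp):
    """Wikitext of the latest revision at or before `timestamp`."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "rvslots": "main",
        "rvlimit": 1,
        "rvstart": timestamp,  # rvdir defaults to "older"
        "format": "json",
    })
    req = urllib.request.Request(
        f"{API}?{params}", headers={"User-Agent": "copyvio-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]["slots"]["main"]["*"]

def word_ngrams(text, n=5):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

old_text = revision_as_of("Pneumonia", "2007-01-01T00:00:00Z")
suspect = "paste the passage from the suspect source here"
shared = word_ngrams(suspect) & word_ngrams(old_text)
print(f"{len(shared)} shared 5-grams with the 2007 revision")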
It's sort of unrelated, but there's a project called Common Crawl:
http://commoncrawl.org/
It is trying to produce an "open crawl of the web" (much as Google,
Bing, etc. have for their search engines).
Now that the copyvio bot is down, I'm wondering if someone would be
interested in building something that used the Common Crawl database,
or whether that'd be practical.
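
For what it's worth, one way the matching side of such a thing might work:
reduce every page to a small MinHash signature over word n-grams, so a
crawl-scale corpus can be screened for near-duplicates of an article without
storing full text. MinHash is a standard near-duplicate technique, not
anything Common Crawl itself provides; reading page text out of the crawl
archives is left out here, and the texts below are stand-ins:

# Illustrative only: MinHash signatures for near-duplicate detection.
# The page text would come from the Common Crawl archives; that part is
# omitted, and the sample strings below are stand-ins.
import hashlib

def minhash(text, n=5, num_hashes=64):
    """For each of `num_hashes` seeds, keep the smallest 64-bit hash
    seen over the text's word n-grams."""
    words = text.lower().split()
    grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{g}".encode(),
                                digest_size=8).digest(), "big")
            for g in grams))
    return signature

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching slots estimates n-gram set similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

article = "full text of the current wikipedia article would go here ..."
crawled = "full text of one page pulled from the common crawl ..."
print(f"estimated similarity: "
      f"{estimated_jaccard(minhash(article), minhash(crawled)):.2f}")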
--
Tom Morris
<http://tommorris.org/>