Neil Harris wrote:
geni wrote:
[snip]
Certain searches of existing content would be useful, the most obvious being running a copy of the database against a copy of Britannica.
And against other databases of copyrighted texts, such as InfoTrac (http://www.gale.com/onefile/) or similar, and things like Google Book Search.
-- Neil
Just a thought: the en: Wikipedia gets about 3 edits a second. I wonder whether the Foundation could make a special plea to Google for a dedicated search pipe that would let us run, say, 30 searches a second, 24 hours a day (only a tiny, tiny fraction of their overall capacity), in recognition of the _very_ substantial advertising revenue they must surely be receiving as a side effect of having Wikipedia's content online to draw in search queries.
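If Google ever granted such a pipe, the throttling on our side would be straightforward. A minimal sketch in Python, assuming an agreed budget of 30 queries a second; the google_search() call is purely a placeholder, since no such arrangement or API exists today:

    import time, threading

    class RateLimiter:
        """Space requests evenly so we never exceed an agreed rate."""
        def __init__(self, per_second):
            self.interval = 1.0 / per_second
            self.lock = threading.Lock()
            self.next_slot = time.monotonic()

        def wait(self):
            # Serialize callers and hand each one the next free time slot.
            with self.lock:
                now = time.monotonic()
                self.next_slot = max(self.next_slot, now) + self.interval
                delay = self.next_slot - self.interval - now
            if delay > 0:
                time.sleep(delay)

    limiter = RateLimiter(30)
    # for query in pending_edit_queries:
    #     limiter.wait()
    #     results = google_search(query)  # placeholder, not a real API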
(Think about it: even if only 20% of Wikimedia's 4000 or so page loads a second come from Google users who are expecting something like Wikipedia content, and Google makes only $0.25 CPM serving ads on the searches that led to those pages, that comes to an income stream of $0.20 per _second_ from Wikipedia-related searches, or a total of about $6.3M a year...)
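For anyone who wants to check that arithmetic, here it is spelled out; all the inputs are the rough estimates above, not measured values:

    page_loads_per_sec = 4000    # Wikimedia page loads per second
    google_fraction = 0.20       # share arriving via Google searches
    cpm_dollars = 0.25           # ad revenue per 1000 such searches

    per_second = page_loads_per_sec * google_fraction * cpm_dollars / 1000
    per_year = per_second * 60 * 60 * 24 * 365
    print(per_second)   # 0.20 dollars per second
    print(per_year)     # 6307200.0, i.e. ~$6.3M a year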
If so, we could integrate the copyright violation bot into the toolserver, or into the MW server cluster itself.
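For concreteness, here is a minimal sketch of the kind of comparison geni describes, as such a bot might run it against a local plain-text reference corpus. The 8-word shingle size and the 5% threshold are arbitrary assumptions, and any real corpus access would need to be negotiated:

    def shingles(text, n=8):
        """All n-word sequences ("shingles") in the text, lowercased."""
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def looks_like_copyvio(edit_text, reference_text, threshold=0.05):
        """True if enough of the edit's shingles appear in the reference."""
        a = shingles(edit_text)
        b = shingles(reference_text)
        if not a:
            return False
        return len(a & b) / len(a) >= threshold

    # Matches would be queued for human review, never deleted
    # automatically -- shared quotations and titles cause false positives.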
-- Neil