[WikiEN-l] Copyright Violation Bot

Fri Dec 22 01:09:42 UTC 2006

Neil Harris wrote:
> geni wrote:
>
> [snip]
>   
>> Certain searches of existing content would be useful, the most obvious 
>> being running a copy of the database against a copy of Britannica.
>>
>>     
>
> And other databases of copyrighted texts, such as InfoTrac 
> (http://www.gale.com/onefile/) or similar, and things like Google Book 
> Search.
>
> -- Neil
>   

Just a thought: the en: Wikipedia gets about 3 edits a second. I wonder 
if it would be possible for us to use special pleading through the 
Foundation to get a dedicated search pipe into Google that would allow 
us to do, say, 30 searches a second, 24 hours a day (which would only be 
a tiny, tiny fraction of their overall capacity), in recognition of the 
_very_ substantial benefit in advertising revenue they must surely 
currently be receiving as a side effect of having Wikipedia's content 
online to draw in search queries.
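
To put those rates in perspective, here is a rough sketch in Python of 
what 30 searches a second would buy against 3 edits a second. The 
function and constant names are only illustrative, and both rates are 
the round figures quoted above, not measurements.

```python
# Rough search budget for a copyright-checking bot.
# Both rates are the approximate figures from the paragraph above.
EDITS_PER_SEC = 3       # approximate en: Wikipedia edit rate
SEARCHES_PER_SEC = 30   # proposed dedicated search rate (hypothetical)


def searches_per_edit(searches_per_sec=SEARCHES_PER_SEC,
                      edits_per_sec=EDITS_PER_SEC):
    """Average number of searches available for each incoming edit."""
    return searches_per_sec / edits_per_sec


def searches_per_day(searches_per_sec=SEARCHES_PER_SEC):
    """Total searches issued in 24 hours at a constant rate."""
    return searches_per_sec * 60 * 60 * 24


print(searches_per_edit())  # 10.0
print(searches_per_day())   # 2592000
```

So each edit could be checked with about ten phrase searches on 
average, at a cost of roughly 2.6 million searches a day.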

(Think about it: even if only 20% of Wikimedia's 4000 or so page loads a 
second come from Google users who are expecting something like Wikipedia 
content, and Google only make $0.25 CPM on serving page ads on searches 
for those pages, that comes to an income stream of $0.20 per _second_ 
from Wikipedia searches, or a total of about $6.3M a year...)
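
The back-of-envelope sum can be checked with a few lines of Python. This 
is only a sketch: every input is one of the guesses from the paragraph 
above, the names are illustrative, and the 365-day year is an added 
assumption.

```python
# Back-of-envelope estimate of Google's ad income from Wikipedia-bound searches.
# All inputs are guesses from the paragraph above, not measured values.
PAGE_LOADS_PER_SEC = 4000   # approximate Wikimedia page loads per second
GOOGLE_SHARE = 0.20         # assumed fraction arriving via Google searches
CPM_DOLLARS = 0.25          # assumed ad income per 1000 search page views
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # assumption: 365-day year


def income_per_second(page_loads=PAGE_LOADS_PER_SEC,
                      share=GOOGLE_SHARE,
                      cpm=CPM_DOLLARS):
    """Dollars per second: qualifying page loads times income per view."""
    return page_loads * share * cpm / 1000


def income_per_year(page_loads=PAGE_LOADS_PER_SEC,
                    share=GOOGLE_SHARE,
                    cpm=CPM_DOLLARS):
    """Dollars per year at a constant per-second rate."""
    return income_per_second(page_loads, share, cpm) * SECONDS_PER_YEAR


print(f"${income_per_second():.2f} per second")
print(f"${income_per_year() / 1e6:.1f}M per year")
```

With these inputs the rate is $0.20 per second, which over a 365-day 
year comes to roughly $6.3M.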

If so, we could integrate the copyright violation bot into the 
toolserver, or into the MW server cluster itself.

-- Neil