On 12/21/06, Neil Harris <usenet@tonal.clara.co.uk> wrote:
Just a thought: the en: Wikipedia gets about 3 edits a second. I wonder if it would be possible for us to use special pleading through the Foundation to get a dedicated search pipe into Google that would allow us to do, say, 30 searches a second, 24 hours a day (which would be only a tiny fraction of their overall capacity), in recognition of the _very_ substantial advertising revenue they must surely be receiving as a side effect of having Wikipedia's content online to draw in search queries.
(Think about it: even if only 20% of Wikimedia's 4000 or so page loads a second come from Google users who are expecting something like Wikipedia content, and Google only makes $0.25 CPM serving page ads on the searches for those pages, that comes to an income stream of $0.20 per _second_ from Wikipedia searches, or a total of over $6M a year...)
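The arithmetic above can be checked directly; at the stated assumptions (4000 page loads/s, a 20% Google share, $0.25 per thousand ad-bearing searches) the stream annualizes to roughly $6.3M:

```python
# Back-of-the-envelope check of the revenue estimate.
# All inputs are the figures assumed in the email, not measured data.

page_loads_per_sec = 4000
google_share = 0.20            # fraction of page loads arriving via Google
cpm_dollars = 0.25             # revenue per 1000 ad-bearing searches

searches_per_sec = page_loads_per_sec * google_share       # 800/s
revenue_per_sec = searches_per_sec * cpm_dollars / 1000    # $0.20/s
revenue_per_year = revenue_per_sec * 86400 * 365           # ~$6.3M

print(f"${revenue_per_sec:.2f}/s, ~${revenue_per_year / 1e6:.1f}M/year")
```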
If so, we could integrate the copyright violation bot into the toolserver, or into the MW server cluster itself.
Go ahead: write the software, make it good, make it scale, and make it robust so that you don't have to constantly twiddle with it to keep it working.
I have no doubt that Google's rate limit can be worked out. I promise you that good work done towards these ends will not be wasted. Make sure that it's sufficiently modular that we'll be able to use it to generate queries against other text sources.
The logic needed for software to do this well is not trivial, but it is certainly not impossible. Working out the right access arrangement with Google is also not impossible. Someone just needs to step up and do it.
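As a sketch of what "sufficiently modular" might look like: a pluggable search backend behind a token-bucket rate limiter, so Google could be swapped for any other text source and the agreed query rate is never exceeded. All class and method names here are hypothetical, not existing code:

```python
import time
from abc import ABC, abstractmethod

class SearchBackend(ABC):
    """Pluggable text-search source; Google, or any other corpus."""
    @abstractmethod
    def search(self, query: str) -> list[str]:
        """Return candidate source URLs/passages matching the query."""

class RateLimiter:
    """Token bucket: at most `rate` calls per second, on average."""
    def __init__(self, rate: float):
        self.rate = rate
        self.tokens = rate          # allow a one-second initial burst
        self.last = time.monotonic()

    def acquire(self) -> None:
        # Refill tokens for the time elapsed since the last call.
        now = time.monotonic()
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            # Sleep just long enough to earn the one token we need.
            time.sleep((1 - self.tokens) / self.rate)
            self.last = time.monotonic()
            self.tokens = 0
        else:
            self.tokens -= 1

class CopyvioChecker:
    """Feed suspect passages from new edits through a rate-limited backend."""
    def __init__(self, backend: SearchBackend, searches_per_sec: float = 30):
        self.backend = backend
        self.limiter = RateLimiter(searches_per_sec)

    def check(self, passage: str) -> list[str]:
        self.limiter.acquire()
        return self.backend.search(passage)
```

The rate limiter is what makes a negotiated "30 searches a second" pipe safe to sit behind the edit stream, and the abstract backend is what keeps the copyright-violation logic reusable against other text sources later.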