[WikiEN-l] Copyright Violation Bot

Fri Dec 22 01:37:26 UTC 2006

Do a search in Google; invariably the first reference you are given is
Wikipedia. I don't have a clue what you are talking about technically, but I
believe any agreement between Wikipedia and Google to improve service would
be mutually beneficial.

Marc

> From: "Gregory Maxwell" <gmaxwell at gmail.com>
> Reply-To: English Wikipedia <wikien-l at Wikipedia.org>
> Date: Thu, 21 Dec 2006 20:23:32 -0500
> To: "English Wikipedia" <wikien-l at wikipedia.org>
> Subject: Re: [WikiEN-l] Copyright Violation Bot
> 
> On 12/21/06, Neil Harris <usenet at tonal.clara.co.uk> wrote:
>> Just a thought: the en: Wikipedia gets about 3 edits a second. I wonder
>> if it would be possible for us to use special pleading through the
>> Foundation to get a dedicated search pipe into Google that would allow
>> us to do, say, 30 searches a second, 24 hours a day (which would only be
>> a tiny, tiny fraction of their overall capacity), in recognition of the
>> _very_ substantial benefit in advertising revenue they must surely
>> currently be receiving as a side effect of having Wikipedia's content
>> online to draw in search queries.
>> 
>> (Think about it: even if only 20% of Wikimedia's 4000 or so page loads a
>> second come from Google users who are expecting something like Wikipedia
>> content, and Google only makes $0.25 CPM on serving page ads on searches
>> for those pages, that comes to an income stream of $0.20 per _second_
>> from Wikipedia searches, or a total of about $6.3M a year...)
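
The back-of-envelope estimate above can be checked with a few lines of
arithmetic. This is only a sketch using the figures quoted in the message
(4000 page loads a second, a 20% Google share, $0.25 CPM); the variable and
function names are illustrative, and a 365-day year is assumed.

```python
# Figures quoted in the message; names are illustrative, not from any real API.
PAGE_LOADS_PER_SECOND = 4000   # approximate Wikimedia page loads per second
GOOGLE_SHARE = 0.20            # assumed fraction arriving via Google searches
CPM_DOLLARS = 0.25             # assumed ad revenue per 1000 search impressions

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def revenue_per_second(loads: float, share: float, cpm: float) -> float:
    """Dollars per second: impressions per second times revenue per impression."""
    return loads * share * cpm / 1000


def revenue_per_year(loads: float, share: float, cpm: float) -> float:
    """Dollars per year at a constant per-second rate."""
    return revenue_per_second(loads, share, cpm) * SECONDS_PER_YEAR


per_second = revenue_per_second(PAGE_LOADS_PER_SECOND, GOOGLE_SHARE, CPM_DOLLARS)
per_year = revenue_per_year(PAGE_LOADS_PER_SECOND, GOOGLE_SHARE, CPM_DOLLARS)

print(f"${per_second:.2f} per second")
print(f"${per_year / 1e6:.1f}M per year")
```

With these inputs the script prints $0.20 per second and $6.3M per year.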
>> 
>> If so, we could integrate the copyright violation bot into the
>> toolserver, or into the MW server cluster itself.
> 
> 
> Go ahead: Write the software, make it good, make it scale, make it
> robust so that you don't have to constantly twiddle with it to keep it
> working.
> 
> I have no doubt that Google's rate limit can be worked out. I promise
> you that good work done towards these ends will not be work wasted.
> Make sure that it's sufficiently modular that we'll be able to use it
> to generate queries against other text sources.
> 
> The logic for software to do this well is not trivial but certainly
> not impossible. Working out the right access with Google is also not
> impossible. Someone just needs to step up and do it.
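
One way to read the "modular" and "rate limit" requirements above is a
checker that holds a set of pluggable search backends, each with its own
request budget. The sketch below is an assumption about how that could look,
not the design of any existing bot: TokenBucket, CopyvioChecker and the stub
backend are hypothetical names, and no real Google API is called.

```python
import time
from typing import Callable, Dict, List, Tuple

# A backend takes a phrase and returns URLs where it was found (hypothetical shape).
Backend = Callable[[str], List[str]]


class TokenBucket:
    """Allow up to `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        """Refill according to elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CopyvioChecker:
    """Send a phrase to every registered backend that still has budget."""

    def __init__(self) -> None:
        self.backends: Dict[str, Tuple[Backend, TokenBucket]] = {}

    def register(self, name: str, search: Backend, limiter: TokenBucket) -> None:
        self.backends[name] = (search, limiter)

    def check(self, phrase: str) -> Dict[str, List[str]]:
        """Return hits per backend; backends over their limit are skipped."""
        results: Dict[str, List[str]] = {}
        for name, (search, limiter) in self.backends.items():
            if limiter.try_acquire():
                results[name] = search(phrase)
        return results


def stub_backend(phrase: str) -> List[str]:
    """Stand-in for a real search service; matches against one fixed text."""
    corpus = {"http://example.org/source": "the quick brown fox jumps"}
    return [url for url, text in corpus.items() if phrase in text]


checker = CopyvioChecker()
checker.register("stub", stub_backend, TokenBucket(rate=30, capacity=30))
print(checker.check("quick brown fox"))
```

Adding another text source then means registering one more function with its
own limiter; the checking loop itself does not change.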