Neil Harris wrote:
geni wrote:
[snip]
Certain searches of existing content would be useful, the most obvious
being running a copy of the database against a copy of Britannica.
And other databases of copyrighted texts, such as InfoTrac
(http://www.gale.com/onefile/) or similar, and things like Google Book
Search.
-- Neil
Just a thought: the en: Wikipedia gets about 3 edits a second. I wonder
if it would be possible for us to use special pleading through the
Foundation to get a dedicated search pipe into Google that would allow
us to do, say, 30 searches a second, 24 hours a day (only a tiny, tiny
fraction of their overall capacity), in recognition of the _very_
substantial benefit in advertising revenue they must surely be receiving
as a side effect of having Wikipedia's content online to draw in search
queries.
(Think about it: even if only 20% of Wikimedia's 4000 or so page loads a
second come from Google users who are expecting something like Wikipedia
content, and Google makes only $0.25 CPM serving ads on searches for
those pages, that comes to an income stream of $0.20 per _second_
from Wikipedia searches, or a total of roughly $6M a year...)
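For what it's worth, those back-of-envelope figures check out in a couple of lines. The inputs (20% share, 4000 loads/s, $0.25 CPM) are of course guesses from the paragraph above, not measurements:

```python
# Back-of-envelope check of the ad-revenue estimate above.
PAGE_LOADS_PER_SEC = 4000   # guessed Wikimedia page-load rate
GOOGLE_FRACTION = 0.20      # guessed share arriving via Google searches
CPM_DOLLARS = 0.25          # guessed revenue per 1000 ad-bearing searches

per_second = PAGE_LOADS_PER_SEC * GOOGLE_FRACTION * CPM_DOLLARS / 1000
per_year = per_second * 86400 * 365

print(f"${per_second:.2f}/s, ${per_year:,.0f}/year")
# → $0.20/s, $6,307,200/year
```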
If so, we could integrate the copyright violation bot into the
toolserver, or into the MW server cluster itself.
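A minimal sketch of what such a bot might do with its search budget: chop each new edit into long exact phrases and flag any phrase a search backend already knows. The `demo_search_hits` function here is purely a stand-in for the hypothetical Google pipe; in production each phrase would become one quoted-phrase query:

```python
# Sketch of a copyright-violation check built on quoted-phrase searches.
# All names here are illustrative, not an existing bot's API.

def phrases(text, words_per_phrase=8):
    """Split text into non-overlapping N-word chunks (one query each,
    which keeps the per-edit query budget small)."""
    words = text.split()
    return [" ".join(words[i:i + words_per_phrase])
            for i in range(0, max(len(words) - words_per_phrase + 1, 1),
                           words_per_phrase)]

def suspect_phrases(new_text, search_hits):
    """Return the phrases the search backend reports as already published."""
    return [p for p in phrases(new_text) if search_hits(p) > 0]

# Stand-in backend: a single "already published" page of text.
_known = "the quick brown fox jumps over the lazy dog while the cat sleeps"

def demo_search_hits(phrase):
    return 1 if phrase in _known else 0
```

An edit that lifts a sentence from the known text gets flagged, while fresh prose passes; anything flagged would then go to a human for review, since common phrases will produce false positives.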
-- Neil