I remember hearing a couple of times that CorenSearchBot was down, but just
assumed that something so important was being rescued, though I did wonder
slightly about the recent net increase in articles on EN wiki. 3,738,826
articles today means we've way overshot the 3 million projection, the 3.5
million prediction is looking distinctly cautious, and even the 4 million
by late 2012 projection
(http://commons.wikimedia.org/wiki/File:Enwikipediapercgrowth.PNG) looks
somewhat unlike a ceiling.
Could we get Google and Bing to make an exception for CorenSearchBot? If not,
then I'd agree that a spider would make sense, though I've no idea what that
would cost. Having our own spider could be useful for other things though,
including:
# Bot adding of {{deadlink}} templates.
# Creating our own wayback machine showing webpages as they were when they
were cited by our articles.
# A "may have moved here" table, so we could add "possibly moved here" and
wayback options to {{deadlink}}.
# A bot to update links as sites reorganise and organisations rebrand;
without it we could be mostly deadlinked as early as mid-century.
# A bot that listed probable deaths based on obituaries in reliable sources
and even updates to subjects' own websites would also be useful.
# Possible breaches of our copyright would be another potential use, but
maybe we just need to rename "what links here" as "what links here
(internal)" and add "what links here (external)".
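
A minimal sketch of the first idea, assuming the spider has already recorded
an HTTP status for each cited URL. The function name, the set of "dead"
status codes and the link regex are all illustrative assumptions, not the
code of any existing bot.

```python
import re

# Assumption: status codes the spider would treat as "dead".
DEAD_STATUSES = {404, 410}

# Bracketed external links, [url optional label], that are not already
# followed by a {{deadlink}} template.
LINK_RE = re.compile(r"\[(https?://[^\s\]]+)[^\]]*\](?!\{\{deadlink\}\})")


def tag_dead_links(wikitext, statuses):
    """Append {{deadlink}} after each external link whose URL is dead.

    `statuses` maps a URL to the last HTTP status the (hypothetical)
    spider saw for it; URLs missing from the map are left alone.
    """
    def repl(match):
        link, url = match.group(0), match.group(1)
        if statuses.get(url) in DEAD_STATUSES:
            return link + "{{deadlink}}"
        return link

    return LINK_RE.sub(repl, wikitext)


if __name__ == "__main__":
    text = "See [http://example.org/a Source A] and [http://example.org/b Source B]."
    print(tag_dead_links(text, {"http://example.org/a": 404,
                                "http://example.org/b": 200}))
```

Running it twice over the same text should not tag a link twice, since links
already followed by {{deadlink}} are skipped.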
WSC
Message: 5
Date: Wed, 14 Sep 2011 17:09:44 +0200
From: Kim Bruning <kim(a)bruning.xs4all.nl>
Subject: Re: [Foundation-l] The WikiNews fork - for lack of a copyvio
detection bot half a project was lost
To: Wikimedia Foundation Mailing List
<foundation-l(a)lists.wikimedia.org>
Message-ID: <20110914170944.C22787(a)bruning.lan>
Content-Type: text/plain; charset=us-ascii
On Wed, Sep 14, 2011 at 10:49:06AM -0500, Aaron Adrignola wrote:
> CorenSearchBot has not been operational for several months since Yahoo
> stopped allowing automated queries. Bing's terms of use don't permit
> this either and apparently the same is true for Google.
It might be useful to have a community-operated spider, then? That way, we
could also optimize our database for the kinds of queries we need.
sincerely,
Kim Bruning