[Foundation-l] Spiders and bots. Was "The WikiNews fork - for lack of a copyvio detection bot half a project was lost"

WereSpielChequers werespielchequers at gmail.com
Wed Sep 14 22:05:15 UTC 2011


I remember hearing a couple of times that CorenSearchBot was down, but just
assumed that something so important was being rescued, though I did wonder
slightly about the recent net increase in articles on EN wiki. 3,738,826
articles today means we've way overshot the 3 million projection, the 3.5
million prediction is looking distinctly cautious, and even the 4 million
by late 2012 projection at
http://commons.wikimedia.org/wiki/File:Enwikipediapercgrowth.PNG looks
somewhat unlike a ceiling.

Could we get Google and Bing to make an exception for CorenSearchBot? If not
then I'd agree that a spider would make sense, though I've no idea what that
would cost. Having our own spider could be useful for other things though,
including:
# Bot adding of {{deadlink}} templates.
# Creating our own wayback machine showing webpages as they were when they
were cited by our articles.
# A "may have moved here" table, so we could add "possibly moved here" and
wayback options to {{deadlink}}.
# A bot to update links as sites reorganise and organisations rebrand;
without it we could be mostly deadlinked as early as mid-century.
# A bot that listed probable deaths based on obituaries in reliable sources,
and even on updates to subjects' own websites, would also be useful.
# Detecting possible breaches of our copyright would be another potential
use, but maybe we just need to rename "what links here" as "what links here
(internal)" and add "what links here (external)".
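
To make the first idea concrete, here is a rough sketch of what the tagging
step of a {{deadlink}} bot might look like. It is only an illustration under
stated assumptions: the function names (http_status, is_dead, tag_dead_links),
the user-agent string, and the rule that only 404 and 410 count as dead are
all hypothetical, and it handles only simple bracketed external links. A real
bot would presumably recheck a link over several days before tagging it.

```python
import re
import urllib.error
import urllib.request

# Bracketed external links such as [http://example.org Title] that are not
# already followed by a {{deadlink}} tag. Simplified; real wikitext is messier.
EXTERNAL_LINK = re.compile(r"\[(https?://[^\s\]]+)[^\]]*\](?!\{\{deadlink\}\})")


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response was received."""
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "deadlink-sketch/0.1"},  # hypothetical agent name
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        return None


def is_dead(status):
    """Assumption: only 'not found' and 'gone' count as dead; unknown does not."""
    return status in (404, 410)


def tag_dead_links(wikitext, fetch_status=http_status):
    """Append {{deadlink}} after each external link whose status looks dead."""
    def tag(match):
        link = match.group(0)
        if is_dead(fetch_status(match.group(1))):
            return link + "{{deadlink}}"
        return link

    return EXTERNAL_LINK.sub(tag, wikitext)
```

The status lookup is passed in as a parameter so the tagging logic can be
tried against a table of known results without touching the network, and so
a community spider's own database could later stand in for live requests.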

WSC

>
> Message: 5
> Date: Wed, 14 Sep 2011 17:09:44 +0200
> From: Kim Bruning <kim at bruning.xs4all.nl>
> Subject: Re: [Foundation-l] The WikiNews fork - for lack of a copyvio
>        detection bot half a project was lost
> To: Wikimedia Foundation Mailing List
>        <foundation-l at lists.wikimedia.org>
> Message-ID: <20110914170944.C22787 at bruning.lan>
> Content-Type: text/plain; charset=us-ascii
>
> On Wed, Sep 14, 2011 at 10:49:06AM -0500, Aaron Adrignola wrote:
> > CorenSearchBot has not been operational for several months since Yahoo
> > stopped allowing automated queries.  Bing's terms of use don't permit
> > this either and apparently the same is true for Google.
>
> It might be useful to have a community operated spider, then? In that way,
> we could also optimize our database for the kinds of queries we need.
>
> sincerely,
>        Kim Bruning