On Sat, Jan 23, 2010 at 4:31 AM, Carcharoth <carcharothwp(a)googlemail.com> wrote:
On Sat, Jan 23, 2010 at 3:21 AM, Gwern Branwen
<gwern0(a)gmail.com> wrote:
On Fri, Jan 22, 2010 at 8:45 PM, K. Peachey
<p858snake(a)yahoo.com.au> wrote:
On Sat, Jan 23, 2010 at 3:00 AM, Gwern Branwen
<gwern0(a)gmail.com> wrote:
...snip...
I started with all the links listed in
https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:WikiProject_Anime_…
and then began running searches on random topics and pruning based on
that - chucking sites into the blacklist sinbin, or finding good sites
omitted from the list and adding them to the whitelist. At last count,
I had 200 sites on the nice list and 311 on the naughty list (but this
counts things like the Mirrors page as a single link, though they ban
dozens or hundreds of sites).
...snip...
Perhaps we should encourage more WikiProjects to create lists like the
one displayed, then add them to a category, and someone could work on
a custom search that is suitable to use across the project and is
continuously updated with more allow/black lists.
-Peachey
That would be an excellent idea, especially if they could then all be
{{subst}}ed into a single page - just as I can ban every site listed
in the consolidated WP:MIRROR page, so too I can *include* every site
listed on a page. It would probably be superior to the current AfD
template with just some normal Google/Books/News searches.
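
A rough illustration of the consolidation idea, as a minimal Python
sketch. It assumes each WikiProject publishes a plain allow list and
block list of domains or URLs. The function names and example domains
are hypothetical, and letting a block anywhere override an allow is an
assumption, not something settled in this thread.

```python
from urllib.parse import urlparse


def host_of(entry):
    """Normalize a list entry (bare domain or full URL) to a lowercase host."""
    entry = entry.strip().lower()
    if "://" not in entry:
        entry = "http://" + entry  # let urlparse see a scheme
    host = urlparse(entry).hostname or ""
    return host[4:] if host.startswith("www.") else host


def consolidate(project_lists):
    """Merge per-project (allow, block) pairs into one allow set and one block set.

    Assumption: a site blocked by any project is dropped from the allow set.
    """
    allow, block = set(), set()
    for allowed, blocked in project_lists:
        allow.update(host_of(e) for e in allowed)
        block.update(host_of(e) for e in blocked)
    allow.discard("")
    block.discard("")
    return allow - block, block


def classify(url, allow, block):
    """Return 'blocked', 'allowed', or 'unknown' for a search-result URL."""
    parts = host_of(url).split(".")
    # Check the host itself and each parent domain (news.example.org -> example.org).
    candidates = [".".join(parts[i:]) for i in range(len(parts))]
    if any(c in block for c in candidates):
        return "blocked"
    if any(c in allow for c in candidates):
        return "allowed"
    return "unknown"


# Hypothetical lists from two projects.
allow, block = consolidate([
    (["example.org", "https://www.reviews.example/"], ["mirror.example"]),
    (["mirror.example"], ["http://spam.example/path"]),
])
```

Results classified as "unknown" are the ones worth reviewing by hand and
then sorting into one list or the other.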
Does your custom search aggregate books, news, and scholar searches,
as well as ordinary web searches?
I put in the Books/News/Scholar URLs, but I'm unsure whether it did anything.
For example, AFAIK, a site search of Google books will only turn up
the homepage for a book - the metadata, reviews, etc; the actual OCR
page contents are part of the 'deep web' you can get at only through
the actual Google search box. One might think that Google's custom
search might recognize the Google service URLs and run the deep web
queries and not just query the surface details - but that seems to be
too much to expect. (So I am perhaps a little hasty in suggesting a
universal CSE would replace the AfD searches.)
Those are the four Google searches I
use most often, and it is interesting to see how some subjects get
more coverage in one area of the information metasphere than in other
areas. It is all quite logical when you think about when the topic
received most coverage. The one thing I still find lacking is
Google News - a lot of old newspapers still seem to need to be
searched on separate databases. What is the best database out there
for searching old newspapers?
Carcharoth
I don't know of any good non-proprietary old newspaper database, personally.
--
gwern