On 4/14/06, Lars Aronsson <lars(a)aronsson.se> wrote:
> I'm not sure if you're talking about the big web search engines
> (Google, Yahoo, MSN) or the search function in Mediawiki here.
> There is little excuse for the latter to have any delay. But even
> for a big web search engine, it is easy to keep track of how often
> each webpage has changed in history, and economize how often it
> needs to be revisited. Combined with the high PageRank of en.wp's
> RecentChanges (9 of 10), it would be trivial for Googlebot to
> revisit this page (or the front page of websites of major
> newspapers) every minute or two and make it a high priority to
> reindex all pages linked from there. I suppose this is how Google
> News works. Why it still takes about a month for Google to update
> its index on Wikipedia articles is a mystery to me. Probably it
> has to do with a lack of competition. If MSN or Yahoo were
> faster, it would force Google to improve.
I know this was intentionally provocative, but I'll bite anyway.
As far as I know, the main limitation on Google indexing more of
wikipedia is that wikipedia can't serve pages fast enough (or, more
accurately, that the extra load from more Googlebot traffic would make
wikipedia noticeably slower for regular readers).
To answer your specific proposal:
1) The RecentChanges page itself has a meta tag:
<meta name="robots" content="noindex,follow" />
which indicates it's explicitly excluded from indexing (the "follow"
part just means its links may still be followed; see the little sketch
after point 2 below).
2) If it were allowed to be crawled, I'd expect it to be regularly
updated for the reasons you describe. But even in that case, this
particular page changing rapidly is not an indicator that the target
pages are also changing rapidly. For example, I imagine that the digg
front page changes pretty much every time a crawler visits, but the
pages linked *from* digg are not necessarily changing any faster than
any other random page on the web.
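To illustrate point 1: a crawler deciding what to do with that tag
might do something roughly like this (a hypothetical Python sketch, not
Googlebot's actual logic; the URL is only illustrative):

import re
import urllib.request

def robots_directives(url):
    # Fetch the page and pull out the content of its robots meta tag, if any.
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    match = re.search(r'<meta\s+name="robots"\s+content="([^"]*)"',
                      html, re.IGNORECASE)
    if not match:
        return set()
    return {d.strip().lower() for d in match.group(1).split(",")}

directives = robots_directives(
    "http://en.wikipedia.org/wiki/Special:Recentchanges")
may_index = "noindex" not in directives    # False here: the page asks not to be indexed
may_follow = "nofollow" not in directives  # True: its links may still be followed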
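And to make point 2 concrete: the "track how often each page changes
and economize revisits" scheduling Lars describes could be sketched
roughly like this (made-up Python with arbitrary constants, not
anything Google has published):

import hashlib
import time

class RevisitScheduler:
    def __init__(self):
        # url -> (hash of last fetch, time of last fetch, revisit interval in seconds)
        self.pages = {}

    def record_fetch(self, url, content):
        # content is the raw page bytes from this crawl.
        digest = hashlib.sha1(content).hexdigest()
        now = time.time()
        if url in self.pages:
            last_hash, last_fetch, interval = self.pages[url]
            elapsed = now - last_fetch
            if digest != last_hash:
                # Page changed since the last visit: shrink the revisit interval.
                interval = max(60, min(interval, elapsed) / 2)
            else:
                # Unchanged: back off, up to a month between visits.
                interval = min(interval * 2, 30 * 24 * 3600)
        else:
            interval = 24 * 3600  # first visit: assume roughly daily changes
        self.pages[url] = (digest, now, interval)

    def next_visit(self, url):
        _, last_fetch, interval = self.pages[url]
        return last_fetch + interval

The point is that the interval is estimated per target page from its own
observed change history, so a fast-changing page like RecentChanges or
the digg front page doesn't by itself pull the pages it links to onto a
faster schedule.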
Instead, there is a way for webmasters and Google to cooperate: the
sitemaps program. You can read more about it here:
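The core of a sitemap is just an XML list of URLs with last-modified
dates and change frequencies that the site publishes and the crawler
polls, so the site can announce what changed instead of waiting to be
re-crawled. Generating one is trivial; here is a toy Python sketch with
invented URLs (check Google's sitemap documentation for the exact
schema and namespace):

pages = [
    ("http://en.wikipedia.org/wiki/Main_Page", "2006-04-14", "hourly"),
    ("http://en.wikipedia.org/wiki/Wiki",      "2006-03-02", "weekly"),
]

with open("sitemap.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for loc, lastmod, changefreq in pages:
        f.write("  <url>\n")
        f.write("    <loc>%s</loc>\n" % loc)
        f.write("    <lastmod>%s</lastmod>\n" % lastmod)
        f.write("    <changefreq>%s</changefreq>\n" % changefreq)
        f.write("  </url>\n")
    f.write("</urlset>\n")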