On Wednesday 03 December 2003 14:55, Jimmy Wales wrote:
I wrote:
> mmm, yummy. When will we get up the nerve to turn full-text searching
> back on?

Andrew Alder wrote:
> Is this even a good idea? I know everyone has assumed we will, but the
> current use of Google has its advantages too (see the Village Pump).
> Or has this been fully discussed here already, long ago?
As for me, I always just assumed it. There are some big drawbacks to
Google, chiefly that it isn't real-time, which makes doing certain kinds
of study difficult. Also, Michael Hardy has reported to me that one page
he used to find in Google can no longer be found there, presumably due
to the vagaries of Google's indexing.
I have stumbled upon quite a few Wikipedia pages that were not being
indexed by Google, while the same pages on one of the many mirror-type
sites (nationmaster, etc.) were being indexed. My take on the whole
situation is that Google is treating en.wikipedia.org and
en2.wikipedia.org as two different entities.
Search for 'Rivers of France wikipedia'
(http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=utf-8&q=…)
to see an example of both en and en2 competing for the top spot.
An example of knock-offs scoring higher than Wikipedia: search for
'Napoleonic code'
(http://www.google.com/search?q=Napoleonic+code&sourceid=mozilla-search&…),
where both sciencedaily.com (IIRC a rather new mirror) and
nationmaster.com rank higher than Wikipedia.
rank higher than Wikipedia. This cannot be explained by Google's
PageRank algorithm alone, because Wikipedia surely attracts far more
links, in both quality and quantity, than the mirrors do. But it would
make sense if Wikipedia's ranking essentially gets divided in two
between the duplicate hostnames.
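For intuition, here is a toy computation with textbook PageRank (a made-up five/six-page link graph and the standard 0.85 damping factor; this is the published power-iteration algorithm, not Google's actual ranking system): when the same four inbound links are split between two copies of a page, each copy ends up with a markedly lower score than the single page would have had.

```python
# Toy illustration of the "ranking divided by two" intuition using
# textbook PageRank (power iteration). The link graphs and the damping
# factor are made-up illustration values.

def pagerank(links, n, d=0.85, iters=100):
    """links: dict mapping node -> list of nodes it links to."""
    pr = {i: 1.0 / n for i in range(n)}
    for _ in range(iters):
        new = {i: (1 - d) / n for i in range(n)}
        for src in range(n):
            targets = links.get(src, [])
            if targets:
                share = d * pr[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node: spread its rank over all pages.
                for t in range(n):
                    new[t] += d * pr[src] / n
        pr = new
    return pr

# Scenario A: four pages (0-3) all link to one Wikipedia host (node 4).
single = pagerank({0: [4], 1: [4], 2: [4], 3: [4]}, 5)

# Scenario B: the same four links split between en (4) and en2 (5).
split = pagerank({0: [4], 1: [4], 2: [5], 3: [5]}, 6)

print(round(single[4], 3))                      # single host's score
print(round(split[4], 3), round(split[5], 3))   # each duplicate scores lower
```

On this graph the single host converges to roughly 0.52 while each duplicate gets only about 0.29, so either hostname alone ranks well below the consolidated page.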
This would seem to reduce traffic to Wikipedia, which would obviously be
a bad thing. Is there some different load-balancing scheme that could be
implemented that would be transparent to Google?
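One scheme that would be transparent to Google: expose a single canonical hostname, answer any request that reaches a mirror hostname with a 301 (permanent) redirect to it, and spread the actual load behind that one name (e.g. via DNS round-robin). A minimal sketch of the redirect rule, assuming the hostnames from this thread (the function itself is illustrative, not Wikipedia's actual setup):

```python
# Hypothetical sketch: canonicalise mirror hostnames with a 301
# redirect so crawlers only ever index en.wikipedia.org. Hostnames
# are from this thread; everything else is illustrative.
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "en.wikipedia.org"
MIRROR_HOSTS = {"en2.wikipedia.org"}

def canonical_redirect(url):
    """Return (status, location) a front-end server could emit:
    a 301 to the canonical host for mirror URLs, 200 otherwise."""
    parts = urlsplit(url)
    if parts.hostname in MIRROR_HOSTS:
        canonical = urlunsplit(
            (parts.scheme, CANONICAL_HOST, parts.path,
             parts.query, parts.fragment)
        )
        return 301, canonical
    return 200, url

print(canonical_redirect("http://en2.wikipedia.org/wiki/Napoleonic_code"))
# → (301, 'http://en.wikipedia.org/wiki/Napoleonic_code')
```

Since Google follows permanent redirects and indexes only the target, all inbound link credit would then accrue to one hostname instead of being split.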
Best,
Sascha Noyes
--
Please encrypt all email. Public key available from
www.pantropy.net/snoyes.asc