Thomas R. Koll wrote:
On Mon, Sep 13, 2004 at 08:09:00AM +0200, Erik Moeller wrote:
..even though it knows about them. Daniel Brandt explains the problem here:
http://www.google-watch.org/dying.html
Of course that article is hyperbole, but you can see the problem if you
search for articles on Wikipedia that do not include the word "MediaWiki"
(which occurs on every page), e.g.:
http://www.google.com/search?q=site%3Aen.wikipedia.org+-mediawiki
This lists the pages on en.wikipedia.org that are not indexed, presently
255,000 (Google apparently has a list of URLs with no associated content).
That includes dynamic URLs, redirects and other duplicates, but it also
includes many pages that should be indexed. E.g. if I search for the text
of the English article "Potassium", I currently get plenty of mirrors, but
not Wikipedia itself.
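
For anyone who wants to repeat the check on another wiki, here is a minimal
sketch of how the exclusion query above could be assembled. The function name
build_unindexed_query is made up for illustration; only the "site:" and "-word"
operators and the resulting URL shape come from the link quoted above.

```python
from urllib.parse import urlencode


def build_unindexed_query(host: str, marker_word: str) -> str:
    """Build a Google search URL for pages on `host` that lack `marker_word`.

    Hypothetical helper: it only reproduces the URL pattern quoted in the
    message (site:<host> -<word>); it does not contact Google.
    """
    # "site:" restricts results to one host; a leading "-" excludes a word.
    query = f"site:{host} -{marker_word}"
    # urlencode escapes ":" as %3A and the space as "+".
    return "http://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_unindexed_query("en.wikipedia.org", "mediawiki"))
```

Swapping in nl.wikipedia.org as the host would give the same kind of listing
for the Dutch Wikipedia.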
I have a few questions, which someone with more Google-Fu than I may be
able to answer:
1) Why exactly doesn't Google index these articles?
First look at our robots.txt; it will show you that edit pages and such
aren't allowed to be crawled. That's a good thing. The real articles that are
listed are mostly brand new or badly linked from other articles.
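
To illustrate the robots.txt point, here is a minimal sketch using Python's
standard urllib.robotparser. The rules in SAMPLE_ROBOTS_TXT are an assumption
made up for this example (script URLs under /w/ blocked, articles under /wiki/
left open); they are not a copy of the actual Wikipedia robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, not the real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def is_crawlable(url: str, robots_txt: str = SAMPLE_ROBOTS_TXT) -> bool:
    """Return True if a generic crawler may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; nothing is fetched.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


if __name__ == "__main__":
    article = "http://en.wikipedia.org/wiki/Potassium"
    edit_page = "http://en.wikipedia.org/w/index.php?title=Potassium&action=edit"
    print(article, is_crawlable(article))
    print(edit_page, is_crawlable(edit_page))
```

Under these assumed rules the article URL is allowed and the edit URL is not,
so a blocked edit page tells you nothing about why the article itself is
missing from the index.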
ciao, tom
When you look for [[nl:Oostvaardersplassen]] in Google, using the Wikipedia
Google search, you will not find it as such. You will, however, find an
entry for
<http://nl.wikipedia.org/wiki/Oostvaardersplassen>, which is the article
in question, but the format should have been like:
Oostvaardersplassen - Wikipedia NL <http://nl.wikipedia.org/wiki/Veluwe>
In the local ratings among the articles that contain the word
"Oostvaardersplassen" it has a low rating even though it is *the*
article about the subject.
This article is not new, and there are no problems with the linking that I
can see. It has 39 references and was started on 21 Jan 2004.
This information is just to show that there ARE problems with Google and
that it is not just with new and badly linked articles.
Thanks,
GerardM