Aren't almost all of the pages Google requests served from cache? At any rate, our robots.txt [1] makes no mention of throttling, so if the crawl is being throttled, either we are checking user agents and throttling them dynamically, or we have a negotiated agreement with Google. Is this the case?
/Alterego
[1] http://en.wikipedia.org/robots.txt
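For anyone who wants to check what a robots.txt file says about crawl rate for a given user agent, here is a minimal sketch using Python's standard urllib.robotparser. The sample file contents, the example.org URLs, and the crawl_policy helper are all made up for illustration; this is not the actual content of [1], and it says nothing about how our servers really handle Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Crawl-delay: 5
Disallow: /private/

User-agent: *
Disallow: /private/
"""


def crawl_policy(robots_txt, user_agent, url):
    """Return (allowed, crawl_delay) for user_agent fetching url.

    crawl_delay is None when the file sets no Crawl-delay for that agent.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url), parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for agent in ("Googlebot", "OtherBot"):
        print(agent, crawl_policy(SAMPLE_ROBOTS_TXT, agent, "http://example.org/wiki/Foo"))
```

If a check like this returns no crawl delay for Googlebot against the real file, any throttling would have to be happening somewhere other than robots.txt, which is the question above.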
On 9/3/05, Evan Martin <evanm@google.com> wrote:
On 9/2/05, Angela <beesley@gmail.com> wrote:
Google miscategorizes the language of some of the Hungarian Wikipedia pages. E.g. it thinks that our Adolf Hitler article is in Czech.
How do you know they are miscategorising the language?
http://www.google.com/search?q=inurl%3A%22Adolf+Hitler%22+site%3Ahu.wikipedi...
This makes it seem like they haven't indexed the page at all, not that they've marked it as the wrong language.
I believe this is correct. We've had problems in the past with overloading Wikipedia, so the crawl has been throttled way back.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l