Hi,
I have noticed that Google also has the "Talk:something", "User:something", and "User talk:something" namespaces of the English and the Dutch Wikipedia in its index. I do not know about the others.
It also has the MetaWikipedia in its index.
I think Google should not index our internal affairs: only the articles, nothing else. The MetaWikipedia is not really meant for the general public either, so it should also not be in Google's index.
Am I alone in this?
Giskart
On Mon, 2002-12-16 at 12:38, giskart wrote:
I think Google should not index our internal affairs.
I disagree. Google is not a Wikipedia search engine, it's an Internet search engine. The goal is to collect and index the knowledge of the Net. Part of this "knowledge" are our user pages and even our discussions, which often have nothing to do with particular articles. Furthermore, people should discover through Google that Wikipedia is not just an encyclopedia but a community.
If people want to search only particular namespaces, they can use the Wikipedia search engine.
Regards, Erik
Eloquence wrote:
Giskart wrote:
I think Google should not index our internal affairs.
I disagree. Google is not a Wikipedia search engine, it's an Internet search engine. The goal is to collect and index the knowledge of the Net. Part of this "knowledge" are our user pages and even our discussions, which often have nothing to do with particular articles. Furthermore, people should discover through Google that Wikipedia is not just an encyclopedia but a community.
I agree. But it's also worth noting that there's probably no point in Google's indexing page histories, edit pages, what links here, user contributions, and the like.
-- Toby
On Mon, 2002-12-16 at 03:38, giskart wrote:
I have noticed that Google also has the "Talk:something", "User:something", and "User talk:something" namespaces of the English and the Dutch Wikipedia in its index. I do not know about the others.
It also has the MetaWikipedia in its index.
I think Google should not index our internal affairs: only the articles, nothing else. The MetaWikipedia is not really meant for the general public either, so it should also not be in Google's index.
Am I alone in this?
Our internal affairs aren't exactly internal -- it's an open, public discussion and everyone's welcome to participate. If Google helps people find the discussion, that's fine by me.
If something isn't meant for the general public, it shouldn't be put on a public web server with an open content license. ;) If you're afraid to have your name or your customary handle attached to your opinions in the public record, don't use your real name when you speak in public.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Our internal affairs aren't exactly internal -- it's an open, public discussion and everyone's welcome to participate.
I know and understand that.
If Google helps people find the discussion, that's fine by me.
I don't really have a problem with Google also finding the non-articles, but I do have a problem with talk pages sometimes ranking higher than the articles. Those pages compete with the articles.
I only bring it up because I had the impression that a couple of months ago it was decided to exclude the non-articles from search engines.
If something isn't meant for the general public, it shouldn't be put on a public web server with an open content license. ;) If you're afraid to have your name or your customary handle attached to your opinions in the public record, don't use your real name when you speak in public.
I am not afraid of that. I stand by my (broken) words. My real-life name is on my user page. I think a sysop should show his/her/its real name.
On Mon, 2002-12-16 at 10:49, Giskart wrote:
I don't really have a problem with Google also finding the non-articles, but I do have a problem with talk pages sometimes ranking higher than the articles. Those pages compete with the articles.
Then the article better be brought up to snuff so it gets a better ranking! ;)
I only bring it up because I had the impression that a couple of months ago it was decided to exclude the non-articles from search engines.
Not that I'm aware of. We exclude _edit pages_ since they're the equivalent of 404s; and we exclude some dynamically generated pages because spidering them puts extra load on the server as zillions of "show next XX items" links are followed.
-- brion vibber (brion @ pobox.com)
On Mon, Dec 16, 2002 at 11:03:06AM -0800, Brion Vibber wrote:
Not that I'm aware of. We exclude _edit pages_ since they're the equivalent of 404s; and we exclude some dynamically generated pages because spidering them puts extra load on the server as zillions of "show next XX items" links are followed.
Brion, what is the mechanism for telling Google not to follow a link? Pragma: no-cache? In mod_wiki, it would be cool to make "googleable" be an attribute of the page one could turn on and off.
Jonathan
On Mon, Dec 16, 2002 at 10:55:44AM -0800, Jonathan Walther wrote:
Brion, what is the mechanism for telling Google not to follow a link? Pragma: no-cache? In mod_wiki, it would be cool to make "googleable" be an attribute of the page one could turn on and off.
A robots.txt file is the right solution. And googlability is a property of classes of pages, not of individual pages, so such an attribute is misdesigned.
On Mon, 2002-12-16 at 10:55, Jonathan Walther wrote:
Brion, what is the mechanism for telling Google not to follow a link? Pragma: no-cache? In mod_wiki, it would be cool to make "googleable" be an attribute of the page one could turn on and off.
The cache obviously doesn't kill google, or our entire site would be missing from their index. ;)
What we use seems to be: <meta name="robots" content="noindex,nofollow">
Supposedly it works, though I've made no attempt to test it. (Our robots.txt blocks off everything that directly uses the /w/wiki.phtml link, i.e. everything but simple page views and the default views of special pages.)
-- brion vibber (brion @ pobox.com)
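The robots.txt rule Brion describes can be checked offline with Python's standard `urllib.robotparser`. This is a minimal sketch under stated assumptions: the robots.txt text contains only a `Disallow: /w/wiki.phtml` rule for all user agents, plain page views are assumed to live under `/wiki/`, and the host name and URLs are illustrative rather than copied from Wikipedia's real file. Because robots.txt rules match URL path prefixes, the single line covers every edit, history, or listing URL built on the script, which is why exclusion works naturally on classes of pages.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: assumed to hold only the rule described in the
# thread, blocking every URL that goes through the /w/wiki.phtml script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/wiki.phtml
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a well-behaved crawler with this user agent may fetch url."""
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Hypothetical URLs: one plain page view and two script-driven views.
    for url in [
        "http://www.wikipedia.org/wiki/Google",
        "http://www.wikipedia.org/w/wiki.phtml?title=Google&action=edit",
        "http://www.wikipedia.org/w/wiki.phtml?title=Google&action=history",
    ]:
        print(crawlable(rp, url), url)
```

Only the plain page view is reported as crawlable; both script URLs fall under the one prefix rule. Note that robots.txt is advisory: it keeps out crawlers that honour it, such as Google's, not every client.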
giskart wrote:
Hi,
I have noticed that Google also has the "Talk:something", "User:something", and "User talk:something" namespaces of the English and the Dutch Wikipedia in its index. I do not know about the others.
It also has the MetaWikipedia in its index.
I think Google should not index our internal affairs: only the articles, nothing else. The MetaWikipedia is not really meant for the general public either, so it should also not be in Google's index.
Am I alone in this?
You're not alone. Much of what happens on talk pages is pretty "wild-west". I think that we all want to continue that as part of the process for arriving at consensus on NPOV for articles.
Perhaps one of our more technically minded Wikipedians can respond about the technical feasibility of keeping talk pages out of the Google index.
Eclecticology
On Mon, 2002-12-16 at 10:34, Ray Saintonge wrote:
Perhaps one of our more technically minded Wikipedians can respond about the technical feasibility of keeping talk pages out of the Google index.
It would be very easy, but I don't think it's desirable to do so.
-- brion vibber (brion @ pobox.com)
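To make "very easy" concrete, here is a hypothetical sketch of how page-rendering code could decide whether to emit the `<meta name="robots" content="noindex,nofollow">` tag quoted earlier in the thread. The function name, the action names (`view`, `edit`, `history`), the namespace prefixes, and the `exclude_talk` switch are all illustrative assumptions, not the actual Wikipedia software. Excluding talk pages costs one extra prefix check; the switch defaults to off, matching Brion's view that it isn't desirable.

```python
# Hypothetical sketch: decide whether a rendered page should carry the
# robots meta tag quoted in the thread. Not the real wiki code.

ROBOTS_META = '<meta name="robots" content="noindex,nofollow">'

# Assumed discussion namespaces, written as title prefixes.
TALK_PREFIXES = ("Talk:", "User talk:", "Wikipedia talk:")

def robots_meta(title: str, action: str = "view", exclude_talk: bool = False) -> str:
    """Return the robots meta tag for a page, or "" to let robots index it.

    - Any action other than a plain view (e.g. "edit", "history") gets the
      tag, mirroring the thread's point that edit pages are like 404s.
    - Talk namespaces get the tag only when exclude_talk is True.
    """
    if action != "view":
        return ROBOTS_META
    if exclude_talk and title.startswith(TALK_PREFIXES):
        return ROBOTS_META
    return ""

if __name__ == "__main__":
    print(repr(robots_meta("Google")))
    print(robots_meta("Google", action="edit"))
    print(robots_meta("Talk:Google", exclude_talk=True))
```

One design consequence worth noting: a robot only sees the meta tag after it has fetched the page, so unlike the robots.txt rule this does nothing to reduce server load from spidering; it only keeps the page out of the index.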
wikitech-l@lists.wikimedia.org