Steve Sanbeg wrote:
On Mon, 07 May 2007 17:12:52 -0400, Jay R. Ashworth wrote:
On Mon, May 07, 2007 at 03:51:48PM -0400, Gregory Maxwell wrote:
This would be very useful for another use case: Sometimes google will pick up a cached copy of a vandalized page. In order to purge the google cache you need to make the page 404 (which deletion doesn't do), put the page into a robots.txt deny, or include some directive in the page that stops indexing.
If we provided some directive to do one of the latter two (ideally the last) we could use it temporarily to purge google cached copies of vandalism... so it would even be useful for pages that we normally want to keep indexed.
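To make the robots.txt option above concrete, here is a minimal sketch using Python's standard-library robots.txt parser; the article path is hypothetical, but it shows how a single Disallow entry would keep a well-behaved crawler away from a temporarily blacklisted page:

```python
# Sketch: how a robots.txt "deny" entry blocks a compliant crawler.
# The article path is made up for illustration.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /wiki/Some_Vandalized_Article
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The blacklisted page may not be fetched; everything else still may.
print(rp.can_fetch("*", "/wiki/Some_Vandalized_Article"))
print(rp.can_fetch("*", "/wiki/Main_Page"))
```

Note the operational downside this thread raises: the Disallow list itself is public, so it doubles as an index of exactly the pages being hidden.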
With all due respect to... oh, whoever the hell thinks they deserve some: aren't we big enough to get a little special handling from Google? I should think that if we have a page get cached that's either been vandalised or in some other way exposes us to liability, that as big as we are, and as many high ranked search results as we return on Google (we're often the top hit, and *very* often in the top 20), perhaps we might be able to access some *slightly* more prompt deindexing facility? At least for, say, our top 10 administrators?
Cheers, -- jra
Doesn't this assume that:
- The foundation is willing to self censor its content.
Preventing google from scraping content at the request of BLP subjects is not censorship and sounds reasonable. It does not compromise Wikipedia, just external engines creating biased link summaries.
- Google will recognize that if a URL is marked like a crawler trap in robots.txt when it obviously isn't one, it means the corresponding censored article shouldn't be crawled or extracted from the syndication dumps.
- The foundation wants to set up a private channel of information exclusively for Google.
insert noindex into the HTML output -- very easy and straightforward.
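As a rough illustration of that suggestion (not MediaWiki's actual skin code), here is a minimal Python sketch in which a hypothetical blacklist set decides whether a robots noindex meta tag is emitted into a page's head:

```python
# Sketch only: the blacklist set and head-assembly function are
# hypothetical, not MediaWiki internals.
NOINDEX_TAG = '<meta name="robots" content="noindex" />'

def head_html(title, blacklisted_titles):
    """Build a page <head>, suppressing indexing for blacklisted titles."""
    tags = ['<title>%s</title>' % title]
    if title in blacklisted_titles:
        # Temporarily stop search engines indexing this page until the
        # cached copy of the vandalized revision has been purged.
        tags.append(NOINDEX_TAG)
    return '<head>\n%s\n</head>' % '\n'.join(tags)

print(head_html("Some_Article", {"Some_Article"}))
print(head_html("Main_Page", {"Some_Article"}))
```

Unlike the robots.txt approach, this keeps the suppression per-page and does not publish a readable blacklist, though crawlers must re-fetch the page before they notice the directive.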
Jeff
Misusing robots.txt is somewhat dubious when you don't publish XML dumps for syndication, and seems somewhat pointless when you do. Having a team of censors maintaining a secret blacklist to be sent to one corporation seems somewhat contrary to the foundation's goals.
There may be better ways to do it, but they wouldn't be as simple as adding a name to a file; and some may consider the ramifications of hiding an article like this to be more serious than deleting it, not less so.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l