On Sun, 2005-11-20 at 18:39 -0800, Brion Vibber wrote:
Rob Lanphier wrote:
On Fri, 2005-11-18 at 19:09 +0000, Timwi wrote:
Speaking of which - this reminds me of an idea I had a while ago and I was wondering if anyone would be interested to hear this. Currently many Wikipedia pages in Google search results are redirects (for example, Google for "nonogram" and look at the seventh search result). I was wondering if there is a <link> element one could use to say that another URL is the "real" page? Then the page returned for a redirect's URL would tell search engines the URL of the page it's redirecting to.
I'm not aware of any <link> syntax, but one way to do it would be for MediaWiki to issue an HTTP 301 status (permanent redirect) to the new page, rather than returning 200 and giving the content. That probably introduces an unacceptably large performance penalty, though (extra round trip per request).
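To make the two behaviours concrete, here is a minimal sketch of a handler that either issues a 301 or serves the target content with a 200. It is not MediaWiki code: the `respond` function, the `/wiki/` path prefix, and the sample data are all assumptions made for illustration.

```python
from urllib.parse import quote

# Hypothetical data: one redirect and its target page.
REDIRECTS = {"Nonogram": "Target page"}
PAGES = {"Target page": "<p>Article text</p>"}

def respond(title, use_http_redirect):
    """Return (status, headers, body) for a requested page title."""
    target = REDIRECTS.get(title)
    if target is None:
        if title in PAGES:
            return 200, {}, PAGES[title]
        return 404, {}, "Not found"
    if use_http_redirect:
        # 301 option: the client has to make a second request.
        location = "/wiki/" + quote(target.replace(" ", "_"))
        return 301, {"Location": location}, ""
    # 200 option: serve the target's content directly, with a
    # note linking back to the redirect page.
    note = "<p>(Redirected from %s)</p>" % title
    return 200, {}, note + PAGES[target]
```

With `use_http_redirect=True` the response carries only a `Location` header and an empty body, which is where the extra round trip comes from.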
It's not a performance issue at all, and round-trips for 301s are often cheap compared to rendering.
...except for the fact that you are adding a round-trip in addition to subsequent rendering. I'll take your word for it that it's not a big deal in the larger scheme of things, but relative to a single header or tag, it seems pretty expensive (492 bytes inbound + 778 bytes outbound in the test I just ran with Firefox <=> standard config Apache).
It just makes it a lot harder to deal with such pages: if you HTTP-redirect straight to the target page you're missing the link back to the redirect page. (And that is *crucial* for editing work and vandalism cleanup. It is non-negotiable.)
If you redirect to an alternate URL which includes the linkback address, then a) it's an uglier URL and b) you don't get the alleged benefits of going to the single target URL in the first place.
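A rough sketch of the linkback-in-URL variant shows why it gives up the single target URL. The `redirectedfrom` query parameter and the `/wiki/` prefix are hypothetical names chosen for the example.

```python
from urllib.parse import quote, urlencode

def linkback_url(target, source):
    """Build a redirect URL that carries the originating page title."""
    path = "/wiki/" + quote(target.replace(" ", "_"))
    # Hypothetical parameter name; the source title rides along in the query.
    return path + "?" + urlencode({"redirectedfrom": source})

# Two redirects to the same target produce two distinct URLs.
a = linkback_url("Target page", "Nonogram")
b = linkback_url("Target page", "Nonograms")
```

Because `a` and `b` differ, a search engine following the redirects would still see more than one URL for the same article.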
We've actually discussed this many times before; please search the list archives if you wish to comment further. :)
I looked through the archives, and found the old "301's are evil" discussion from July 2003, which looks more like a misunderstanding than a productive conversation.
I'd like to point out that there's a third way, which is to set a cookie, rather than put the original request info in the URL. I'll admit that's probably got other problems, but I'm throwing that out there as a solution.
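The cookie idea might look roughly like the following, using Python's standard `http.cookies` module. The cookie name `redirected_from`, the ten-second lifetime, and both function names are assumptions for the sake of the sketch, not anything MediaWiki does.

```python
from http.cookies import SimpleCookie
from urllib.parse import quote, unquote

COOKIE = "redirected_from"  # hypothetical cookie name

def redirect_response(source, target):
    """Return (status, headers) for a 301 that remembers the source title."""
    cookie = SimpleCookie()
    cookie[COOKIE] = quote(source)      # percent-encode the title
    cookie[COOKIE]["path"] = "/"
    cookie[COOKIE]["max-age"] = 10      # short-lived; an arbitrary choice
    headers = [
        ("Location", "/wiki/" + quote(target.replace(" ", "_"))),
        ("Set-Cookie", cookie[COOKIE].OutputString()),
    ]
    return 301, headers

def read_linkback(cookie_header):
    """Recover the source title from the Cookie header of the next request."""
    morsel = SimpleCookie(cookie_header).get(COOKIE)
    return unquote(morsel.value) if morsel is not None else None
```

The target URL stays clean, and the page handler would call `read_linkback` to decide whether to show the link back to the redirect page.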
The "Content-Location" HTTP header is a potential long shot. I don't think Google documents whether it uses this header, but it's one of those "can't hurt" kinds of things.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14
The spec is sufficiently vague and mysterious that I'd recommend against using it for any purpose.
Typical use is in content negotiation, allowing the server to advertise the direct URL to the content that was ultimately served as a result of the negotiation.
Since the destination page would not return the same HTML as the redirect page, it would likely be incorrect and might cause problems if anything does use it.
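For reference, the header under discussion would be attached to the 200 response for the redirect page, roughly as sketched below. The function name and the `/wiki/` prefix are hypothetical.

```python
from urllib.parse import quote

def redirect_page_headers(target):
    """Headers for a 200 response that serves a redirect's target content."""
    return [
        ("Content-Type", "text/html; charset=utf-8"),
        # Advertises the target URL. The body served here still contains
        # the "redirected from" note, so it is not byte-identical to what
        # the target URL itself returns, which is the objection above.
        ("Content-Location", "/wiki/" + quote(target.replace(" ", "_"))),
    ]
```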
I suppose you're right. More importantly, there's little reason to believe that it'd actually solve the problem at hand. Now that I think about it, the search engines probably shouldn't infer that the Content-Location header contains the better URL to use to access the content in question. Since they shouldn't, it's a bad thing to count on.
Rob