Gentlemen, it is you who are ruining network standards.
HEAD http://en.wikipedia.org/wiki/Some_Non_Existent_Page --> 200 OK
It is clearly a case of 404 Not Found.
You can still give the same "You can create this article" message AND return a truthful HTTP code.
Else how is one to use a linkchecker on your links? Why are MediaWiki wikis special?
Yes, 200 OK for action=edit, and disambiguation pages, but not for the basic clear case of the spirit of 404 Not Found.
What if all ==External links== always returned 200? How could a bot detect linkrot? Do unto others...
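Concretely, this is all a link checker has to go on (a minimal Python sketch using only the standard library; the host and path are just the example above):

    import http.client

    def head_status(host, path):
        # Issue a HEAD request and report only the status code,
        # exactly as a link checker would.
        conn = http.client.HTTPConnection(host, timeout=10)
        conn.request("HEAD", path)
        status = conn.getresponse().status
        conn.close()
        return status

    print(head_status("en.wikipedia.org", "/wiki/Some_Non_Existent_Page"))
    # Reports 200 OK, as shown above; a checker needs a 404 to flag the link as dead.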
On 26/01/2008, jidanni@jidanni.org jidanni@jidanni.org wrote:
> Gentlemen, it is you who are ruining network standards.
> HEAD http://en.wikipedia.org/wiki/Some_Non_Existent_Page --> 200 OK
> It is clearly a case of 404 Not Found.
> You can still give the same "You can create this article" message AND return a truthful HTTP code.
> Else how is one to use a linkchecker on your links? Why are MediaWiki wikis special?
> Yes, 200 OK for action=edit, and disambiguation pages, but not for the basic clear case of the spirit of 404 Not Found.
> What if all ==External links== always returned 200? How could a bot detect linkrot? Do unto others...
This has come up before - if memory serves, the excuse given was that IE doesn't always show custom error pages (I think there is a minimum size, although I can't see how the MediaWiki pages would be shorter than that). It's something that bothers me as well - a page saying "page not found" really should return a 404 error code...
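To be clear, all that's being asked for is decoupling the body from the status code - something like this sketch (Python's wsgiref standing in for MediaWiki's PHP; page_exists() and render_missing_page() are made-up placeholders, not anything in MediaWiki):

    from wsgiref.simple_server import make_server

    def page_exists(title):
        return False  # placeholder for the real page-table lookup

    def render_missing_page(title):
        # The same friendly "you can create this article" body as today.
        return ("<html><body>No page exists by this title: %s. "
                "You can create this article.</body></html>" % title)

    def app(environ, start_response):
        title = environ["PATH_INFO"].lstrip("/")
        if page_exists(title):
            start_response("200 OK", [("Content-Type", "text/html")])
            return [b"<html><body>the article itself</body></html>"]
        # Truthful status code, identical human-visible page.
        start_response("404 Not Found", [("Content-Type", "text/html")])
        return [render_missing_page(title).encode("utf-8")]

    make_server("", 8000, app).serve_forever()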
Thomas Dalton wrote:
> This has come up before - if memory serves, the excuse given was that IE doesn't always show custom error pages (I think there is a minimum size, although I can't see how the MediaWiki pages would be shorter than that). It's something that bothers me as well - a page saying "page not found" really should return a 404 error code...
*nod*
See http://bugzilla.wikimedia.org/show_bug.cgi?id=2585
There were difficult-to-track-down errors at the time we originally tried this, but we may have done it wrong (e.g. without ensuring the minimum object size for IE...) or else there may have been some sort of proxy issue that we didn't track down correctly.
Note that edit pages and view pages possibly should be handled differently, or possibly not.
There's additionally an old req in for 404 returns on Special:Export: http://bugzilla.wikimedia.org/show_bug.cgi?id=3161
That one is imho a more dubious case.
-- brion vibber (brion @ wikimedia.org)
> Note that edit pages and view pages possibly should be handled differently, or possibly not.
I can't see anything that edit pages could return other than 200 OK. The issue is with redlinks: they don't actually link to what you would expect, the article (which doesn't exist and should return 404); they link to the edit page (which does exist, so should return 200). Could redlinks somehow return a 307 Temporary Redirect to the edit page, rather than linking there directly? The issue is determining when to redirect to the edit page and when to simply return a 404 - could the referer be used for that? If you've followed a link from the wiki, it should go to the edit page; if you've got there any other way (a link from an external site, or typing in the URL), you should get the 404. I'm not sure if treating a URL differently based on the referer would be against the standards.
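Roughly this, if it were allowed (a Python sketch of the referer idea; handle_missing_page() and WIKI_HOST are made-up names, not MediaWiki internals):

    from urllib.parse import urlsplit, urlencode

    WIKI_HOST = "en.wikipedia.org"

    def handle_missing_page(title, referer):
        # Followed from inside the wiki: bounce the reader to the edit form.
        if referer and urlsplit(referer).netloc == WIKI_HOST:
            query = urlencode({"title": title, "action": "edit"})
            return "307 Temporary Redirect", [("Location", "/w/index.php?" + query)]
        # External link or hand-typed URL: be honest about it.
        return "404 Not Found", []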
On Jan 26, 2008 9:41 AM, Thomas Dalton thomas.dalton@gmail.com wrote:
> > Note that edit pages and view pages possibly should be handled differently, or possibly not.
> I can't see anything that edit pages could return other than 200 OK. The issue is with redlinks: they don't actually link to what you would expect, the article (which doesn't exist and should return 404); they link to the edit page (which does exist, so should return 200). Could redlinks somehow return a 307 Temporary Redirect to the edit page, rather than linking there directly? The issue is determining when to redirect to the edit page and when to simply return a 404 - could the referer be used for that? If you've followed a link from the wiki, it should go to the edit page; if you've got there any other way (a link from an external site, or typing in the URL), you should get the 404. I'm not sure if treating a URL differently based on the referer would be against the standards.
Using referers isn't necessary. http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit is different from http://en.wikipedia.org/wiki/Gil_Prescot. Red links point to the former, which is clearly a 200 OK. A "link from external site or typing in the url" would presumably go to the latter.
I'm not sure that 404 is the correct response though. Wouldn't it be more correct for http://en.wikipedia.org/wiki/Gil_Prescot to return a temporary redirect to http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit?
The spec is really rather incomplete when it comes to dynamically generated pages. 200 OK isn't correct, but neither is 404 Not Found. What you'd really want is 2XX Dynamically Generated. 307 is technically correct, though.
> Using referers isn't necessary. http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit is different from http://en.wikipedia.org/wiki/Gil_Prescot. Red links point to the former, which is clearly a 200 OK. A "link from external site or typing in the url" would presumably go to the latter.
But the "Edit this page" link points to the &action=edit too, but that certainly shouldn't return any kind of error code, since it does exactly what it says on the tin.
> I'm not sure that 404 is the correct response though. Wouldn't it be more correct for http://en.wikipedia.org/wiki/Gil_Prescot to return a temporary redirect to http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit?
That would be doing away with the "No page exists by this title" page completely - do we want to do that?
> The spec is really rather incomplete when it comes to dynamically generated pages. 200 OK isn't correct, but neither is 404 Not Found. What you'd really want is 2XX Dynamically Generated. 307 is technically correct, though.
Really, you need a new section for Web 2.0 - at the moment, there is very little support for user-generated content (there's "Created" and "Accepted" - 201 and 202). We need a "content requested" code, or something.
On 26/01/2008, Thomas Dalton thomas.dalton@gmail.com wrote:
> > Using referers isn't necessary. http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit is different from http://en.wikipedia.org/wiki/Gil_Prescot. Red links point to the former, which is clearly a 200 OK. A "link from external site or typing in the url" would presumably go to the latter.
> But the "Edit this page" link points to the &action=edit too, but that certainly shouldn't return any kind of error code, since it does exactly what it says on the tin.
Hang on, I've missed your point slightly there. A redlink should not be a 200 OK, since it's a link to a page that doesn't exist (yes, it actually points to an edit page which does exist, but conceptually it's a broken link). A web crawler should see it as a broken link; it's only not broken if you're intending to contribute, not just read.
2008/1/26, Thomas Dalton thomas.dalton@gmail.com:
> Hang on, I've missed your point slightly there. A redlink should not be a 200 OK, since it's a link to a page that doesn't exist (yes, it actually points to an edit page which does exist, but conceptually it's a broken link). A web crawler should see it as a broken link; it's only not broken if you're intending to contribute, not just read.
But how about edit pages of pages that do exist? Surely those are very similar to edit pages of pages that don't exist, but I don't see any logic in making THOSE 404s.
On 26/01/2008, Andre Engels andreengels@gmail.com wrote:
> 2008/1/26, Thomas Dalton thomas.dalton@gmail.com:
> > Hang on, I've missed your point slightly there. A redlink should not be a 200 OK, since it's a link to a page that doesn't exist (yes, it actually points to an edit page which does exist, but conceptually it's a broken link). A web crawler should see it as a broken link; it's only not broken if you're intending to contribute, not just read.
> But how about edit pages of pages that do exist? Surely those are very similar to edit pages of pages that don't exist, but I don't see any logic in making THOSE 404s.
Exactly. That's where the problems come in. Redlinks, "Edit this page" links, and hand-typed &action=edit URLs all go to the same page, but only the first ought to be a 404. I don't have a good solution...
2008/1/26, Thomas Dalton thomas.dalton@gmail.com:
> Exactly. That's where the problems come in. Redlinks, "Edit this page" links, and hand-typed &action=edit URLs all go to the same page, but only the first ought to be a 404. I don't have a good solution...
One easy solution would be to introduce a new action - e.g. "create" - that's entirely synonymous with "edit" except for the status code returned, and that gets used on red links and red links only.
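In other words, something like this (sketch only; page_exists() and render_edit_form() are invented helpers):

    def page_exists(title):
        return False  # placeholder lookup

    def render_edit_form(title):
        return "<form>...editing %s...</form>" % title

    def handle(action, title, start_response):
        # action=create: same form as action=edit, but a red link gets
        # the truthful 404; action=edit keeps returning 200 as today.
        if action == "create" and not page_exists(title):
            status = "404 Not Found"
        else:
            status = "200 OK"
        start_response(status, [("Content-Type", "text/html")])
        return [render_edit_form(title).encode("utf-8")]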
On 26/01/2008, Schneelocke schneelocke@gmail.com wrote:
> 2008/1/26, Thomas Dalton thomas.dalton@gmail.com:
> > Exactly. That's where the problems come in. Redlinks, "Edit this page" links, and hand-typed &action=edit URLs all go to the same page, but only the first ought to be a 404. I don't have a good solution...
> One easy solution would be to introduce a new action - e.g. "create" - that's entirely synonymous with "edit" except for the status code returned, and that gets used on red links and red links only.
That ought to work.
On Jan 26, 2008 2:01 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
> On 26/01/2008, Thomas Dalton thomas.dalton@gmail.com wrote:
> > > Using referers isn't necessary. http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit is different from http://en.wikipedia.org/wiki/Gil_Prescot. Red links point to the former, which is clearly a 200 OK. A "link from external site or typing in the url" would presumably go to the latter.
> > But the "Edit this page" link points to the &action=edit too, but that certainly shouldn't return any kind of error code, since it does exactly what it says on the tin.
> Hang on, I've missed your point slightly there. A redlink should not be a 200 OK, since it's a link to a page that doesn't exist (yes, it actually points to an edit page which does exist, but conceptually it's a broken link). A web crawler should see it as a broken link; it's only not broken if you're intending to contribute, not just read.
Web crawlers should note that it is a page which is excluded by robots.txt. It's not a broken link; it's a valid link, just one which is not meant for robots.
    User-agent: *
    Disallow: /w/
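Any polite crawler already applies that rule before fetching, e.g. with Python's standard library:

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("http://en.wikipedia.org/robots.txt")
    rp.read()
    # /w/ is disallowed, so the edit form is simply off-limits to bots -
    # it never gets counted as a broken link in the first place.
    print(rp.can_fetch("*",
        "http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit"))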
> > I'm not sure that 404 is the correct response though. Wouldn't it be more correct for http://en.wikipedia.org/wiki/Gil_Prescot to return a temporary redirect to http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit?
> That would be doing away with the "No page exists by this title" page completely - do we want to do that?
You're right, I was confused. I was thinking the two pages were the same. They're similar, for users that aren't logged in, but not the same.
I guess 404 would be the right result for http://en.wikipedia.org/wiki/Gil_Prescott.
As for http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit, I guess it depends if you think action=edit really means action=editorcreate. If it does (and de facto, it currently does), then logged in users should get a 200 and logged out users should get...a 403 Forbidden?
On 26/01/2008, Anthony wikimail@inbox.org wrote:
> > > I'm not sure that 404 is the correct response though. Wouldn't it be more correct for http://en.wikipedia.org/wiki/Gil_Prescot to return a temporary redirect to http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit?
> > That would be doing away with the "No page exists by this title" page completely - do we want to do that?
> You're right, I was confused. I was thinking the two pages were the same. They're similar, for users that aren't logged in, but not the same.
> I guess 404 would be the right result for http://en.wikipedia.org/wiki/Gil_Prescott.
> As for http://en.wikipedia.org/w/index.php?title=Gil_Prescott&action=edit, I guess it depends if you think action=edit really means action=editorcreate. If it does (and de facto, it currently does), then logged in users should get a 200 and logged out users should get...a 403 Forbidden?
On wikis where anon page creation is blocked (which is far from every wiki - it may well be a minority), yes, I guess 403 would be correct (and for blocked users, anon users on wikis completely closed to anon editing, protected pages... the only issue with some of those is that they're temporary - is 403 meant to be used for temporary restrictions?).
On Jan 26, 2008 3:48 PM, Thomas Dalton thomas.dalton@gmail.com wrote:
> On wikis where anon page creation is blocked (which is far from every wiki - it may well be a minority), yes, I guess 403 would be correct (and for blocked users, anon users on wikis completely closed to anon editing, protected pages... the only issue with some of those is that they're temporary - is 403 meant to be used for temporary restrictions?).
It doesn't seem like it: "The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated." It's still the closest fit, I guess, given that none of the 5xx codes apply, and such pages are in a minority anyway.
We could also do 201 for page creation. And of course, 409 is just perfect for edit conflicts! :)
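Half-seriously, the whole mapping would be (a sketch; the event names on the left are made up here, the codes are straight from RFC 2616):

    STATUS_FOR_EVENT = {
        "ordinary_view":       "200 OK",
        "page_created":        "201 Created",
        "anon_create_blocked": "403 Forbidden",   # closest fit, per above
        "missing_article":     "404 Not Found",
        "edit_conflict":       "409 Conflict",    # just perfect :)
    }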
On Jan 25, 2008 6:11 PM, Brion Vibber brion@wikimedia.org wrote:
> Thomas Dalton wrote:
> > This has come up before - if memory serves, the excuse given was that IE doesn't always show custom error pages (I think there is a minimum size, although I can't see how the MediaWiki pages would be shorter than that). It's something that bothers me as well - a page saying "page not found" really should return a 404 error code...
> *nod*
> See http://bugzilla.wikimedia.org/show_bug.cgi?id=2585
> There were difficult-to-track-down errors at the time we originally tried this, but we may have done it wrong (e.g. without ensuring the minimum object size for IE...) or else there may have been some sort of proxy issue that we didn't track down correctly.
For what it's worth, LiveJournal's used the "huge HTML comment" hack for error pages for a long time without (as far as I know) problems. Examine e.g. http://news.livejournal.com/friends/nonesuch -- hopefully you see the server-provided error text on all browsers. (But the reach is much smaller than Wikipedia's, and displaying this text isn't as important for LiveJournal as it is for Wikipedia.)
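The hack itself is nothing more than padding (sketch; the 512-byte figure is the commonly cited IE threshold, not something I've verified against every IE version):

    # Pad an error body past IE's "friendly error page" threshold so IE
    # shows the server's page instead of its own generic one.
    IE_THRESHOLD = 512  # commonly cited figure
    PADDING = "<!-- " + "padding to defeat IE friendly errors " * 20 + "-->"

    def pad_for_ie(body):
        if len(body) < IE_THRESHOLD:
            body += PADDING
        return body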
Personally, I'd love it if Wikipedia had more informative error codes. My understanding is that Googlebot has to use all sorts of heuristics to guess at whether a given page is "actually" a 404 or not, and heuristic guessing is what got us into the above mess in the first place.
You might find it amusing to use Code Search to search for the string "Wikipedia does not have an article with this exact name" to see others who've been confronted with this problem: http://www.google.com/codesearch?hl=en&lr=&q=%22Wikipedia+does+not+h... (Of course, none of those solutions work for non-English wikis...)
Another, more complicated instance of this is observable at http://www.google.com/search?q=nitty+gritty , where result #6 is: "Nitty gritty - Wikipedia, the free encyclopedia Wikipedia does not currently have an encyclopedia article for Nitty gritty. You may want to search Wiktionary for "Nitty gritty" instead. ..." If that were instead a redirect, Google could pick up the useful Wiktionary article rather than the useless Wikipedia page.
> Another, more complicated instance of this is observable at http://www.google.com/search?q=nitty+gritty , where result #6 is: "Nitty gritty - Wikipedia, the free encyclopedia Wikipedia does not currently have an encyclopedia article for Nitty gritty. You may want to search Wiktionary for "Nitty gritty" instead. ..." If that were instead a redirect, Google could pick up the useful Wiktionary article rather than the useless Wikipedia page.
Yeah... that's far from ideal, but I don't think making them hard redirects would work well either, since it would make actually getting at the page within Wikipedia very difficult (unless you can get interwiki redirects to include a link back like intrawiki redirects do).
Another related issue is that redirects don't actually redirect - I've certainly been confused at times by the URL in my address bar not actually being the URL for the page I'm looking at. Would it be good to actually use HTTP redirects? Possibly with a &redirectedfrom parameter in order to credit the link back, if it can't be worked out from the referer or something. That would result in 2 requests to the servers rather than the current 1, but would allow links to be automatically updated by anything clever enough to know how to do that (for example, it should stop both the redirect and the real page appearing in Google with the same text, as I believe they do at the moment).
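Something like this, server-side (sketch only; &redirectedfrom is just the parameter name I made up above, not an existing MediaWiki parameter):

    from urllib.parse import quote

    def redirect_response(target_title, source_title):
        # A real HTTP redirect: the browser's address bar and any crawler
        # end up on the canonical URL, one extra round trip notwithstanding.
        location = "/w/index.php?title=%s&redirectedfrom=%s" % (
            quote(target_title), quote(source_title))
        # 302 rather than 301, since the redirect page can be edited
        # into a real article at any time.
        return "302 Found", [("Location", location)]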
On Mon, Jan 28, 2008 at 09:08:58PM +0000, Thomas Dalton wrote:
> Another related issue is that redirects don't actually redirect - I've certainly been confused at times by the URL in my address bar not actually being the URL for the page I'm looking at. Would it be good to actually use HTTP redirects? Possibly with a &redirectedfrom parameter in order to credit the link back, if it can't be worked out from the referer or something. That would result in 2 requests to the servers rather than the current 1, but would allow links to be automatically updated by anything clever enough to know how to do that (for example, it should stop both the redirect and the real page appearing in Google with the same text, as I believe they do at the moment).
Check this list's archives; early '07, I think. I posited that suggestion, and was carefully talked down; it took a few messages to convince me that "real" redirects were a Bad Idea, but they did, in fact, manage it.
Cheers,
-- jra