hi,
in 1.5, we have the possibility of changing URLs used for projects.
previously, there were two forms of URLs:
- http://en.wikipedia.org/wiki/Main_Page
- http://en.wikipedia.org/w/index.php?title=Main_Page&action=edit
we now have the choice to replace "action=" with a new path:
- http://en.wikipedia.org/wiki/Main_Page
- http://en.wikipedia.org/edit/Main_Page
however, at the same time, we could implement a change in the way domain names are used, that has been discussed several times in the past:
- http://wikipedia.org/en/article/Main_Page
- http://wikipedia.org/en/edit/Main_Page
as well as being nicer from a technical perspective, i think this reduces the disconnection between different languages of the same project. so, i'd like to propose that we move to the last form of URLs at some point.
does anyone have comments on or object to this change?
kate.
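A minimal sketch of the three URL styles listed above, built from the same (language, title, action) triple. The function name `build_urls`, the dictionary keys, and the "view" default are hypothetical labels chosen for illustration; only the URL shapes come from the message itself.

```python
from urllib.parse import quote


def build_urls(lang, title, action="view"):
    """Return one page/action pair in each of the three URL styles.

    "current", "action_path" and "single_domain" are illustrative labels,
    not names used by MediaWiki.
    """
    # Spaces become underscores, as in Main_Page; other characters are escaped.
    t = quote(title.replace(" ", "_"), safe="/:")
    host = f"{lang}.wikipedia.org"
    if action == "view":
        current = f"http://{host}/wiki/{t}"
        action_path = f"http://{host}/wiki/{t}"
        single_domain = f"http://wikipedia.org/{lang}/article/{t}"
    else:
        current = f"http://{host}/w/index.php?title={t}&action={action}"
        action_path = f"http://{host}/{action}/{t}"
        single_domain = f"http://wikipedia.org/{lang}/{action}/{t}"
    return {
        "current": current,
        "action_path": action_path,
        "single_domain": single_domain,
    }


urls = build_urls("en", "Main Page", "edit")
```

Calling `build_urls("en", "Main Page")` without an action gives the three page-view forms instead.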
however, at the same time, we could implement a change in the way domain names are used, that has been discussed several times in the past:
- http://wikipedia.org/en/article/Main_Page - http://wikipedia.org/en/edit/Main_Page
This isn't an objection since Alexa isn't accurate anyway, but this would prevent the "Where do people go on wikipedia.org?" at http://alexa.com/data/details/traffic_details?&url=wikipedia.org from working.
I'm not sure what other examples there are of a subdomain having any relevance to external sites. Google Adsense used to give more information if you split your site by subdomains, but now lets you track directories as well as subdomains. Not that that would be a relevant point for Wikimedia projects, but are there any other external stats that people use that do need Wikipedia to have subdomains?
Angela.
Is there a reason not to use a www in that URL scheme? If this change is made, you effectively have to support both, as there are hundreds of thousands of inbound links across all the projects. It is easy enough to rewrite URLs, but not easy to change those links, or the mindset of those who have been using the lang.wikipedia.org scheme for years.
/Alterego
On 7/7/05, Angela beesley@gmail.com wrote:
however, at the same time, we could implement a change in the way domain names are used, that has been discussed several times in the past:
This isn't an objection since Alexa isn't accurate anyway, but this would prevent the "Where do people go on wikipedia.org?" at http://alexa.com/data/details/traffic_details?&url=wikipedia.org from working.
I'm not sure what other examples there are of a subdomain having any relevance to external sites. Google Adsense used to give more information if you split your site by subdomains, but now lets you track directories as well as subdomains. Not that that would be a relevant point for Wikimedia projects, but are there any other external stats that people use that do need Wikipedia to have subdomains?
Angela.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l
Angela wrote:
however, at the same time, we could implement a change in the way domain names are used, that has been discussed several times in the past:
- http://wikipedia.org/en/article/Main_Page - http://wikipedia.org/en/edit/Main_Page
This isn't an objection since Alexa isn't accurate anyway, but this would prevent the "Where do people go on wikipedia.org?" at http://alexa.com/data/details/traffic_details?&url=wikipedia.org from working.
I'm not sure what other examples there are of a subdomain having any relevance to external sites. Google Adsense used to give more information if you split your site by subdomains, but now lets you track directories as well as subdomains. Not that that would be a relevant point for Wikimedia projects, but are there any other external stats that people use that do need Wikipedia to have subdomains?
Personally, I like being able to Google-search with parameters like "site:en.wikipedia.org" to return results only from one wikipedia.
That won't change. Google's site search works for directories as well, e.g. site:www.wikipedia.org/en/article/
On 7/7/05, Andrew Venier avenier@venier.net wrote:
Angela wrote:
however, at the same time, we could implement a change in the way domain names are used, that has been discussed several times in the past:
This isn't an objection since Alexa isn't accurate anyway, but this would prevent the "Where do people go on wikipedia.org?" at http://alexa.com/data/details/traffic_details?&url=wikipedia.org from working.
I'm not sure what other examples there are of a subdomain having any relevance to external sites. Google Adsense used to give more information if you split your site by subdomains, but now lets you track directories as well as subdomains. Not that that would be a relevant point for Wikimedia projects, but are there any other external stats that people use that do need Wikipedia to have subdomains?
Personally, I like being able to Google-search with parameters like "site:en.wikipedia.org" to return results only from one wikipedia.
"Kate" keturner@livejournal.com wrote in message news:6019986.1hDDKoznoz@rose.local...
in 1.5, we have the possibility of changing URLs used for projects.
[examples snipped]
For the sake of lowering blood-pressure all around, should we assume that these new formats are **in addition** to those currently in use?
I like them, but I echo the concerns of those worried about breaking inward links.
Actually I like the possibility of specifying links in more than one way to allow maximal flexibility. If we **can** handle it, then we **should** handle it. The principle of "least surprise" should come into pre-emptive action here.
Phil Boswell wrote in gmane.science.linguistics.wikipedia.technical:
"Kate" keturner@livejournal.com wrote in message news:6019986.1hDDKoznoz@rose.local...
in 1.5, we have the possibility of changing URLs used for projects.
[examples snipped]
For the sake of lowering blood-pressure all around, should we assume that these new formats are **in addition** to those currently in use?
old URLs will not stop working. that would be bad, of course, given the number of links to the current URLs, both internally and externally...
(however, using the old URLs would likely redirect to the newer style, were it implemented.)
kate.
Phil Boswell wrote:
"Kate" keturner@livejournal.com wrote in message news:6019986.1hDDKoznoz@rose.local...
in 1.5, we have the possibility of changing URLs used for projects.
[examples snipped]
For the sake of lowering blood-pressure all around, should we assume that these new formats are **in addition** to those currently in use?
I like them, but I echo the concerns of those worried about breaking inward links.
Just to emphasize:
We are *strongly* committed to maintaining the functioning of existing incoming links. I've made this promise several times over the last three years, and I'm happy to make it again.
If and when we change URL styles (as in the past moving from *.wikipedia.com to *.wikipedia.org, from www.wikipedia.org to en.wikipedia.org, from /wiki.cgi?Title to /wiki/Title, and from /w/wiki.phtml to /w/index.php) the old links will continue to work by automatic redirect or just by both working at once.
Actually I like the possibility of specifying links in more than one way to allow maximal flexibility. If we **can** handle it, then we **should** handle it. The principle of "least surprise" should come into pre-emptive action here.
There needs however to be a 'canonical' form: this is what the software itself will produce links as, and for cachable items where a single purgeable URL needs to exist and redirects are used, that's where other forms will get redirected to.
-- brion vibber (brion @ pobox.com)
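A rough sketch of the "canonical form plus redirects" idea, covering only page views. It assumes, purely for illustration, that `http://<lang>.wikipedia.org/wiki/<Title>` is the canonical form; the function names and the simplified host pattern are hypothetical, and percent-encoding of titles is ignored. The historical forms handled are the ones listed in the message above.

```python
import re
from urllib.parse import urlsplit, parse_qs


def canonical_view_url(url):
    """Map a recognised page-view URL to the assumed canonical form, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Accept *.wikipedia.org and the older *.wikipedia.com hosts.
    m = re.fullmatch(r"([a-z-]+)\.wikipedia\.(?:org|com)", host)
    if not m:
        return None
    # www.wikipedia.org was the old home of what is now en.wikipedia.org.
    lang = "en" if m.group(1) == "www" else m.group(1)
    if parts.path.startswith("/wiki/"):
        title = parts.path[len("/wiki/"):]
    elif parts.path == "/wiki.cgi":
        title = parts.query  # old style: /wiki.cgi?Title
    elif parts.path in ("/w/wiki.phtml", "/w/index.php"):
        query = parse_qs(parts.query)
        if query.get("action", ["view"])[0] != "view":
            return None  # edits, history etc. are outside this sketch
        title = query.get("title", [""])[0]
    else:
        return None
    if not title:
        return None
    return f"http://{lang}.wikipedia.org/wiki/{title}"


def respond(url):
    """Return (status, location): serve canonical URLs, redirect the other forms."""
    target = canonical_view_url(url)
    if target is None or target == url:
        return (200, url)
    return (301, target)
```

With one canonical URL per page, a cache only has to purge that single URL when the page changes; every other accepted form is just a redirect to it.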
Brion Vibber wrote:
We are *strongly* committed to maintaining the functioning of existing incoming links. I've made this promise several times over the last three years, and I'm happy to make it again.
And of course I second it.
I don't currently support the idea of changing URL styles, but I could be convinced, and anyway I don't feel that it's the sort of decision that should be left up to me.
But one thing that absolutely comes from the top here is that "We are *strongly* committed to maintaining the functioning of existing incoming links." It would be a disaster not to do this.
--Jimbo
Kate wrote:
- http://en.wikipedia.org/edit/Main_Page
Looks nice.
- http://wikipedia.org/en/article/Main_Page - http://wikipedia.org/en/edit/Main_Page
as well as being nicer from a technical perspective, i think this reduces the disconnection between different languages of the same project. so, i'd like to propose that we move to the last form of URLs at some point.
It will add another level of complexity the day we want to split the cluster by languages (like en: hosted in florida, ja: hosted in japan, fr: hosted in europe).
wikipedia.org would probably point to a redirector; if it fails, all languages would be impacted.
Even if the change is made, the worst is that we still have to support the old lang.project.org scheme. So we will have to handle two systems :(
Looks useless.
Ashar Voultoiz wrote:
It will add another level of complexity the day we want to split the cluster by languages (like en: hosted in florida, ja: hosted in japan, fr: hosted in europe).
We're already doing that based on source IP, and may even start doing that based on network topology (BGP info). Splitting based on subdomain is not likely to be needed.
wikipedia.org would probably point to a redirector; if it fails, all languages would be impacted.
No, things would stay the same.
Even if the change is made, the worst is that we still have to support the old lang.project.org scheme. So we will have to handle two systems :(
Looks useless.
That's just a matter of rewrite rules.
Kate wrote:
we now have the choice to replace "action=" with a new path:
- http://en.wikipedia.org/wiki/Main_Page - http://en.wikipedia.org/edit/Main_Page
however, at the same time, we could implement a change in the way domain names are used, that has been discussed several times in the past:
- http://wikipedia.org/en/article/Main_Page - http://wikipedia.org/en/edit/Main_Page
Will "article", "edit", etc., be localizable?
Thanks.
Carlos wrote:
we now have the choice to replace "action=" with a new path:
- http://en.wikipedia.org/wiki/Main_Page - http://en.wikipedia.org/edit/Main_Page
however, at the same time, we could implement a change in the way domain names are used, that has been discussed several times in the past:
- http://wikipedia.org/en/article/Main_Page - http://wikipedia.org/en/edit/Main_Page
Will "article", "edit", etc., be localizable?
The "edit" button in the UI already is ;-)
I don't think URLs need to be localisable, especially not URLs that only geeks will want to go to directly.
Timwi
Ævar Arnfjörð Bjarmason wrote:
does anyone have comments on or object to this change?
I like the current urls, a subdomain looks much better and is easier to remember than a subdirectory.
Well that depends on who you ask, I for one think the opposite :)
Kate wrote: [snip]
however, at the same time, we could implement a change in the way domain names are used, that has been discussed several times in the past:
- http://wikipedia.org/en/article/Main_Page - http://wikipedia.org/en/edit/Main_Page
as well as being nicer from a technical perspective, i think this reduces the disconnection between different languages of the same project. so, i'd like to propose that we move to the last form of URLs at some point.
does anyone have comments on or object to this change?
kate.
Please do not change anything like this without a very serious project-wide consultation. A posting and some responses on this mailing list (or any other list) alone should not result in such an important change.
I, and I strongly believe many others, see "Wikipedia" as the global name for all the Wikipedias, but each of us has our own home Wikipedia. Every language Wikipedia is editorially independent and has its own community, and its members identify their Wikipedia with its own domain. They are proud of their own Wikipedia.
It is not because you are part of a bigger community that you should lose your identity. This is a very sensitive and important thing. There are other things to consider besides the technical side of it.
I'd be for replacing the action links with subdirectories, but against putting all language wikis in the same subdomain, because I think that's much more of a "cultural" change than changing the URLs for edit pages and so on.
Also, why can't articles just be at the root?
so:
http://en.wikipedia.org/Main_Page
which is equivalent to
http://en.wikipedia.org/wiki/Main_Page
and
http://en.wikipedia.org/article/Main_Page
but also:
http://en.wikipedia.org/edit/Main_Page
http://en.wikipedia.org/history/Main_Page
and hopefully...
http://en.wikipedia.org/delete/Main_Page
http://en.wikipedia.org/protect/Main_Page
http://en.wikipedia.org/move/Main_Page
etc.
- David
I must agree -- I don't think we should move away from the subdomain model.
Mark
On 07/07/05, David Friedland david@nohat.net wrote:
I'd be for replacing the action links with subdirectories, but against putting all language wikis in the same subdomain, because I think that's much more of a "cultural" change than changing the URLs for edit pages and so on.
Also, why can't articles just be at the root?
so:
http://en.wikipedia.org/Main_Page which is equivalent to http://en.wikipedia.org/wiki/Main_Page and http://en.wikipedia.org/article/Main_Page
but also: http://en.wikipedia.org/edit/Main_Page http://en.wikipedia.org/history/Main_Page and hopefully... http://en.wikipedia.org/delete/Main_Page http://en.wikipedia.org/protect/Main_Page http://en.wikipedia.org/move/Main_Page etc.
- David
On Thu, 2005-07-07 at 22:49 -0700, David Friedland wrote:
Also, why can't articles just be at the root?
so:
http://en.wikipedia.org/Main_Page which is equivalent to http://en.wikipedia.org/wiki/Main_Page and http://en.wikipedia.org/article/Main_Page
Because then you sorta limit your extensibility, putting articles in the same namespace as your "api". Imagine:
http://en.wikipedia.org/history -> article on "History"
http://en.wikipedia.org/history/protect -> history of the article "Protect"
http://en.wikipedia.org/protect/history -> protect the article "History"
...or is it the "History" subpage of the "Protect" article?
The rewrite rules get much simpler if the syntax is: http://<lang>.wikipedia.org/<action>/<article>
...and you have more room to grow without breaking URL permanence, because you haven't called dibs on too many top-level path segments.
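The ambiguity described above can be sketched as follows. This is only an illustration: the `ACTIONS` set and both function names are hypothetical, and real rewrite rules would live in the web server configuration rather than in Python.

```python
# Hypothetical action names; a real install would define its own set.
ACTIONS = {"edit", "history", "protect"}


def interpretations_root(path):
    """All plausible readings of a path when articles share the root with actions."""
    segs = path.strip("/").split("/")
    # Reading 1: the whole path is an article title (possibly with a subpage).
    readings = [("view", "/".join(segs))]
    # Reading 2: the first segment is an action applied to the rest.
    if len(segs) > 1 and segs[0] in ACTIONS:
        readings.append((segs[0], "/".join(segs[1:])))
    return readings


def interpretation_prefixed(path):
    """The single reading under /<action>/<article>: the first segment is always the action."""
    action, _, article = path.strip("/").partition("/")
    return (action, article)


ambiguous = interpretations_root("/protect/history")
unambiguous = interpretation_prefixed("/protect/history")
```

Under the prefixed scheme a plain page view needs its own first segment too (for example `/wiki/history`), which is what keeps the two namespaces apart.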
but also: http://en.wikipedia.org/edit/Main_Page http://en.wikipedia.org/history/Main_Page and hopefully... http://en.wikipedia.org/delete/Main_Page http://en.wikipedia.org/protect/Main_Page http://en.wikipedia.org/move/Main_Page etc.
This is a good idea. It'd be really nice to get all of the technology out of the URLs (e.g. ".php" file extensions) so that maintaining URL permanence doesn't mean maintaining strange mappings, even if MediaWiki is rewritten in a completely different language one day.
Rob
Rob Lanphier wrote:
Because then you sorta limit your extensibility, putting articles in the same namespace as your "api". Imagine: http://en.wikipedia.org/history -> article on "History" http://en.wikipedia.org/history/protect -> history of the article "Protect" http://en.wikipedia.org/protect/history -> protect the article "History" ...or is it the "History" subpage of the "Protect" article?
The rewrite rules get much simpler if the syntax is: http://<lang>.wikipedia.org/<action>/<article>
...and you have more room to grow without breaking URL permanence, because you haven't called dibbs on too many top-level path segments.
It seems to me that optimizing for the most common case is much more important than any of these corner case concerns. The extremely rare cases of namespace overlap can be handled specially. On en.wikipedia, of course, this would never be a problem because article titles are in title case, and commands are in lower case.
- David
David Friedland wrote:
Rob Lanphier wrote:
Because then you sorta limit your extensibility, putting articles in the same namespace as your "api". Imagine: http://en.wikipedia.org/history -> article on "History" http://en.wikipedia.org/history/protect -> history of the article "Protect" http://en.wikipedia.org/protect/history -> protect the article "History" ...or is it the "History" subpage of the "Protect" article?
The rewrite rules get much simpler if the syntax is: http://<lang>.wikipedia.org/<action>/<article>
...and you have more room to grow without breaking URL permanence, because you haven't called dibbs on too many top-level path segments.
It seems to me that optimizing for the most common case is much more important than any of these corner case concerns.
I sit at the other end of the bug reports, and I can assure you the opposite is true.
The current URLs *work* and with little more than aesthetic preference for a change, a change that actually is KNOWN to fail in cases that are currently handled correctly is just not going to happen.
Note that Wikipedias can be fully expected to have articles on [[robots.txt]] and [[favicon.ico]] etc... :) We *must* have a clear namespace, unfettered by exceptions or no-title-zones. This is non-negotiable.
The extremely rare cases of namespace overlap can be handled specially.
Handling things "specially" is a recipe for disaster, and generally means "there will be a bunch of big nasty bugs here". It's far, far better to handle these things generically by eliminating the bug breeding ground.
Pretty much any and all "rare" cases will be found as people try to put articles at them and run up against a brick wall.
On en.wikipedia, of course, this would never be a problem because article titles are in title case, and commands are in lower case.
This argument fails for a number of reasons, including:
* On wikis with capitalized titles we accept incoming links with both lower- and uppercase forms, with automatic redirecting where necessary. Changing this may break existing links, including interwiki links, unnecessarily. Note that interwiki links do not force the capital letter on the outgoing link since it's unknown whether the software on the other end is case-sensitive or not.
* At some point we *will* move to a fully case-insensitive, case-preserving title system. There *will* be titles with initial lowercase letters on the English Wikipedia at that time.
* Right now on a number of our wikis, link capitalization is not enforced and there *are now* titles with initial lowercase letters. These include a fair number of major Wiktionaries and a couple artificial-language Wikipedias.
-- brion vibber (brion @ pobox.com)
Kate wrote:
- http://en.wikipedia.org/wiki/Main_Page
- http://en.wikipedia.org/w/index.php?title=Main_Page&action=edit
- http://en.wikipedia.org/wiki/Main_Page
- http://en.wikipedia.org/edit/Main_Page
- http://wikipedia.org/en/article/Main_Page
- http://wikipedia.org/en/edit/Main_Page
Here is my personal opinion. I don't expect many people to agree, but I would like it this way most.
I want the "edit" to be at the *end* of the URL, not somewhere in the middle. That makes it easiest to add to (or remove from) an existing URL. (I also want the "wiki" to go away. :) )
Hence, the article [[History]] would have:
http://en.wikipedia.org/History
http://en.wikipedia.org/History/edit
http://en.wikipedia.org/History/history
Article titles with a slash in them, such as [[History/edit]] if anyone ever wanted to create it, could be encoded using a double-slash:
http://en.wikipedia.org/History//edit
http://en.wikipedia.org/History//edit/edit
http://en.wikipedia.org/History//edit/history
This is less alienating than "%28" for parentheses or "%2C" for commas, but of course one could always use "%2F" for slashes for consistency.
URLs such as http://en.wikipedia.org/Pagename/subpage would then redirect to http://en.wikipedia.org/Pagename//subpage and you could still type a single slash as long as the subpage doesn't happen to be "edit", "history", or any of the other magic words.
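A minimal sketch of the double-slash scheme just described, assuming a hypothetical set of "magic words" and hypothetical `encode`/`decode` helpers. It does not handle titles containing consecutive slashes or control characters.

```python
# Hypothetical set of "magic words" that act as trailing actions.
ACTIONS = {"edit", "history"}


def encode(title, action=None):
    """Build a path: slashes inside the title are doubled, the action is appended last."""
    path = "/" + title.replace("/", "//")
    if action:
        path += "/" + action
    return path


def decode(path):
    """Split a path back into (title, action)."""
    body = path[1:]  # drop the leading slash
    # Hide doubled slashes so that only single slashes act as separators.
    parts = [p.replace("\x00", "/") for p in body.replace("//", "\x00").split("/")]
    if len(parts) > 1 and parts[-1] in ACTIONS:
        return "/".join(parts[:-1]), parts[-1]
    # A single slash that is not followed by a magic word is read as a subpage,
    # matching the "you could still type a single slash" shorthand.
    return "/".join(parts), "view"
```

For example, `decode("/History//edit/edit")` yields the title `History/edit` with the action `edit`, while `decode("/Pagename/subpage")` falls back to viewing the subpage.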
Since these things are just URLs, I don't believe things like "/edit" need to be internationalised.
Greetings, Timwi
On Fri, 2005-07-08 at 20:56 +0100, Timwi wrote:
Article titles with a slash in them, such as [[History/edit]] if anyone ever wanted to create it, could be encoded using a double-slash:
http://en.wikipedia.org/History//edit http://en.wikipedia.org/History//edit/edit http://en.wikipedia.org/History//edit/history
This is less alienating than "%28" for parentheses or "%2C" for commas, but of course one could always use "%2F" for slashes for consistency.
I'm not sure if there's a specific prohibition of this practice in any spec, but it does fight typical conventions, which is kind of a bad thing. For example, it appears that Apache throws away extra slashes, as can be seen here:
http://apache.org///foundation////faq.html
http://apache.org/foundation/faq.html
IIS seems to do the same thing:
http://www.microsoft.com////windowsserversystem///default.mspx
Chances are there are going to be weird corner cases where the web server smarts interfere with what you want to do. While it seems like a simple enough problem to solve for the Wikipedia admin team, it's likely that such a convention would be difficult to support in a general MediaWiki install. Though there's no reason that WP can't have a custom, tricked-out config, it'd be nice if WP were running in a "recommended" configuration for MediaWiki.
URLs such as http://en.wikipedia.org/Pagename/subpage would then redirect to http://en.wikipedia.org/Pagename//subpage and you could still type a single slash as long as the subpage doesn't happen to be "edit", "history", or any of the other magic words.
Mixing subpages into the action namespace seems like a bad idea. Arguably it's worse when it's only the exception, because that means it'll be something that will always need to be accommodated, but rare enough that it's often forgotten.
In the spirit of putting the action at the end, perhaps this syntax would work:
http://en.wikipedia.org/Pagename?action=edit
http://en.wikipedia.org/Pagename?action=history
It seems a little dicey from a future-proofing perspective to put the article names at the root, but there is an appeal to it as well. In order to avoid mixing functionality extensions into the article namespace, the extensions would either need to live in URL parameters or in the 'Special:' namespace (or other namespace).
Rob
Rob Lanphier wrote:
On Fri, 2005-07-08 at 20:56 +0100, Timwi wrote:
Article titles with a slash in them, such as [[History/edit]] if anyone ever wanted to create it, could be encoded using a double-slash:
http://en.wikipedia.org/History//edit http://en.wikipedia.org/History//edit/edit http://en.wikipedia.org/History//edit/history
This is less alienating than "%28" for parentheses or "%2C" for commas, but of course one could always use "%2F" for slashes for consistency.
I'm not sure if there's a specific prohibition of this practice in any spec,
No, there isn't.
but it does fight typical conventions, which is kind of a bad thing. For example, it appears that Apache throws away extra slashes, as can be seen here: http://apache.org///foundation////faq.html http://apache.org/foundation/faq.html IIS seems to do the same thing: http://www.microsoft.com////windowsserversystem///default.mspx
I assure you that Apache does not throw away extra slashes. I have already done the necessary programming to do URLs such as those I have mentioned. The examples you mentioned don't say anything about the webservers themselves because both URLs are obviously mappings to a filesystem (whether virtual or not); it is that filesystem that throws away the extra slashes (which you can easily test: Both Linux and Windows allow you to put double-/ resp. double-\ in a path and it won't complain).
Compare: http://www.livejournal.com/manage/index.bml and http://www.livejournal.com/manage//index.bml
Both URLs resolve to the same file because the path is mapped onto a filesystem, but the rendered pages differ because the individual strings on them are retrieved using codes that are based on the request path. Those codes contain only single slashes, so the second page is missing those strings. This clearly shows that it's the filesystem and not Apache that "throws away" double-slashes.
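The distinction between the request path and the file it maps to can be illustrated with Python's standard library. This is an analogy only: `posixpath.normpath` stands in for the way a filesystem lookup treats repeated separators, and says nothing about Apache's internals.

```python
import posixpath
from urllib.parse import urlsplit

url = "http://www.livejournal.com/manage//index.bml"

# The URL itself keeps the doubled slash: it is part of the request path string.
request_path = urlsplit(url).path

# Normalising the path the way a filesystem lookup would collapses it.
file_path = posixpath.normpath(request_path)
```

Code that keys on `request_path` sees two different strings for the two URLs, while anything that resolves `file_path` ends up at the same file.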
Mixing subpages into the action namespace seems like a bad idea. Arguably it's worse when it's only the exception, because that means it'll be something that will always need to be accommodated, but rare enough that it's often forgotten.
I don't understand what you're saying here. I'm not "mixing" anything. The rest doesn't seem to make any sense. Can you rephrase this?
In the spirit of putting the action at the end, perhaps this syntax would work: http://en.wikipedia.org/Pagename?action=edit http://en.wikipedia.org/Pagename?action=history
Of course it would "work", but it's not what I want because "?action=" is a pain to type.
Timwi
On Fri, 2005-07-08 at 22:40 +0100, Timwi wrote:
Rob Lanphier wrote:
On Fri, 2005-07-08 at 20:56 +0100, Timwi wrote:
Article titles with a slash in them, such as [[History/edit]] if anyone ever wanted to create it, could be encoded using a double-slash:
http://en.wikipedia.org/History//edit http://en.wikipedia.org/History//edit/edit http://en.wikipedia.org/History//edit/history
This is less alienating than "%28" for parentheses or "%2C" for commas, but of course one could always use "%2F" for slashes for consistency.
I'm not sure if there's a specific prohibition of this practice in any spec,
No, there isn't.
Ok, that's good. What about the ramifications for relative URL handling in RFC 2396? http://www.ietf.org/rfc/rfc2396
I haven't found any immediate problems, but it would take me a while reading through the BNF to figure out if there are places where it breaks.
Not that it's a huge deal if relative URLs don't work, since MW can always just stick to absolute references, but it's one area where things can go wrong.
but it does fight typical conventions, which is kind of a bad thing. For example, it appears that Apache throws away extra slashes, as can be seen here: http://apache.org///foundation////faq.html http://apache.org/foundation/faq.html IIS seems to do the same thing: http://www.microsoft.com////windowsserversystem///default.mspx
I assure you that Apache does not throw away extra slashes. I have already done the necessary programming to do URLs such as those I have mentioned. The examples you mentioned don't say anything about the webservers themselves because both URLs are obviously mappings to a filesystem (whether virtual or not); it is that filesystem that throws away the extra slashes (which you can easily test: Both Linux and Windows allow you to put double-/ resp. double-\ in a path and it won't complain).
Compare: http://www.livejournal.com/manage/index.bml and http://www.livejournal.com/manage//index.bml
They show the same page because the path is a mapping to a filesystem, but the pages are different because the individual strings on it are retrieved from codes that are based on the path. Those codes contain only single slashes, so the second page is missing those strings. This clearly shows that it's the filesystem and not Apache that "throws away" double-slashes.
Ok, that's good.
Still, I maintain that assigning unique semantics to "//" versus "/" when used in that part of a URL doesn't have a lot of precedent, which also means that there's probably a lot of places it can break. I admit that's a vague criticism, but I just have a bad gut feeling about going down that road.
Mixing subpages into the action namespace seems like a bad idea. Arguably it's worse when it's only the exception, because that means it'll be something that will always need to be accommodated, but rare enough that it's often forgotten.
I don't understand what you're saying here. I'm not "mixing" anything. The rest doesn't seem to make any sense. Can you rephrase this?
First, let me quote again the part of your proposal which I was responding to, and try again with a clearer response:
URLs such as http://en.wikipedia.org/Pagename/subpage would then redirect to http://en.wikipedia.org/Pagename//subpage and you could still type a single slash as long as the subpage doesn't happen to be "edit", "history", or any of the other magic words.
Let's say there's an article called "Particles", with a sub-article "Atom".
The "Particles" article would be: http://en.wikipedia.org/Particles
...and the Atom subarticle would be: http://en.wikipedia.org/Particles/Atom
Now let's say that a future version of MW allows output of the history of any page in Atom form, with "/Atom" designating the output.
So, after the upgrade, what used to be a valid reference to the Atom article is now an Atom feed. The new URL to the article is: http://en.wikipedia.org/Particles//Atom
But wait, we already committed to not break incoming links. Crap...now we have to choose a name for every new action that doesn't overlap with any existing subpage.
This is what I mean about keeping namespaces separate ("namespace" in the general sense, not the MediaWiki specific "Foo:Bar" sense). I would much prefer there be a syntactic way of distinguishing between a subpage and an action that always works, even on Tuesdays ;-)
In the spirit of putting the action at the end, perhaps this syntax would work: http://en.wikipedia.org/Pagename?action=edit http://en.wikipedia.org/Pagename?action=history
Of course it would "work", but it's not what I want because "?action=" is a pain to type.
...but it's far less ambiguous, and it's clear that it's an action performed on article "Pagename", rather than a page of its own.
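A short sketch of why the query-string form stays unambiguous: the title comes only from the path and the action only from the query, so a subpage name can never be mistaken for an action. The helper name `parse_request` and the "view" default are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def parse_request(url):
    """Return (title, action); the path is always the title, the query always the action."""
    parts = urlsplit(url)
    title = parts.path.lstrip("/")
    # "view" is an assumed default when no action parameter is given.
    action = parse_qs(parts.query).get("action", ["view"])[0]
    return title, action


subpage = parse_request("http://en.wikipedia.org/Particles/Atom")
feed = parse_request("http://en.wikipedia.org/Particles?action=atom")
```

Adding a new action later only adds a new `action=` value, so existing links to the `Particles/Atom` subpage keep their meaning.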
Rob
Just a couple of comments on this
Rob Lanphier wrote:
Ok, that's good. What about the ramifications for relative URL handling in RFC 2396? http://www.ietf.org/rfc/rfc2396
I haven't found any immediate problems, but it would take me a while reading through the BNF to figure out if there are places where it breaks.
Not that it's a huge deal if relative URLs don't work, since MW can always just stick to absolute references, but it's one area where things can go wrong.
Just a comment in passing: RFC 3986 obsoletes RFC 2396
http://www.ietf.org/rfc/rfc3986.txt
The one specific prohibition of double slash in the path is at the start of path when the URI has no authority segment.
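As a quick illustration, Python's standard `urllib.parse.urlsplit` can be used to see both cases. This is only a sketch of how one generic URL parser behaves, not a statement about any particular web server.

```python
from urllib.parse import urlsplit

# With an authority present, a double slash inside the path is kept as-is.
with_authority = urlsplit("http://en.wikipedia.org/Particles//Atom")
print(with_authority.netloc)  # en.wikipedia.org
print(with_authority.path)    # /Particles//Atom

# Without an authority, a reference starting with "//" is ambiguous: the
# parser reads the first segment as the authority, not as part of the path.
no_authority = urlsplit("//Particles/Atom")
print(no_authority.netloc)  # Particles
print(no_authority.path)    # /Atom
```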
but it does fight typical conventions, which is kind of a bad thing. For example, it appears that Apache throws away extra slashes, as can be seen here: http://apache.org///foundation////faq.html http://apache.org/foundation/faq.html IIS seems to do the same thing: http://www.microsoft.com////windowsserversystem///default.mspx
I assure you that Apache does not throw away extra slashes. I have already done the necessary programming to handle URLs such as those I have mentioned. The examples you mentioned don't say anything about the webservers themselves because both URLs are obviously mappings to a filesystem (whether virtual or not); it is that filesystem that throws away the extra slashes (which you can easily test: Linux lets you put a double / in a path, and Windows a double \, and neither will complain).
Compare: http://www.livejournal.com/manage/index.bml and http://www.livejournal.com/manage//index.bml
They show the same page because the path is a mapping to a filesystem, but the pages are different because the individual strings on it are retrieved from codes that are based on the path. Those codes contain only single slashes, so the second page is missing those strings. This clearly shows that it's the filesystem and not Apache that "throws away" double-slashes.
Ok, that's good.
Still, I maintain that assigning unique semantics to "//" versus "/" when used in that part of a URL doesn't have a lot of precedent, which also means that there's probably a lot of places it can break. I admit that's a vague criticism, but I just have a bad gut feeling about going down that road.
FWIW, I share your bad gut feeling. Superfluous slashes in the path may be used as an evasion technique, and some security packages (for example, mod_security for Apache) normalize the path, stripping out the extra slashes. While this particular case might not pertain to our installation at this time, it is an example of the kind of unforeseen problems that may lie down this road.
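A rough sketch of what such normalization does to the proposed URLs, in Python. The `collapse_slashes` helper is hypothetical and is not how mod_security is actually implemented; it only shows the effect of collapsing runs of slashes.

```python
import re

# Hypothetical stand-in for a path-normalizing filter: collapse every run
# of two or more slashes into a single slash.
def collapse_slashes(path):
    return re.sub(r"/{2,}", "/", path)

# Under the double-slash proposal this URL means "edit page A/b/c/d".
original = "/A//b//c//d/edit"
normalized = collapse_slashes(original)

print(normalized)  # /A/b/c/d/edit
```

After normalization the title separators and the action separator look the same, so the distinction the proposal relies on is lost before the request reaches the wiki.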
Rob Lanphier wrote:
Let's say there's an article called "Particles", with a sub-article "Atom".
The "Particles" article would be: http://en.wikipedia.org/Particles
...and the Atom subarticle would be: http://en.wikipedia.org/Particles/Atom
I see you misunderstood. My proposal isn't to double the slash only in cases where it disambiguates an action. That would be stupid precisely for the reasons you mentioned. No, the canonical URL for [[a/b/c/d]] would be at http://en.wikipedia.org/A//b//c//d under my proposal.
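For illustration, the scheme could be sketched like this in Python. The `encode` and `decode` helpers are hypothetical names, and the sketch assumes titles never start or end with a slash, so that a lone slash can only be the action delimiter.

```python
import re

# Hypothetical helpers for the double-slash scheme: every "/" in the title
# is doubled, and a single "/" separates the title from the action.
def encode(title, action=""):
    title = title[:1].upper() + title[1:]  # first letter capitalized, as in [[a/b/c/d]] -> A//b//c//d
    path = "/" + title.replace("/", "//")
    return path + "/" + action if action else path

def decode(path):
    # Split only on a slash that has no slash on either side of it.
    parts = re.split(r"(?<!/)/(?!/)", path[1:])
    title = parts[0].replace("//", "/")
    action = parts[1] if len(parts) > 1 else ""
    return title, action

print(encode("a/b/c/d"))             # /A//b//c//d
print(encode("a/b/c/d", "edit"))     # /A//b//c//d/edit
print(decode("/A//b//c//d/edit"))    # ('A/b/c/d', 'edit')
```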
But anyway. I understand everybody's concerns about the double-slashes, and they aren't a required portion of my proposal anyway, so we can change this. You could always encode them as %2F instead, or you could use ";edit" instead of "/edit" and then encode semicolons as %3B instead (the software already does this now).
In the spirit of putting the action at the end, perhaps this syntax would work: http://en.wikipedia.org/Pagename?action=edit http://en.wikipedia.org/Pagename?action=history
Of course it would "work", but it's not what I want because "?action=" is a pain to type.
...but it's far less ambiguous, and it's clear that it's an action performed on article "Pagename", rather than a page of its own.
But think about why that is the case. It's only because you know that the question mark ("?") cannot occur in the page title because it would be encoded as "%3F". Therefore, my proposal is essentially equivalent to yours except that the delimiting character is a single "/" and not a "?" and the proposed encoding is "//" and not "%2F". We can change either of these two parameters any way we like, just as long as we have a clear delimiter.
Slashes are more common in page titles only if the wiki makes extensive use of subpages. In actual articles (zero-namespace pages), the question mark is way more common (e.g. [[Is Everybody Listening?]]). I think the semicolon (";") is rarer than both of these, so even if you think it's unusual, it may be a better choice than both "/" and "?".
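A minimal sketch of the semicolon variant in Python, using the standard `urllib.parse.quote` and `unquote`. The `build` and `parse` helpers are hypothetical; the only assumption is that any literal ";" or "?" in a title is percent-encoded, so a bare ";" can only be the delimiter.

```python
from urllib.parse import quote, unquote

# Hypothetical helpers: ";" delimits the action, and a ";" or "?" inside
# the title is percent-encoded (as %3B and %3F) so it cannot be mistaken
# for the delimiter. "/" is left alone so subpages keep a single slash.
def build(title, action=""):
    encoded = quote(title.replace(" ", "_"), safe="/")
    return "/" + encoded + (";" + action if action else "")

def parse(path):
    encoded, _, action = path[1:].partition(";")
    return unquote(encoded), action

print(build("Is Everybody Listening?", "edit"))  # /Is_Everybody_Listening%3F;edit
print(build("Foo;bar", "history"))               # /Foo%3Bbar;history
print(parse("/Foo%3Bbar;history"))               # ('Foo;bar', 'history')
```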
Greetings, Timwi
Timwi wrote:
Here is my personal opinion. I don't expect many people to agree, but I would like it this way most.
I want the "edit" to be at the *end* of the URL, not somewhere in the middle. That makes it easiest to add to (or remove from) an existing URL. (I also want the "wiki" to go away. :) )
One reason this is unlikely is that to keep web crawlers out of our interactive & dynamic areas we need to have distinct URL prefixes for robots.txt.
Right now we have a very basic split, with crawlable article pages in /wiki/ and everything else going via /w/index.php. We disallow the /w/ path prefix in robots.txt, so crawlers know not to bother poking in there.
While we can and do put prohibitions on indexing and further crawling in a <meta> tag, we also want to prevent crawlers from hitting expensive things like diffs in the first place.
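The effect of the prefix split can be checked with Python's standard `urllib.robotparser`. The two rule lines below are an assumed minimal robots.txt matching the description above, not a copy of the live file, and "ExampleBot" is a made-up crawler name.

```python
from urllib.robotparser import RobotFileParser

# Assumed minimal rules: everything under /w/ is off limits to all crawlers.
rules = [
    "User-agent: *",
    "Disallow: /w/",
]

parser = RobotFileParser()
parser.parse(rules)

article = "http://en.wikipedia.org/wiki/Main_Page"
edit = "http://en.wikipedia.org/w/index.php?title=Main_Page&action=edit"

print(parser.can_fetch("ExampleBot", article))  # True
print(parser.can_fetch("ExampleBot", edit))     # False
```

Because robots.txt rules match on path prefixes, an action placed at the end of the URL gives crawlers nothing to match against.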
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Timwi wrote:
I want the "edit" to be at the *end* of the URL, not somewhere in the middle. That makes it easiest to add to (or remove from) an existing URL. (I also want the "wiki" to go away. :) )
One reason this is unlikely is that to keep web crawlers out of our interactive & dynamic areas we need to have distinct URL prefixes for robots.txt.
I understand. I didn't think of this. I guess that means we have to have something between the domain name and the article title, and it can't just be a double-slash, so it has to be at least /w/ or something.
So how about this:
- http://en.wikipedia.org/w/Article_title Use /w/ for pages that can be crawled.
- http://en.wikipedia.org/z/Article_title;history Use /z/ for pages that must not be crawled.
- http://en.wikipedia.org/w/Article_title;history would redirect to http://en.wikipedia.org/z/Article_title;history
- http://en.wikipedia.org/z/Article_title would redirect to http://en.wikipedia.org/w/Article_title
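The four rules above could be sketched as follows in Python. The `redirect_target` function is hypothetical and assumes that a ";" in the path always marks an action (any ";" in a title being encoded as %3B).

```python
# Hypothetical sketch of the /w/ and /z/ rules: return the path to redirect
# to, or None when the request is already under the right prefix.
def redirect_target(path):
    prefix, _, rest = path.lstrip("/").partition("/")
    has_action = ";" in rest
    if prefix == "w" and has_action:
        return "/z/" + rest   # actions must live under the uncrawled prefix
    if prefix == "z" and not has_action:
        return "/w/" + rest   # plain page views belong under the crawlable prefix
    return None

print(redirect_target("/w/Article_title"))          # None
print(redirect_target("/w/Article_title;history"))  # /z/Article_title;history
print(redirect_target("/z/Article_title"))          # /w/Article_title
print(redirect_target("/z/Article_title;history"))  # None
```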
What do you think? :-)
Timwi
wikitech-l@lists.wikimedia.org