Yeah, I've seen the recommendation to use redirects. It's not a good solution. For example, we have two pages:
mysite.com/wiki/Don't Speak (song)
mysite.com/wiki/Don'Amos
Both of these URLs would break at:
mysite.com/wiki/Don
and a redirect would not work. I could create a "help" page for the "Don" entry, but by then I've already lost the visitor's interest. On my site many pages break "early" at the same place in this way, so this is a real problem for me.
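To make the collision concrete, here is a minimal sketch of a deliberately conservative linkifier. It is hypothetical (not any real forum's code) and assumes the routine accepts only letters, digits and `. _ ~ / % -`, ending the link at the first other character. Under that assumption both titles collapse to the same broken prefix, so no single redirect at "Don" can know which page was meant.

```python
import re

# Hypothetical, deliberately conservative linkifier (not any real forum's code):
# it accepts only letters, digits and the characters . _ ~ / % -
# and ends the link at the first character outside that set.
SAFE_RUN = re.compile(r"[A-Za-z0-9._~/%-]+")


def naive_linkify(text: str) -> str:
    """Return the leading part of `text` that would become a clickable link."""
    match = SAFE_RUN.match(text)
    return match.group(0) if match else ""


if __name__ == "__main__":
    for url in ["mysite.com/wiki/Don't Speak (song)", "mysite.com/wiki/Don'Amos"]:
        print(naive_linkify(url))  # both print: mysite.com/wiki/Don
```

Real linkifiers differ in which characters they stop at, which is exactly why the break point cannot be predicted in advance.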
Sometimes a single page would need several redirects. Take the same title:
mysite.com/wiki/Don't Speak (song)
There are three places where the link could break:
mysite.com/wiki/Don
mysite.com/wiki/Don't Speak (
mysite.com/wiki/Don't Speak (song
Again, the first break might not be enough to tell where the user wanted to go. So we would need to create multiple redirects for every page that has this problem, and we still couldn't be sure the visitor would arrive where they wanted to. Creating redirects is therefore not a feasible solution. Also, suppose I do have a redirect for a certain entry. If a user posts the link on a forum, the forum software renders it as:
mysite.com/wiki/Don't Speak (song
This will confuse the user and make them wonder whether the link works at all.
The only real solution is to not allow these characters in URLs at all: commas, apostrophes, brackets, colons, semicolons and so on. That is the real problem that needs to be dealt with somehow. There are thousands of URL-detection routines all over the web, but we can be sure that a URL containing only letters, numbers, underscores and dashes will work in all of them. Suppose someone writing such a routine saw this:
"Hey, did you see the site I sent you (http://mediawiki.org/wiki/extensions(safe)), was actually not working?"
The programmer will end the URL before the closing bracket, while MediaWiki wants the bracket to be part of the URL. Or in this case:
"Hey, I went to http://www.mediawiki.org/wiki/blah, had coffee and then went to bed"
Here again the programmer will end the URL at the comma, while MediaWiki might want the comma included.
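Both quoted sentences can be run through a sketch of the kind of heuristic many forums use. Everything here is hypothetical: `heuristic_linkify` takes the run of non-whitespace starting at "http" and trims trailing sentence punctuation, and `safe_page_name` is one possible rule for keeping only letters, digits, underscores and dashes (a hand-picked name such as "Dont Speak - Song" works just as well). The point it illustrates: the bracket and comma cases get cut, while the restricted name survives the same heuristic.

```python
import re

# Hypothetical "heuristic" linkifier of the kind forum software often uses
# (not any specific product's code): take the run of non-whitespace starting
# at "http", then trim trailing characters that usually end a sentence.
URL_RUN = re.compile(r"https?://\S+")
TRAILING = ".,;:!?'\")"


def heuristic_linkify(text: str) -> list:
    """Return the URLs that the heuristic would turn into links."""
    return [m.group(0).rstrip(TRAILING) for m in URL_RUN.finditer(text)]


def safe_page_name(title: str) -> str:
    """Hypothetical title-to-URL rule: letters, digits, underscores and dashes only.

    Apostrophes are dropped ("Don't" -> "Dont"); every other disallowed
    character (including non-ASCII letters) becomes a space, and runs of
    spaces become a single underscore.
    """
    title = title.replace("'", "")
    title = re.sub(r"[^A-Za-z0-9_\- ]", " ", title)
    return "_".join(title.split())


if __name__ == "__main__":
    msg1 = ("Hey, did you see the site I sent you "
            "(http://mediawiki.org/wiki/extensions(safe)), was actually not working?")
    msg2 = "Hey, I went to http://www.mediawiki.org/wiki/blah, had coffee and then went to bed"
    print(heuristic_linkify(msg1))  # ['http://mediawiki.org/wiki/extensions(safe']
    print(heuristic_linkify(msg2))  # ['http://www.mediawiki.org/wiki/blah']

    safe = "http://mysite.com/wiki/" + safe_page_name("Don't Speak (song)")
    print(safe)                                          # http://mysite.com/wiki/Dont_Speak_song
    print(heuristic_linkify(f"Hey, see ({safe}), bye"))  # ['http://mysite.com/wiki/Dont_Speak_song']
```

Trimming the trailing ")" is the right guess for the surrounding sentence but the wrong one for a title ending in ")", and no heuristic can tell the two apart from the text alone.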
So in my opinion, MediaWiki should handle this one way or another: it should not allow these forbidden characters in the URL, while still allowing them in the page heading. {{DISPLAYTITLE}} works, but I have to take some additional steps. For a page with the name (or URL) "Foo" that uses {{DISPLAYTITLE:Bar}}, it should work like this:
- If we make internal links to [[Foo]] or [[Bar]], they should always link to wiki/Foo but display "Bar".
- Any automatically generated links, such as page logs and contributions, should always link to Foo but display Bar.
- If we want different link text, we can use [[Foo|Blah blah]] as usual.
The URL (Foo) is where the characters are restricted, and Bar is where we have complete freedom to display anything in the page heading.
Once again, our first priority is to prevent broken links. Although this creates an inconsistency compared to other pages on the site, it is the only option we have. Also, a link that doesn't match the page heading is very common on non-wiki sites, so it's not a problem.
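The proposed link behaviour can be sketched as a small lookup table. This is a hypothetical model of the proposal, not how MediaWiki resolves links today; `Page`, `LinkResolver` and `render` are illustrative names. Both the page name and its display title map to the same page, the URL is always built from the restricted name, and an explicit label still overrides the text.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class Page:
    name: str           # restricted page name used in the URL ("Foo")
    display_title: str  # free-form heading ("Bar"), as set with DISPLAYTITLE


class LinkResolver:
    """Hypothetical resolver for the proposed behaviour (not MediaWiki's actual code):
    a link to either the page name or its display title points at the page name."""

    def __init__(self, pages):
        self._pages = {}
        for page in pages:
            self._pages[page.name] = page
            self._pages[page.display_title] = page

    def render(self, target: str, label: Optional[str] = None) -> str:
        page = self._pages.get(target)
        if page is None:
            # Unknown target: link it as written, like an ordinary wiki link.
            href, text = target, target
        else:
            href, text = page.name, page.display_title
        if label is not None:
            text = label  # [[Foo|Blah blah]] still overrides the link text
        url = "/wiki/" + quote(href.replace(" ", "_"))
        return f'<a href="{url}">{escape(text, quote=False)}</a>'


if __name__ == "__main__":
    resolver = LinkResolver([Page("Dont Speak - Song", "Don't Speak (song)")])
    print(resolver.render("Don't Speak (song)"))  # <a href="/wiki/Dont_Speak_-_Song">Don't Speak (song)</a>
    print(resolver.render("Dont Speak - Song"))   # same link as above
    print(resolver.render("Dont Speak - Song", "Blah blah"))  # <a href="/wiki/Dont_Speak_-_Song">Blah blah</a>
```

Keeping one table for both keys is the simplest way to guarantee that logs, contributions and hand-written links all land on the same restricted URL.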
I'm OK with the solution I posted. I would use DISPLAYTITLE, and use [[Foo|Bar]] for internal links. I would make Foo as close to Bar as possible, but without any problematic characters. So the page name (the actual name of the page) would be:
Dont Speak - Song
and the page heading, displayed with {{DISPLAYTITLE:...}}, would be what I wanted it to be:
[[Dont Speak - Song|Don't Speak (song)]]
I have no option but to drop the apostrophe. Let's see how Yahoo Mail and the list software format this URL:
http://en.wikipedia.org/wiki/Don%27t_Speak%C2%A0%C2%A0%C2%A0%C2%A0%C2%A0%C2%... - [1]
I know it won't work if it's posted on topix.com and many other sites. Another problem with having these characters in the URL is that they get %-encoded, e.g.:
http://en.wikipedia.org/wiki/Don%27t_Speak
That doesn't look good. The % encoding does make the URL work, but it's not guaranteed that other users will receive it the way I do. Many times I've seen people copy and paste links to my site without the % encoding, and the links broke. I don't know how that happened, but I won't blame the user; they simply copied and pasted in a different environment. A URL should work when copied and pasted, and if we tolerate any failure rate here, it should be extremely low (say, 1%). For example, the word "Apple" can be copied and pasted with the same result in every environment, but the URL [1] has a high failure rate, which is why I've seen so many broken links. So to me, % encoding is not a solution that prevents failure, and it should not be used.
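For reference, the % sequences above are ordinary UTF-8 percent-encoding: %27 is the apostrophe, and each %C2%A0 is a no-break space (U+00A0), so the long tail in the Wikipedia URL most likely comes from no-break spaces after the title. The sketch below is a simplified, hypothetical `encode_title` using Python's default safe set; MediaWiki's own encoder differs in detail (Wikipedia URLs keep literal parentheses, for instance). It also shows that decoding restores the raw apostrophe, which is the form that breaks once a copy-paste step un-encodes the link.

```python
from urllib.parse import quote, unquote


def encode_title(title: str) -> str:
    """Simplified title-to-path step: spaces become underscores, then the
    title is percent-encoded as UTF-8 using Python's default safe set."""
    return quote(title.replace(" ", "_"))


if __name__ == "__main__":
    print(encode_title("Don't Speak"))        # Don%27t_Speak
    print(encode_title("Don't Speak\u00a0"))  # Don%27t_Speak%C2%A0  (U+00A0 no-break space)
    # Decoding gives back the raw characters, apostrophe included:
    print(unquote("Don%27t_Speak"))           # Don't_Speak
```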
First, we are a website for the world, and only then a wiki for the people who work on the site. So my first priority is links that don't break. If that means links that work but don't look perfect (e.g. using "Dont", which is not grammatically correct), I would rather have those than grammatically correct links that break when posted on some websites.
In any case, I think this should have been dealt with so that these problematic characters never appear in the URLs of Wikipedia or any other MediaWiki site. It's not a big problem for me to use the DISPLAYTITLE feature, do the workarounds, and tolerate some non-ideal page logs that show the page URL instead of the page title (I will try to keep the difference as small as possible). I'm glad that solution is an option and that the feature is built in. I do think the reliability of a website's URLs is a serious issue: they should always work, and if I have to do some extra work to make them work, that's fine with me.
I wish I didn't have to use these characters in page headings, but often I have to, and that freedom should be there, as it is on a non-wiki website. At the same time, I should not have a URL that might break, and it's OK for the page heading and the URL to differ. I can imagine millions of MediaWiki links breaking every day because of these characters. If the MediaWiki developers decided to deal with this, they would have to figure out how to keep the page heading separate from the URL and still have everything work. My site wouldn't exist without the MediaWiki software, so I'm very thankful to everyone who has worked on it. Anyway, these are some of my thoughts on URL breaks and page headings.
Eric
________________________________
From: Kilian drehbuehne@texttheater.net
To: mediawiki-l@lists.wikimedia.org
Sent: Monday, January 9, 2012 12:47 PM
Subject: Re: [Mediawiki-l] Links with (, ), :, ' - break all the time
On 01/09/2012 03:02 AM, Benjamin Lees wrote:
But why don't you just use redirects?
Redirects wouldn't solve the problem. Users would be redirected to URLs with spaces/punctuation, copy them from their browser's location bar and still post them elsewhere.
_______________________________________________ MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l