The old software knew that 99.9% of the time humans don't _really_ mean it when they put a comma, period, or other such item of punctuation immediately after a URL, but that these are rather intended as, well, punctuation.
The new phase III software trusts us more; URLs that are followed immediately by punctuation (period, comma, paren, semicolon, etc.) now include this punctuation in the hyperlink, which leads to a lot of broken external links where URLs are put casually into text, particularly on talk pages.
Bug or feature? You decide!
-------- Original Message --------
Subject: [ wikipedia-Bugs-584804 ] URL followed by punctuation->broken link
Date: Mon, 22 Jul 2002 10:34:59 -0700
From: noreply@sourceforge.net
To: noreply@sourceforge.net

Bugs item #584804, was opened at 2002-07-22 02:04
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=411192&aid=584804&...

Category: Page rendering
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Brion Vibber (vibber)
Assigned to: Lee Daniel Crocker (lcrocker)
Summary: URL followed by punctuation->broken link
Initial Comment: If a URL is put directly into article text, and is followed immediately by a punctuation character, that character is in many common cases misparsed as part of the URL. This usually results in a 404 or other page-not-found error when a user clicks on the link.
Example:
http://www.wikipedia.com/wiki/SandBox
http://www.wikipedia.com/wiki/SandBox"
both render and parse correctly, and the link is clickable. The quote mark is not parsed as part of the link.

But:
http://www.wikipedia.com/wiki/SandBox.
http://www.wikipedia.com/wiki/SandBox,
http://www.wikipedia.com/wiki/SandBox:
http://www.wikipedia.com/wiki/SandBox;
http://www.wikipedia.com/wiki/SandBox!
http://www.wikipedia.com/wiki/SandBox?
http://www.wikipedia.com/wiki/SandBox)
all include an extra character on the end, resulting in failure when the link is followed. This is contrary to the functionality of software phase I and II, and will break a lot of links to external sites, particularly in talk pages but also in some articles. (Note that the question-mark link here in fact works by happy coincidence, but is still incorrectly included in the URL where it really oughtn't to be.)
----------------------------------------------------------------------
Comment By: Lee Daniel Crocker (lcrocker)
Date: 2002-07-22 10:16
Message: Logged In: YES user_id=3076
This was brought up before, and I rejected it, because things like commas and periods are perfectly legal URL characters; it would be wrong not to parse them as such. But I'm willing to be swayed by consensus here--if the community really thinks we /should/ do it "wrong" and leave out punctuation in certain contexts, I'll do that. But it will have to be defined precisely and agreed upon.
----------------------------------------------------------------------
Comment By: Brion Vibber (vibber)
Date: 2002-07-22 10:34
Message: Logged In: YES user_id=446709
Yes, those are all valid characters in URLs. However, they're all _very_ rare at the _end_ of URLs, yet very common as punctuation in English text. Some people will deliberately leave a space after a URL before using punctuation on the assumption that some stupid piece of software is going to try to make a link that includes the punctuation, but this is A) ugly and B) not done often enough that we ought to rely on it.
People _do_ put punctuation immediately at the end of links, and they seem to expect that the software will _not_ give them a 404 error because of it... especially since the software has been handling the case correctly for as long as they've used it.
Not taking this fact into account violates the principle of least surprise and breaks far more links than it corrects (if any). On the rare occasion that a URL actually ends in one of the above characters, we have the [URL URL] syntax.
I'm forwarding this bug report to wikipedia-l for a group vote.
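For anyone who wants the proposed rule pinned down before voting, here is a minimal sketch in Python of what "leave trailing punctuation out of the link" could mean. It is not the phase III code. The names URL_PATTERN, TRAILING_PUNCTUATION, split_trailing_punctuation and find_links are hypothetical, and both the character set (taken from the examples in the bug report) and the simplified URL pattern are assumptions made for illustration.

```python
import re

# Assumption: the characters treated as sentence punctuation when they end
# a bare URL, taken from the examples in the bug report.
TRAILING_PUNCTUATION = ".,:;!?)"

# Assumption: a bare URL is http:// or https:// followed by a run of
# characters that are not whitespace, quote marks, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"<>]+")


def split_trailing_punctuation(candidate):
    """Split a matched URL candidate into (url, trailing punctuation)."""
    url = candidate.rstrip(TRAILING_PUNCTUATION)
    return url, candidate[len(url):]


def find_links(text):
    """Return the link targets for bare URLs in text, without trailing punctuation."""
    links = []
    for match in URL_PATTERN.finditer(text):
        url, _trailing = split_trailing_punctuation(match.group(0))
        links.append(url)
    return links


if __name__ == "__main__":
    sample = "Try http://www.wikipedia.com/wiki/SandBox, or (http://www.wikipedia.com/wiki/SandBox)."
    for link in find_links(sample):
        print(link)
```

Punctuation inside a URL is untouched; only characters at the very end are dropped. One known limitation of this sketch: a URL that legitimately ends in one of these characters (a closing paren, for instance) gets truncated, which is the case the [URL URL] syntax would have to cover.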