The old software knew that 99.9% of the time humans don't _really_ mean it when they put a comma, period, or other such item of punctuation immediately after a URL, but that these are rather intended as, well, punctuation.
The new phase III software trusts us more; URLs that are followed immediately by punctuation (period, comma, paren, semicolon, etc) now include this punctuation in the hyperlink, which leads to a lot of broken external links where URLs are put casually into text, particularly on talk pages.
Bug or feature? You decide!
-------- Original Message --------
Subject: [ wikipedia-Bugs-584804 ] URL followed by punctuation->broken link
Date: Mon, 22 Jul 2002 10:34:59 -0700
From: noreply@sourceforge.net
To: noreply@sourceforge.net

Bugs item #584804, was opened at 2002-07-22 02:04
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=411192&aid=584804&...

Category: Page rendering
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Brion Vibber (vibber)
Assigned to: Lee Daniel Crocker (lcrocker)
Summary: URL followed by punctuation->broken link
Initial Comment: If a URL is put directly into article text, and is followed immediately by a punctuation character, that character is in many common cases misparsed as part of the URL. This usually results in a 404 or other page-not-found error when a user clicks on the link.
Example:
http://www.wikipedia.com/wiki/SandBox
http://www.wikipedia.com/wiki/SandBox"
both render and parse correctly, and the link is clickable. The quote mark is not parsed as part of the link.

But:
http://www.wikipedia.com/wiki/SandBox.
http://www.wikipedia.com/wiki/SandBox,
http://www.wikipedia.com/wiki/SandBox:
http://www.wikipedia.com/wiki/SandBox;
http://www.wikipedia.com/wiki/SandBox!
http://www.wikipedia.com/wiki/SandBox?
http://www.wikipedia.com/wiki/SandBox)
all include an extra character on the end, resulting in failure when the link is followed. This is contrary to the functionality of software phase I and II, and will break a lot of links to external sites, particularly in talk pages but also in some articles. (Note that the question-mark link here in fact works by happy coincidence, but is still incorrectly included in the URL where it really oughtn't to be.)
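For illustration only, the two behaviours described in this report can be sketched in a few lines of Python. This is not the wiki software's code: the pattern `URL_RE`, the character set `TRAILING`, and both function names are hypothetical, chosen just to show the difference between matching greedily and cropping trailing punctuation.

```python
import re

# Hypothetical rule: a bare URL is "http://" or "https://" followed by any run
# of characters that are not whitespace, quote marks, or angle brackets.
URL_RE = re.compile(r'https?://[^\s"<>]+')

# Assumed set of characters treated as sentence punctuation at the end of a URL
# (the seven characters listed in the report).
TRAILING = ".,:;!?)"

def find_urls_greedy(text):
    """Return bare URLs exactly as matched, trailing punctuation included."""
    return URL_RE.findall(text)

def find_urls_cropped(text):
    """Return bare URLs with any run of trailing punctuation stripped."""
    return [url.rstrip(TRAILING) for url in URL_RE.findall(text)]
```

Under these assumptions, `find_urls_greedy` keeps the final period or paren in the link target, while `find_urls_cropped` returns the same target for every punctuated variant. Punctuation in the middle of a URL is untouched in both cases.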
----------------------------------------------------------------------
Comment By: Lee Daniel Crocker (lcrocker) Date: 2002-07-22 10:16
Message: Logged In: YES user_id=3076
This was brought up before, and I rejected it, because things like commas and periods are perfectly legal URL characters; it would be wrong not to parse them as such. But I'm willing to be swayed by consensus here--if the community really thinks we /should/ do it "wrong" and leave out punctuation in certain contexts, I'll do that. But it will have to be defined precisely and agreed upon.
----------------------------------------------------------------------
Comment By: Brion Vibber (vibber)
Date: 2002-07-22 10:34
Message: Logged In: YES user_id=446709
Yes, those are all valid characters in URLs. However, they're all _very_ rare at the _end_ of URLs, yet very common as punctuation in English text. Some people will deliberately leave a space after a URL before using punctuation on the assumption that some stupid piece of software is going to try to make a link that includes the punctuation, but this is A) ugly and B) not done often enough that we ought to rely on it.
People _do_ put punctuation immediately at the end of links, and they seem to expect that the software will _not_ give them a 404 error because of it... especially since the software has been handling the case correctly for as long as they've used it.
Not taking this fact into account violates the principle of least surprise and breaks far more links than it corrects (if any). On the rare occasion that a URL actually ends in one of the above characters, we have the [URL URL] syntax.
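A rough sketch of how the bracketed form could serve as the escape hatch, again in Python and again purely hypothetical: the regex `LINK_RE`, the set `TRAILING`, and the function `link_targets` are assumptions for illustration, and the bracket form is assumed to be `[URL label]`. Bare URLs get their trailing punctuation cropped; a URL inside brackets is taken verbatim.

```python
import re

# Hypothetical grammar: either a bracketed link "[url optional-label]"
# (group 1 captures the url) or a bare URL in running text (group 2).
LINK_RE = re.compile(
    r'\[(https?://[^\s\]]+)[^\]]*\]'   # bracketed: keep the URL exactly as written
    r'|(https?://[^\s"<>\[\]]+)'       # bare: candidate for cropping
)

# Assumed set of trailing characters to crop from bare URLs.
TRAILING = ".,:;!?)"

def link_targets(text):
    """Return the link targets found in text, in order of appearance."""
    targets = []
    for match in LINK_RE.finditer(text):
        bracketed, bare = match.group(1), match.group(2)
        if bracketed is not None:
            targets.append(bracketed)              # author was explicit
        else:
            targets.append(bare.rstrip(TRAILING))  # treat the tail as punctuation
    return targets
```

With this rule, a URL that really does end in a period can still be linked by putting it in brackets, while casual URLs in talk-page prose keep working.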
I'm forwarding this bug report to wikipedia-l for a group vote.
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=411192&aid=584804&...
On Mon, Jul 22, 2002 at 10:45:28AM -0800, Brion VIBBER wrote:
> The old software knew that 99.9% of the time humans don't _really_ mean it when they put a comma, period, or other such item of punctuation immediately after a URL, but that these are rather intended as, well, punctuation.
> The new phase III software trusts us more; URLs that are followed immediately by punctuation (period, comma, paren, semicolon, etc) now include this punctuation in the hyperlink, which leads to a lot of broken external links where URLs are put casually into text, particularly on talk pages.
> Bug or feature? You decide!
I would say feature. The parsing rules should be as simple as possible and have as few exceptions as possible. This makes them easier to explain to users, keeps the software simple, and gives fewer headaches when in the future we want to adapt mark-up or export to other formats such as XML.
-- Jan Hidders
Brion VIBBER wrote:
> The old software knew that 99.9% of the time humans don't _really_ mean it when they put a comma, period, or other such item of punctuation immediately after a URL, but that these are rather intended as, well, punctuation.
> The new phase III software trusts us more; URLs that are followed immediately by punctuation (period, comma, paren, semicolon, etc) now include this punctuation in the hyperlink, which leads to a lot of broken external links where URLs are put casually into text, particularly on talk pages.
> Bug or feature? You decide!
> [...]
> You can respond by visiting: http://sourceforge.net/tracker/?func=detail&atid=411192&aid=584804&a...
So far the votes are tied 2-2:
Crop final punctuation from links: Brion, Tony
Include final punctuation in links: Lee, Jan
-- brion vibber (brion @ pobox.com)
On 7/24/02 5:06 PM, "Brion VIBBER" brion@pobox.com wrote:
> So far the votes are tied 2-2:
> Crop final punctuation from links: Brion, Tony
> Include final punctuation in links: Lee, Jan
> -- brion vibber (brion @ pobox.com)
Crop.
The Cunctator wrote thusly:
<snip>
| > Crop final punctuation from links: Brion, Tony
| > Include final punctuation in links: Lee, Jan
| >
| > -- brion vibber (brion @ pobox.com)
| >
| Crop.
Count me in: Crop
regards, WojPob
<snip>
The Cunctator wrote thusly:
<snip>
| > Crop final punctuation from links: Brion, Tony
| > Include final punctuation in links: Lee, Jan
| >
| > -- brion vibber (brion @ pobox.com)
| >
| Crop.
I vote to crop except apostrophes.
Eclecticology
The Cunctator wrote:
On 7/24/02 5:06 PM, "Brion VIBBER" brion@pobox.com wrote:
> > So far the votes are tied 2-2:
> > Crop final punctuation from links: Brion, Tony
> > Include final punctuation in links: Lee, Jan
> > -- brion vibber (brion @ pobox.com)
> Crop.
[Wikipedia-l] To manage your subscription to this list, please go here: http://www.nupedia.com/mailman/listinfo/wikipedia-l
I vote for punctuation cropping, too.
Neil
The Cunctator wrote:
On 7/24/02 5:06 PM, "Brion VIBBER" brion@pobox.com wrote:
> > So far the votes are tied 2-2:
> > Crop final punctuation from links: Brion, Tony
> > Include final punctuation in links: Lee, Jan
I'd vote for cropping too... urls never end with a punctuation mark afaik...
wikipedia-l@lists.wikimedia.org