[Wikipedia-l] [Fwd: [ wikipedia-Bugs-584804 ] URL followed by punctuation->broken link]

Brion VIBBER brion at pobox.com
Mon Jul 22 18:45:28 UTC 2002


The old software knew that 99.9% of the time humans don't _really_ mean 
it when they put a comma, period, or other such item of punctuation 
immediately after a URL, but that these are rather intended as, well, 
punctuation.

The new phase III software trusts us more; URLs that are followed 
immediately by punctuation (period, comma, paren, semicolon, etc) now 
include this punctuation in the hyperlink, which leads to a lot of 
broken external links where URLs are put casually into text, 
particularly on talk pages.

Bug or feature? You decide!


-------- Original Message --------
Subject: [ wikipedia-Bugs-584804 ] URL followed by punctuation->broken link
Date: Mon, 22 Jul 2002 10:34:59 -0700
From: noreply at sourceforge.net
To: noreply at sourceforge.net

Bugs item #584804, was opened at 2002-07-22 02:04
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=411192&aid=584804&group_id=34373

Category: Page rendering
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Brion Vibber (vibber)
Assigned to: Lee Daniel Crocker (lcrocker)
Summary: URL followed by punctuation->broken link

Initial Comment:
If a URL is put directly into article text, and is
followed immediately by a punctuation character, that
character is in many common cases misparsed as part of
the URL. This usually results in a 404 or other
page-not-found error when a user clicks on the link.

Example:
http://www.wikipedia.com/wiki/SandBox
http://www.wikipedia.com/wiki/SandBox"
both render and parse correctly, and the link is
clickable. The quote mark is not parsed as part of the
link.

But:
http://www.wikipedia.com/wiki/SandBox.
http://www.wikipedia.com/wiki/SandBox,
http://www.wikipedia.com/wiki/SandBox:
http://www.wikipedia.com/wiki/SandBox;
http://www.wikipedia.com/wiki/SandBox!
http://www.wikipedia.com/wiki/SandBox?
http://www.wikipedia.com/wiki/SandBox)
all include an extra character on the end, resulting in
failure when the link is followed. This is contrary to
the functionality of software phase I and II, and will
break a lot of links to external sites, particularly in
talk pages but also in some articles.  (Note that the
question-mark link here in fact works by happy
coincidence, but is still incorrectly included in the
URL where it really oughtn't to be.)
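
As a rough illustration of the behavior requested above, here is a minimal Python sketch, not the actual wiki parser code. The function name and the character set are assumptions, taken only from the seven failing examples listed in this report.

```python
# Hypothetical character set: the punctuation marks from the failing
# examples above (period, comma, colon, semicolon, !, ?, close paren).
TRAILING_PUNCTUATION = ".,:;!?)"


def split_trailing_punctuation(token):
    """Split a bare URL token into (link_target, trailing_text).

    Any run of punctuation at the very end of the token is treated as
    ordinary sentence punctuation rather than as part of the link.
    """
    link = token.rstrip(TRAILING_PUNCTUATION)
    return link, token[len(link):]


if __name__ == "__main__":
    for mark in TRAILING_PUNCTUATION:
        print(split_trailing_punctuation("http://www.wikipedia.com/wiki/SandBox" + mark))
```

Punctuation inside the URL (for example the dots in the host name) is left alone; only characters at the end of the token are split off.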

----------------------------------------------------------------------

Comment By: Lee Daniel Crocker (lcrocker)
Date: 2002-07-22 10:16

Message:
Logged In: YES
user_id=3076

This was brought up before, and I rejected it, because things
like commas and periods are perfectly legal URL characters;
it would be wrong not to parse them as such.  But I'm
willing to be swayed by consensus here--if the community
really thinks we /should/ do it "wrong" and leave out
punctuation in certain contexts, I'll do that. But it will have to
be defined precisely and agreed upon.


----------------------------------------------------------------------

 >Comment By: Brion Vibber (vibber)
Date: 2002-07-22 10:34

Message:
Logged In: YES
user_id=446709

Yes, those are all valid characters in URLs. However,
they're all _very_ rare at the _end_ of URLs, yet very
common as punctuation in English text. Some people will
deliberately leave a space after a URL before using
punctuation on the assumption that some stupid piece of
software is going to try to make a link that includes the
punctuation, but this is A) ugly and B) not done often
enough that we ought to rely on it.

People _do_ put punctuation immediately at the end of links,
and they seem to expect that the software will _not_ give
them a 404 error because of it... especially since the software
has been handling the case correctly for as long as they've
used it.

Not taking this fact into account violates the principle of
least surprise and breaks far more links than it corrects
(if any). On the rare occasion that a URL actually ends in
one of the above characters, we have the [URL URL] syntax.
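
One way the rule could be defined precisely is sketched below in Python: bare URLs lose any trailing punctuation, while a bracketed link keeps its URL exactly as written. This is only an illustration under stated assumptions. The regular expression, the function name, and the punctuation set are hypothetical and are not taken from the wiki software.

```python
import re

# Hypothetical set of characters treated as sentence punctuation when
# they appear at the end of a bare URL.
TRAILING_PUNCTUATION = ".,:;!?)"

# First alternative: a bracketed link "[url label]" (label optional).
# Second alternative: a bare URL running up to whitespace or a quote.
LINK_PATTERN = re.compile(
    r"\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]"
    r"|(https?://[^\s\"<>]+)"
)


def find_link_targets(text):
    """Return the link targets found in text, in order of appearance."""
    targets = []
    for match in LINK_PATTERN.finditer(text):
        bracketed, _label, bare = match.groups()
        if bracketed is not None:
            # Explicit bracket syntax: trust the author, keep every character.
            targets.append(bracketed)
        else:
            # Bare URL: treat trailing punctuation as part of the sentence.
            targets.append(bare.rstrip(TRAILING_PUNCTUATION))
    return targets
```

With this split, casual text such as "see http://www.wikipedia.com/wiki/SandBox." links to the intended page, and an author who really needs a URL ending in a period can still write it inside brackets.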

I'm forwarding this bug report to wikipedia-l for a group vote.

----------------------------------------------------------------------



