I noticed while validating the HTML output of an extension that some urls are being generated with a plain & instead of encoded as & and the W3C validator complains about this as an error. This patch fixes the line of code that was the source of the uncoded ampersand (and another line I noticed) if anyone with CVS access chooses to apply it.
On Tue, 29 Nov 2005 00:09:04 +1100, Netocrat wrote:
I noticed while validating the HTML output of an extension that some urls are being generated with a plain & instead of encoded as & and the W3C validator complains about this as an error. This patch fixes the line of code that was the source of the uncoded ampersand (and another line I noticed) if anyone with CVS access chooses to apply it.
Looks like the attachment's been stripped on gmane. Here it is inline.
Index: Title.php =================================================================== RCS file: /cvsroot/wikipedia/phase3/includes/Title.php,v retrieving revision 1.243 diff -u -r1.243 Title.php --- Title.php 13 Nov 2005 04:09:06 -0000 1.243 +++ Title.php 28 Nov 2005 11:30:53 -0000 @@ -678,7 +678,7 @@ if( false === strpos( $url, '?' ) ) { $url .= '?'; } else { - $url .= '&'; + $url .= '&'; } $url .= $query; } @@ -725,7 +725,7 @@ if ( $query == '-' ) { $query = ''; } - $url = "{$wgScript}?title={$dbkey}&{$query}"; + $url = "{$wgScript}?title={$dbkey}&{$query}"; } }
Netocrat wrote:
On Tue, 29 Nov 2005 00:09:04 +1100, Netocrat wrote:
I noticed while validating the HTML output of an extension that some urls are being generated with a plain & instead of encoded as & and the W3C validator complains about this as an error. This patch fixes the line of code that was the source of the uncoded ampersand (and another line I noticed) if anyone with CVS access chooses to apply it.
Looks like the attachment's been stripped on gmane. Here it is inline.
[snip]
$url = "{$wgScript}?title={$dbkey}&{$query}";
$url = "{$wgScript}?title={$dbkey}&{$query}";
This patch is incorrect, and will cause broken URLs to be output throughout the wiki.
Instead, you should locate the individual *output* of the bad URL that you found and patch *that* to properly HTML-encode its output.
-- brion vibber (brion @ pobox.com)
On Mon, 28 Nov 2005 12:40:56 -0800, Brion Vibber wrote:
Netocrat wrote:
[incorrectly patching Title.php/getLocalURL() and getFullURL() to encode ampersands]
This patch is incorrect, and will cause broken URLs to be output throughout the wiki.
Too hasty - I didn't notice that there were already escaped versions of these functions and that unescaped versions were necessary.
Instead, you should locate the individual *output* of the bad URL that you found and patch *that* to properly HTML-encode its output.
I based part of the extension on code from BoardVote.php, which doesn't use the escaped url function to generate the action of a form. So the patch is not very significant anyhow but this is what it should have been:
Index: BoardVote.php =================================================================== RCS file: /cvsroot/wikipedia/extensions/BoardVote/BoardVote.php,v retrieving revision 1.4 diff -u -r1.4 BoardVote.php --- BoardVote.php 13 Sep 2005 14:12:09 -0000 1.4 +++ BoardVote.php 29 Nov 2005 05:05:56 -0000 @@ -155,7 +155,7 @@ global $wgBoardCandidates, $wgOut; $thisTitle = Title::makeTitle( NS_SPECIAL, "Boardvote" ); - $action = $thisTitle->getLocalURL( "action=vote" ); + $action = $thisTitle->escapeLocalURL( "action=vote" ); if ( $this->mHasVoted ) { $intro = wfMsg( "boardvote_intro_change" ); } else {
Netocrat wrote:
On Mon, 28 Nov 2005 12:40:56 -0800, Brion Vibber wrote:
Netocrat wrote:
[incorrectly patching Title.php/getLocalURL() and getFullURL() to encode ampersands]
This patch is incorrect, and will cause broken URLs to be output throughout the wiki.
Too hasty - I didn't notice that there were already escaped versions of these functions and that unescaped versions were necessary.
As a general rule, escaping should be done as close as possible to the actual use. It's part of the file format or communications protocol in your output, not a part of your data, so you want to keep them separate to avoid unnecessarily hobbling code that should be more flexible.
URLs may go to different outputs, not always HTML; for instance HTTP headers, plaintext output, generated JavaScript code, etc, will all have different requirements from XML/HTML text.
Early escaping is also dangerous, since additional processing or just forgetting where you got some value can end up producing either corrupt data (such as double-escaping) or security vulnerabilities from missing escaping (SQL injection, HTML/JavaScript injection, etc).
Instead, you should locate the individual *output* of the bad URL that you found and patch *that* to properly HTML-encode its output.
I based part of the extension on code from BoardVote.php, which doesn't use the escaped url function to generate the action of a form. So the patch is not very significant anyhow but this is what it should have been:
[patch snipped]
Thanks, I've applied it to CVS.
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org