Could someone knowledgeable about URL encoding take a look at this pull request? Thanks! https://github.com/wikimedia/DeadlinkChecker/pull/26/files
HTML4 recommended that people use ; instead of & to separate URL parameters, to avoid conflicts with entity references. However, afaik most web servers don't support this (I think it's mostly some Java things that do). See https://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#h-B.2.2 Modern HTML5 abandoned this recommendation afaik.
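As a rough illustration of the difference between the two separators (not taken from the pull request): Python's standard urllib.parse.parse_qs only splits on & by default in recent versions, and accepts ; only if asked to through its separator argument. The query string below is made up for the example.

```python
from urllib.parse import parse_qs

# Made-up query string using the HTML4-style ";" separator.
query = "a=1;b=2"

# Default in recent Python versions: only "&" separates parameters,
# so everything after "a=" is treated as the value of "a".
default_parse = parse_qs(query)

# Explicitly opting in to ";" as the parameter separator.
semicolon_parse = parse_qs(query, separator=";")

print(default_parse)
print(semicolon_parse)
```

The separator argument only exists in Python releases from early 2021 onwards; older releases split on both & and ; unconditionally.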
-- Brian
On Fri, Oct 20, 2017 at 8:07 PM, Ryan Kaldari rkaldari@wikimedia.org wrote:
Could someone knowledgeable about URL encoding take a look at this pull request? Thanks! https://github.com/wikimedia/DeadlinkChecker/pull/26/files _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
The main issue I'm not sure about here is the use of ; as a query string initiator (rather than a query string parameter separator). This use of semicolons is completely non-standard, AFAIK, but it looks like there are some web servers that are actually using it this way. Comments at the pull request itself would be most useful (rather than by email).
Hi!
Generally, servers are free to parse the local part of the URL as they like. After all, many servers using REST treat something like /user/2/name as essentially a query string, even though / is a path separator. Nothing prevents other servers from adopting a scheme like user;2;name instead, or any other way of parsing the local path.
https://tools.ietf.org/html/rfc3986#section-3.4 clearly states that the query is delimited by "?". This means that the ";" parts of such URLs belong to the path component. As the RFC says in section 3.3:
Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents. For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same.
So a specific application can treat path components the same way as query components, but they are still path components. My reading of the RFC is also that ";" is a reserved character, and as such should not be URL-encoded. Indeed, the path BNF includes sub-delims without encoding, and sub-delims includes ";". However, I am not sure I understand the other part of the patch, where it plays with the query string.
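To make this concrete, here is a small sketch using Python's standard urllib.parse (not the code from the pull request). The URLs are hypothetical and only loosely modelled on the display.w3p case. It shows that a generic parser puts everything before "?" into the path, and that a naive encoder with default settings would percent-encode the ";" and "=" delimiters unless told they are safe.

```python
from urllib.parse import urlsplit, quote

# Hypothetical URLs for illustration only.
with_semicolon = urlsplit("http://example.org/display.w3p;page=0;id=42")
with_question_mark = urlsplit("http://example.org/display.w3p?page=0;id=42")

# Without "?", the ";page=0;id=42" part stays in the path and the query is empty.
semicolon_path = with_semicolon.path      # "/display.w3p;page=0;id=42"
semicolon_query = with_semicolon.query    # ""

# With "?", the same text is the query component.
question_path = with_question_mark.path   # "/display.w3p"
question_query = with_question_mark.query # "page=0;id=42"

# quote() treats only "/" as safe by default, so ";" and "=" get encoded.
naively_encoded = quote(semicolon_path)             # "/display.w3p%3Bpage%3D0%3Bid%3D42"
# Declaring the delimiters safe leaves the path as it was.
preserved = quote(semicolon_path, safe="/;=")       # "/display.w3p;page=0;id=42"
```

Whether a given server treats the encoded and unencoded forms as equivalent is up to that server, so leaving reserved delimiters alone seems like the safer default.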
It would be easier to comment if either the pull request or this thread explained what the patch is trying to do (XY problem etc.), but in general:
- the URI spec defines ? as the separator character between the path and the query part, but it doesn't say much about what the path and the query part *are* (except that the path specifies a resource in a hierarchical manner, and the query specifies it in a non-hierarchical manner). So a web application using something else to separate the query part will violate the spirit of the spec but probably won't experience any problems. I haven't heard of anything doing that before; this display.w3p thing seems like some super-obscure web framework only used by Australian and Singaporean government web pages.
- the URI spec does not say anything about the contents of the query part. It specifies ;/!?:@&=+*$,'[]() as the set of reserved characters, so those are the only sane choices for separating sub-arguments (as everything else might get percent-encoded by the browser, but reserved characters are guaranteed to be preserved). The choice of & as argument separator and = as key-value separator is a common convention, and is used by some standards such as application/x-www-form-urlencoded, but a web application is free to choose something else in theory. In practice I think only very old and fringe ones do.
- the URI spec allows parameters in path segments (sometimes called matrix parameters). https://www.w3.org/DesignIssues/MatrixURIs.html has some examples. The older URI RFC, 2396, prescribed the semicolon as the parameter separator; RFC 3986 allows any reserved character, but in practice it is usually a semicolon. These are used a fair bit in RESTish URLs; Angular uses them, for example. When only the last path segment has parameters, the URL has the same structure as the one in the pull request.
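For what it's worth, the matrix-parameter convention from the last point can be sketched in a few lines of Python. This is only an illustration under the assumption that ";" separates parameters and "=" separates keys from values; the helper name split_matrix_params and the example URL are made up, and this is not what DeadlinkChecker does.

```python
from urllib.parse import urlsplit, unquote

def split_matrix_params(url):
    """Return [(segment_name, {param: value}), ...] for each path segment.

    Assumes the common convention: ";" separates parameters within a
    segment and "=" separates a parameter name from its value.
    """
    segments = []
    for raw_segment in urlsplit(url).path.split("/"):
        if not raw_segment:
            continue  # skip the empty piece before the leading "/"
        name, *raw_params = raw_segment.split(";")
        params = {}
        for raw_param in raw_params:
            if not raw_param:
                continue  # tolerate stray ";;"
            key, _, value = raw_param.partition("=")
            # Decode only after splitting, so an encoded %3B stays data.
            params[unquote(key)] = unquote(value)
        segments.append((unquote(name), params))
    return segments

# Hypothetical URL with parameters on two path segments.
parsed = split_matrix_params("http://example.org/map;zoom=5/user;id=2;name=x")
```

Here parsed holds one entry per path segment, each with its own parameter dictionary; a URL where only the last segment carries parameters yields the same shape as the one discussed in the pull request.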