lcrocker(a)nupedia.com wrote:
As I've mentioned before, I'm pretty sure it's the encoding hack
I set up to keep ampersands in titles _in_ the titles, instead of
letting them act as raw ampersands that mark the beginning of the
next variable in the query string:
RewriteEngine On
RewriteMap urlencode prg:/usr/local/bin/urlencode
RewriteRule ^/wiki/(.*)$ /w/wiki.phtml?title=${urlencode:$1} [L]
If the hackish little external program should die or get out of
sync, we end up with the wrong URLs. But this ugliness *shouldn't*
be needed. We *should* be able to use the internal function that
Apache provides for this...
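For context on the `prg:` map above: mod_rewrite starts the program once, writes each lookup key to its stdin as a newline-terminated line, and reads exactly one newline-terminated line back from its stdout (the string `NULL` means "no value"). If the program dies, buffers its output, or ever answers with the wrong number of lines, lookups stall or get paired with the wrong answers, which is one plausible reading of "get out of sync". The sketch below is a hypothetical reconstruction, not the actual /usr/local/bin/urlencode. It assumes full percent-encoding of everything outside the RFC 3986 unreserved set, and the names `encode_byte` and `run_rewrite_map` are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write one byte, percent-encoding anything outside
 * the RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~). This covers '&',
 * '=', '+', '%', '#' and spaces, which all mean something in a query. */
static void encode_byte(unsigned char c, FILE *out)
{
    static const char hex[] = "0123456789ABCDEF";

    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9') || c == '-' || c == '.' ||
        c == '_' || c == '~') {
        fputc(c, out);
    } else {
        fputc('%', out);
        fputc(hex[c >> 4], out);
        fputc(hex[c & 0x0F], out);
    }
}

/* Hypothetical prg: map loop: one key per input line, exactly one answer
 * line per key, flushed at once because mod_rewrite blocks waiting for it.
 * Reading byte by byte means a long key is never split into two answers.
 * A real program's main() would just return run_rewrite_map(stdin, stdout). */
int run_rewrite_map(FILE *in, FILE *out)
{
    int ch;

    while ((ch = getc(in)) != EOF) {
        if (ch == '\n') {
            fputc('\n', out);   /* end of this key's answer */
            fflush(out);        /* never leave a reply sitting in a buffer */
        } else if (ch != '\r') {
            encode_byte((unsigned char)ch, out);
        }
    }
    return 0;
}
```

Fed the key `AT&T`, this answers `AT%26T`; the unbuffered, one-line-per-key reply is the part that keeps it in step with Apache.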
You are mistaken that Apache is doing the wrong thing: ampersands
are /not/ supposed to be urlencoded--they are valid and meaningful
characters needed for URLs.
It's not the wrong thing in _all_ cases, but it's definitely the wrong
thing for the case of "take an arbitrary string and put it as a value in
a key=value pair in a URL-encoded query string", which is the main
reason I would use such a function in URL-rewriting.
But ampersands do need to be messed with for Wikipedia-specific
reasons: since article titles must appear as values in the query
string (which is separated by ampersands), they must be escaped
somehow for that function. Also, the non-escaped ampersands in the
URL must be HTML-escaped when they appear as attribute values, such
as HREFs. These are two entirely separate issues, and the code
formerly dealt with both correctly, although in a way that you
didn't like. We may have to compromise: accept the double-encoding
for ampersands that you removed for other characters. Either that,
or come up with some other escaping mechanism for ampersands in
titles.
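The HTML-escaping issue is a separate layer from percent-encoding. A minimal sketch of it, assuming a hypothetical helper `html_escape_attr` (not part of the wiki code) that turns each `&` of an already-built URL into `&amp;` before the URL goes into an href; it also escapes `<`, `>` and `"`, as attribute values usually require.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: HTML-escape a string for use inside a
 * double-quoted attribute such as href="...". Returns a malloc()ed
 * string the caller must free(), or NULL if allocation fails. */
char *html_escape_attr(const char *in)
{
    size_t len = 0;
    const char *s;
    char *out, *d;

    /* First pass: work out the escaped length. */
    for (s = in; *s; ++s) {
        switch (*s) {
        case '&': len += 5; break;   /* &amp;  */
        case '<': len += 4; break;   /* &lt;   */
        case '>': len += 4; break;   /* &gt;   */
        case '"': len += 6; break;   /* &quot; */
        default:  len += 1; break;
        }
    }
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, replacing each special character. */
    for (s = in, d = out; *s; ++s) {
        const char *rep = NULL;
        switch (*s) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;"; break;
        case '>': rep = "&gt;"; break;
        case '"': rep = "&quot;"; break;
        default:  *d++ = *s; break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(d, rep, n);
            d += n;
        }
    }
    *d = '\0';
    return out;
}
```

For a hypothetical edit link, `html_escape_attr("/w/wiki.phtml?title=AT%26T&action=edit")` gives `/w/wiki.phtml?title=AT%26T&amp;action=edit`: the `%26` inside the title value is left alone, and only the query separator is HTML-escaped.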
Aside from my general distaste for the double-encoding, it doesn't handle
the case of manual input: someone who types
http://www.wikipedia.com/wiki/AT&T into their URL bar shouldn't end up
at [[AT]].
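Why AT&T lands on [[AT]]: a query parser splits on `&` before it looks at `=`, so an unescaped `&` in the title starts a new variable. A minimal sketch of that splitting, using a hypothetical `split_query` function; real parsers, PHP's included, also percent-decode each key and value afterwards, which this sketch skips.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration: split a query string into key/value pairs
 * on '&' and then on the first '=', without percent-decoding.
 * Keys and values longer than 63 bytes are truncated. */
int split_query(const char *query, char keys[][64], char vals[][64], int max)
{
    int n = 0;
    const char *p = query;

    while (*p != '\0' && n < max) {
        size_t pair_len = strcspn(p, "&");   /* one key=value pair */
        size_t key_len  = strcspn(p, "=");
        if (key_len > pair_len)
            key_len = pair_len;              /* pair with no '=' */

        snprintf(keys[n], 64, "%.*s", (int)key_len, p);
        if (key_len < pair_len)
            snprintf(vals[n], 64, "%.*s",
                     (int)(pair_len - key_len - 1), p + key_len + 1);
        else
            vals[n][0] = '\0';
        ++n;

        p += pair_len;
        if (*p == '&')
            ++p;                             /* skip the separator */
    }
    return n;
}
```

`split_query("title=AT&T", ...)` yields two pairs, `title=AT` and a stray `T` with an empty value, while `title=AT%26T` stays a single pair whose value decodes to AT&T.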
See the attached patch for the Apache source, which adds a rewrite map
function that encodes only ampersands. It works nicely on my test
server, but I don't want to mess with installing it on the main server;
I'm not sure exactly how the compile configuration was set up, and I've
done enough damage lately. :)
Once installed, the rewrite map can look like this:
RewriteEngine On
RewriteMap urlencode int:ampescape
RewriteRule ^/wiki/(.*)$ /w/wiki.phtml?title=${urlencode:$1} [L]
...
If it looks reasonable, please go ahead and set it up.
-- brion vibber (brion @ pobox.com)
--- orig/apache_1.3.26/src/modules/standard/mod_rewrite.h Wed Mar 13 13:05:34 2002
+++ apache_1.3.26/src/modules/standard/mod_rewrite.h Tue Oct 15 14:07:21 2002
@@ -447,6 +447,7 @@
static char *rewrite_mapfunc_toupper(request_rec *r, char *key);
static char *rewrite_mapfunc_tolower(request_rec *r, char *key);
static char *rewrite_mapfunc_escape(request_rec *r, char *key);
+static char *rewrite_mapfunc_ampescape(request_rec *r, char *key);
static char *rewrite_mapfunc_unescape(request_rec *r, char *key);
static char *select_random_value_part(request_rec *r, char *value);
static void rewrite_rand_init(void);
--- orig/apache_1.3.26/src/modules/standard/mod_rewrite.c Wed May 29 10:39:23 2002
+++ apache_1.3.26/src/modules/standard/mod_rewrite.c Tue Oct 15 14:07:49 2002
@@ -502,6 +502,9 @@
else if (strcmp(a2+4, "unescape") == 0) {
new->func = rewrite_mapfunc_unescape;
}
+ else if (strcmp(a2+4, "ampescape") == 0) {
+ new->func = rewrite_mapfunc_ampescape;
+ }
else if (sconf->state == ENGINE_ENABLED) {
return ap_pstrcat(cmd->pool, "RewriteMap: internal map not found:",
a2+4, NULL);
@@ -2982,6 +2985,30 @@
value = ap_escape_uri(r->pool, key);
return value;
+}
+
+static char *rewrite_mapfunc_ampescape(request_rec *r, char *key)
+{
+ /* We only need to escape the ampersand */
+ char *copy = ap_palloc(r->pool, 3 * strlen(key) + 3);
+ const unsigned char *s = (const unsigned char *)key;
+ unsigned char *d = (unsigned char *)copy;
+ unsigned c;
+
+ while ((c = *s)) {
+ if (c == '&') {
+ *d++ = '%';
+ *d++ = '2';
+ *d++ = '6';
+ }
+ else {
+ *d++ = c;
+ }
+ ++s;
+ }
+ *d = '\0';
+
+ return copy;
}
static char *rewrite_mapfunc_unescape(request_rec *r, char *key)
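To try the new map function's logic without rebuilding Apache, here is a standalone copy of the loop in `rewrite_mapfunc_ampescape()`. The name `ampescape` is hypothetical, and `malloc()` stands in for `ap_palloc(r->pool, ...)`, so the caller has to free the result.

```c
#include <stdlib.h>
#include <string.h>

/* Standalone copy of the patch's rewrite_mapfunc_ampescape() loop.
 * malloc() replaces ap_palloc(r->pool, ...); the caller must free(). */
char *ampescape(const char *key)
{
    /* Worst case: every byte is '&' and becomes "%26", plus the NUL.
     * (The patch reserves two extra bytes, which is harmless.) */
    char *copy = malloc(3 * strlen(key) + 1);
    const unsigned char *s = (const unsigned char *)key;
    unsigned char *d = (unsigned char *)copy;
    unsigned c;

    if (copy == NULL)
        return NULL;

    while ((c = *s)) {
        if (c == '&') {
            *d++ = '%';
            *d++ = '2';
            *d++ = '6';
        }
        else {
            *d++ = c;   /* every other byte passes through untouched */
        }
        ++s;
    }
    *d = '\0';

    return copy;
}
```

`ampescape("AT&T")` returns `AT%26T`, which PHP decodes back to AT&T for wiki.phtml. Because only `&` is touched, other characters that carry meaning in a query string, such as `+` (which PHP decodes as a space) or a bare `%`, pass through as-is, so titles like C++ may be worth checking.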