On Apr 3, 2004, at 17:45, Gabriel Wicke wrote:
> - invalid anchor names- attention! this might break links! Replacing
> [^a-z0-9] -> '_' now
That's going to be pretty bad for languages that don't write in Latin script, isn't it?
-- brion vibber (brion @ pobox.com)
On Sat, 03 Apr 2004 18:43:38 -0800, Brion Vibber wrote:
> On Apr 3, 2004, at 17:45, Gabriel Wicke wrote:
>> - invalid anchor names- attention! this might break links! Replacing
>> [^a-z0-9] -> '_' now
> That's going to be pretty bad for languages that don't write in Latin script, isn't it?
Yes, I know. I'm looking up the spec on anchor names and will change it to a positive range. The trouble is that % isn't allowed either, so urlencode doesn't work.
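To make the problem concrete, here is a minimal sketch in PHP. The helper name `asciiOnlyAnchor` and the sample headlines are hypothetical, and applying the pattern byte-wise (without the `u` modifier) is an assumption about how the quoted replacement was used.

```php
<?php
// Hypothetical helper wrapping the replacement quoted from the commit message.
// Assumption: the pattern is applied byte-wise, i.e. without the /u modifier.
function asciiOnlyAnchor(string $headline): string {
    return preg_replace('/[^a-z0-9]/', '_', $headline);
}

$latin = 'history';
$greek = "Ιστορία"; // UTF-8: 7 characters, 14 bytes

echo asciiOnlyAnchor($latin), "\n"; // lowercase ASCII passes through unchanged
echo asciiOnlyAnchor($greek), "\n"; // every byte is replaced by '_'
echo urlencode($greek), "\n";       // percent-escaped, so it contains '%'
```

Under these assumptions, any two non-Latin headlines with the same byte length collapse to the same run of underscores, and the `urlencode` output cannot be used because of its `%` characters.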
Gabriel Wicke wrote:
> On Sat, 03 Apr 2004 18:43:38 -0800, Brion Vibber wrote:
>> On Apr 3, 2004, at 17:45, Gabriel Wicke wrote:
>>> - invalid anchor names- attention! this might break links! Replacing
>>> [^a-z0-9] -> '_' now
>> That's going to be pretty bad for languages that don't write in Latin script, isn't it?
> Yes, I know. I'm looking up the spec on anchor names and will change it to a positive range. The trouble is that % isn't allowed either, so urlencode doesn't work.
You could just simply hexdump it, without the '%'.
On Sun, 04 Apr 2004 14:39:16 +0100, Timwi wrote:
> You could just simply hexdump it, without the '%'.
No, there are a few characters that are not allowed in anchor names. The current replacement is now this:

    $canonized_headline = preg_replace("/[ &\/<>\(\)\[\]=,+]+/", '_', html_entity_decode( $tocline));

It validates OK now (at least in UTF-8), but I might still have missed some invalid characters.
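As a rough illustration of what that replacement does, the sketch below wraps it in a function and runs it on a few sample headlines. The function name `canonizeHeadline` and the sample inputs are hypothetical; only the `preg_replace` call and the variable names come from the message above.

```php
<?php
// Hypothetical wrapper around the replacement quoted in the message.
function canonizeHeadline(string $tocline): string {
    // Decode entities first, then collapse each run of listed characters to one '_'.
    $canonized_headline = preg_replace(
        "/[ &\/<>\(\)\[\]=,+]+/", '_', html_entity_decode($tocline)
    );
    return $canonized_headline;
}

echo canonizeHeadline('See also (R&amp;D)'), "\n";  // listed characters become '_'
echo canonizeHeadline("Ιστορία της Ελλάδας"), "\n"; // non-Latin text is preserved
echo canonizeHeadline('50% "off"'), "\n";           // '%' and '"' are not in the list
```

The last call shows the caveat from the message: characters that are missing from the list, such as `%`, pass through untouched.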
Gabriel Wicke wrote:
> On Sun, 04 Apr 2004 14:39:16 +0100, Timwi wrote:
>> You could just simply hexdump it, without the '%'.
> No, there are a few characters not allowed in anchor names
0-9 and a-f are allowed in anchor names. That's all you need for a hexdump.
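The hexdump suggestion could look roughly like the following PHP sketch. The function names `hexAnchor` and `hexAnchorDecode` are hypothetical, and this is only one way to read the proposal, not code from the thread.

```php
<?php
// Hypothetical helper: hexdump of the headline bytes, with no '%' prefix.
function hexAnchor(string $headline): string {
    return bin2hex($headline); // output uses only the characters 0-9 and a-f
}

// Hypothetical inverse, showing that the encoding is reversible.
function hexAnchorDecode(string $anchor): string {
    return hex2bin($anchor);
}

$headline = "Ιστορία";
$anchor = hexAnchor($headline);

echo hexAnchor('a b'), "\n";           // 612062
echo $anchor, "\n";                    // hex digits only, two per input byte
echo hexAnchorDecode($anchor), "\n";   // round-trips to the original headline
```

The trade-off in this sketch is that every headline maps to a unique, reversible anchor in any script, but the anchor is twice as long as the input in bytes and no longer human-readable.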
wikitech-l@lists.wikimedia.org