On 4/25/07, Neil Harris <usenet(a)tonal.clara.co.uk> wrote:
To reverse the process, first percent-decode the URL as needed, then
decode the resulting UTF-8 byte string into Unicode.
For example,
Fabry-P%C3%A9rot_interferometer
decodes to
Fabry-Pérot interferometer
...since %C3%A9 decodes to the two bytes 0xC3 0xA9, which is the UTF-8
encoding of Unicode code point U+00E9, which encodes the character "é".
The second step (decoding the UTF-8 bytes into Unicode) is unnecessary
if you're using a language like PHP 1-5 that's encoding-agnostic.
urldecode() there returns bytes that can be output directly to a
UTF-8-encoded page or stream, where they'll display correctly. The
conversion step is only potentially useful in a language that
distinguishes between Unicode and binary strings, and even there it's
not strictly necessary. The only requirement is that whatever you pass
the result to, or process it with, interprets it as UTF-8, if that
distinction is relevant (which it probably is if the display name is
what's desired).
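
To illustrate the bytes-versus-Unicode point in a language that does
make the distinction, here is a rough Python sketch (Python is my choice
for illustration only; the variable names are made up). It
percent-decodes to raw bytes first, then shows what happens when those
bytes are interpreted as UTF-8 versus as Latin-1.

```python
from urllib.parse import unquote_to_bytes

# Percent-decode only: the result is a byte string, not text.
raw = unquote_to_bytes("Fabry-P%C3%A9rot_interferometer")

# Interpreted as UTF-8, the two bytes 0xC3 0xA9 become the single
# character U+00E9.
as_utf8 = raw.decode("utf-8")

# Interpreted as Latin-1, the same two bytes become two separate
# characters, which is the classic mojibake.
as_latin1 = raw.decode("latin-1")

print(raw)
print(as_utf8)
print(as_latin1)
```

The raw bytes are the same in both cases; only the interpretation
differs, which is why the consumer has to treat them as UTF-8.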
Basically, yes, it's standard urldecode() followed by replacement of
underscores with spaces.
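
As a minimal sketch of that two-step recipe, again in Python rather
than PHP (the function name display_title is hypothetical, not anything
from MediaWiki):

```python
from urllib.parse import unquote

def display_title(url_title: str) -> str:
    """Turn the title part of a URL into a display name.

    Hypothetical helper: percent-decodes the input (interpreting the
    bytes as UTF-8), then replaces underscores with spaces.
    """
    decoded = unquote(url_title, encoding="utf-8")
    return decoded.replace("_", " ")

print(display_title("Fabry-P%C3%A9rot_interferometer"))
```

One difference to be aware of: Python's unquote() leaves "+" alone,
whereas PHP's urldecode() turns it into a space, so the two are not
drop-in equivalents for titles containing a plus sign.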