On Mon, Dec 29, 2008 at 1:11 PM, Brion Vibber <brion(a)wikimedia.org> wrote:
> There's no real need to encode these IMHO; in nearly all scenarios it
> would be more readable to strip them, just like we strip markup.
> Lossiness isn't a problem as long as the result is useful and legible.
> (Note we already have to handle uniqueness by appending a number for
> duplicate section header names, so stripping characters from the
> originals doesn't create a new problem there.)

Yeah, I was thinking about it and reached the same conclusion. Just
replace any run of disallowed characters with underscores. If it
starts with a character that's only valid in the middle, we might
prefix an underscore instead of an x, while we're at it.
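
To make that concrete, here is a rough Python sketch of the idea, not a proposed patch. The allowed and valid-start character classes below are my assumptions for illustration (this thread doesn't pin them down), and the names sanitize_anchor and unique_anchor are made up:

```python
import re

# Assumed character classes, for illustration only.
# Anything outside this set counts as "disallowed".
DISALLOWED_RUN = re.compile(r"[^A-Za-z0-9_.\-]+")
# Characters assumed to be valid at the start of an anchor.
VALID_START = re.compile(r"[A-Za-z_]")


def sanitize_anchor(heading):
    """Replace each run of disallowed characters with one underscore."""
    anchor = DISALLOWED_RUN.sub("_", heading)
    # Prefix an underscore if the result is empty or begins with a
    # character that is only valid in the middle (e.g. a digit).
    if not anchor or not VALID_START.match(anchor):
        anchor = "_" + anchor
    return anchor


def unique_anchor(heading, seen):
    """Append a number when the sanitized anchor is already in use."""
    base = sanitize_anchor(heading)
    anchor = base
    n = 2
    while anchor in seen:
        anchor = "%s_%d" % (base, n)
        n += 1
    seen.add(anchor)
    return anchor
```

Since the replacement is lossy, two different headings such as "A b" and "A?b" collapse to the same anchor, but the existing duplicate numbering covers that case the same way it covers identical headings.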