On Mon, Dec 29, 2008 at 1:11 PM, Brion Vibber <brion@wikimedia.org> wrote:
> There's no real need to encode these IMHO; in nearly all scenarios it would be more readable to strip them, just like we strip markup. Lossiness isn't a problem as long as the result is useful and legible. (Note we already have to handle uniqueness by appending a number for duplicate section header names, so stripping characters from the originals doesn't create a new problem there.)
Yeah, I was thinking about it and reached the same conclusion. Just replace any run of disallowed characters with underscores. If it starts with a character that's only valid in the middle, we might prefix an underscore instead of an x, while we're at it.
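
Roughly what I have in mind, as a quick Python sketch rather than actual patch code. The function names are made up for illustration, and the character sets are assumptions: I'm taking ASCII letters, digits, "-", "_", "." and ":" as allowed anywhere, and only letters and underscore as valid first characters. I'm also assuming each run collapses to a single underscore, and that duplicates get "_2", "_3", and so on.

```python
import re

# Assumption: characters allowed anywhere in an id.
_DISALLOWED_RUN = re.compile(r"[^A-Za-z0-9_.:-]+")
# Assumption: characters allowed as the first character of an id.
_VALID_START = re.compile(r"[A-Za-z_]")


def make_anchor_id(heading):
    """Hypothetical helper: turn section heading text into a lossy id."""
    # Collapse each run of disallowed characters into one underscore.
    anchor = _DISALLOWED_RUN.sub("_", heading.strip())
    # Prefix an underscore when the first character is only valid
    # mid-id (this also covers an empty result).
    if not _VALID_START.match(anchor):
        anchor = "_" + anchor
    return anchor


def unique_anchor_id(heading, seen):
    """Hypothetical helper: append a number for duplicate headings."""
    base = make_anchor_id(heading)
    candidate, n = base, 1
    while candidate in seen:
        n += 1
        candidate = "%s_%d" % (base, n)
    seen.add(candidate)
    return candidate


if __name__ == "__main__":
    seen = set()
    for text in ["Hello, world!", "2008 plans", "Notes", "Notes"]:
        print(repr(text), "->", unique_anchor_id(text, seen))
```

The numbering loop is only there to show that the lossy step doesn't need anything new for uniqueness: two headings that strip down to the same id get handled the same way as two identical headings.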