Jonathan Walther wrote:
Your examples are legitimate. How would you feel if there were a user option to edit in "broken UTF-8 mode"? Then when you edited a page, you could insert some markup to put in non-ASCII characters. I don't know what the best way to do this would be; I am guessing something like \xAB\xCD, where \x means "an 8-bit value in hexadecimal representation follows". If you have any other ideas, let me know.
The special markup could simply be the &#...; codes themselves. Internally, we would still store UTF-8, not an ampersand, semicolon, etc. But when that option is checked, non-ASCII characters would be presented in the edit box as HTML entities (names if they exist, numbers otherwise). (You could *input* either HTML entities or direct UTF-8 regardless of the option.)
Then we could even let [[fr:]] (say) choose to make that option the default, while letting [[pl:]] (say) eschew it.
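To make the round trip concrete, here is a rough sketch in Python. It is only an illustration, not a proposal for the actual code: the function names, the `ENTITY_MODE_DEFAULT` table and the `user_pref` parameter are all made up, and "names if they exist" is taken to mean the HTML 4 names in Python's `html.entities`. Entities that resolve to ASCII (such as `&amp;`) are deliberately left alone on input so that existing markup is not changed.

```python
import re
from html.entities import codepoint2name, name2codepoint

# Hypothetical per-wiki defaults: [[fr:]] opts in, [[pl:]] does not.
ENTITY_MODE_DEFAULT = {"fr": True, "pl": False}

_ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def to_edit_box(text: str) -> str:
    """Show every non-ASCII character as an entity: name if one exists, number otherwise."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


def from_edit_box(text: str) -> str:
    """Turn entities for non-ASCII characters back into the characters themselves."""
    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body[0] == "#":
            cp = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
        else:
            cp = name2codepoint.get(body)
        # Leave unknown names, ASCII results, surrogates and out-of-range values untouched.
        if cp is None or cp < 128 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return match.group(0)
        return chr(cp)
    return _ENTITY.sub(replace, text)


def text_for_editing(stored: bytes, wiki: str, user_pref=None) -> str:
    """Decode stored UTF-8; apply entity mode from the user's choice, else the wiki default."""
    text = stored.decode("utf-8")
    entity_mode = ENTITY_MODE_DEFAULT.get(wiki, False) if user_pref is None else user_pref
    return to_edit_box(text) if entity_mode else text


def text_for_storage(submitted: str) -> bytes:
    """Accept entities or direct characters on input; always store plain UTF-8."""
    return from_edit_box(submitted).encode("utf-8")
```

With this, `text_for_storage("caf&eacute;")` and `text_for_storage("café")` store the same bytes, and a reader with the option checked gets `caf&eacute;` back in the edit box.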
-- Toby