I wrote:
The special markup could simply be the &#...; codes themselves. Internally, we would store UTF-8, not an ampersand, semicolon, etc. But these would be presented in the edit box as HTML entities (names if they exist, numbers otherwise), when that option is checked. (But you could *input* either HTML entities or direct UTF-8 regardless.)
Actually, we need 3 options if everybody is to be satisfied:
Option    Edit box presents as        Edit box accepts input as
-----------------------------------------------------------------------------
UTF-8     UTF-8                       UTF-8 &name; &#decimal; &#Xhexadecimal;
Latin-1   Latin-1 &name; &#decimal;   Latin-1 &name; &#decimal; &#Xhexadecimal;
ASCII     ASCII &name; &#decimal;     Latin-1 &name; &#decimal; &#Xhexadecimal;
When presenting the edit box (middle column), use the first version listed that applies to the character in question; when accepting input from the edit box (last column), accept anything that we get, with the default encoding listed.
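To make the two rules concrete, here is a rough Python sketch of how presenting and accepting might work. The function names `present` and `accept` and the `DIRECT_LIMIT` table are made up for illustration, and the entity names come from Python's HTML 4 table, which may not match whatever list the real software would use. It also assumes that references resolving below 160 should be left alone: ASCII ones so that markup such as `&amp;` survives, and 128 to 159 because that range is forbidden and needs separate handling.

```python
import re
from html.entities import codepoint2name, name2codepoint

# Hypothetical option table: the highest code point each option shows as-is.
DIRECT_LIMIT = {"UTF-8": 0x10FFFF, "Latin-1": 0xFF, "ASCII": 0x7F}

def present(text, option):
    """Render stored Unicode text for the edit box under the chosen option."""
    limit = DIRECT_LIMIT[option]
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= limit:
            out.append(ch)                            # 1st choice: the character itself
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])   # 2nd choice: named entity
        else:
            out.append("&#%d;" % cp)                  # last resort: decimal reference
    return "".join(out)

_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def _decode(match):
    body = match.group(1)
    if body[0] == "#":
        cp = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
    else:
        cp = name2codepoint.get(body)
    # Assumption: keep unknown names, anything below 160, surrogates,
    # and out-of-range numbers exactly as typed.
    if cp is None or cp < 0xA0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return match.group(0)
    return chr(cp)

def accept(submitted):
    """Turn edit-box input (already decoded to a str) into Unicode for storage."""
    return _REF.sub(_decode, submitted)
```

For example, `present("café ł", "ASCII")` gives `caf&eacute; &#322;`, and passing that string to `accept` gives back `café ł`, so the stored text is the same whichever option the editor has checked.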
Then we could even let [[fr:]] (say) choose to make that option the default, while letting [[pl:]] (say) eschew it.
Presumably, of the above, [[pl:]] would set the default to UTF-8, [[fr:]] would set the default to Latin-1 (for anthere's sake), and [[en:]] would set the default to ASCII (for mav's sake ^_^). But everything in the databases would be UTF-8 internally.
Of course, once the system is set up to run in general, then it'll be no great trick to let [[pl:]] set their default to Latin-2 -- since it's a default of how to present Unicode to editors, not a limitation on the available Unicode characters.
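One way to see why Latin-2 would be "no great trick": if the presentation step is driven by a charset name rather than a hard-coded range, adding a new default is only a matter of passing a different name. The sketch below assumes Python's codec names (such as `iso8859-2`) and uses a hypothetical function `present_in`; it is an illustration, not a description of how the software works.

```python
from html.entities import codepoint2name

def present_in(text, charset):
    """Show each character directly if `charset` can encode it, else as an entity."""
    out = []
    for ch in text:
        try:
            ch.encode(charset)        # raises if the charset lacks this character
            out.append(ch)
        except UnicodeEncodeError:
            cp = ord(ch)
            name = codepoint2name.get(cp)   # HTML 4 names only (an assumption)
            out.append("&%s;" % name if name else "&#%d;" % cp)
    return "".join(out)
```

With this, `present_in("Łódź ñ", "iso8859-2")` leaves the Polish letters alone and shows `&ntilde;`, while `present_in("Łódź", "latin-1")` shows `&#321;ód&#378;`. The stored text is identical in both cases.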
As for the forbidden numerical character entities from € to š, we can interpret them as if they came from Micro$oft (most likely) and convert them to whatever they should be (by table). (If any other forbidden numerical entities have common nonstandard uses, then we can adopt those as well as long as they translate to good Unicode.)
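A minimal sketch of that conversion, assuming the forbidden range is 128 to 159 and that "as if they came from Micro$oft" means Windows-1252. Instead of a hand-written table it borrows Python's built-in `cp1252` codec as the table; the function name `fix_forbidden` is made up. Five positions in that range are unassigned in Windows-1252, and the sketch simply leaves references to them untouched.

```python
import re

_NUMERIC = re.compile(r"&#([0-9]+|[xX][0-9a-fA-F]+);")

def fix_forbidden(text):
    """Replace numeric references 128..159 with their Windows-1252 meaning."""
    def repl(match):
        body = match.group(1)
        n = int(body[1:], 16) if body[0] in "xX" else int(body)
        if not 128 <= n <= 159:
            return match.group(0)          # not in the forbidden range
        try:
            return bytes([n]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)          # unassigned in Windows-1252
    return _NUMERIC.sub(repl, text)
```

So `&#128;` becomes € (U+20AC) and `&#154;` becomes š (U+0161), both of which are perfectly good Unicode, while `&#129;` is left for a human to sort out.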
Gee, I guess that I'm getting involved in this after all. But it now seems like there might be a solution, not just an argument!
-- Toby