On Fri, 23 May 2003, Lee Daniel Crocker wrote:
- Cannot allow: # (sharp), | (pipe), " (quote), [] (brackets), {} (braces), <> (greater,less), + (plus), \ (backslash) because allowing them would interfere with link syntax and make the software more tricky to write. I can live without these, though I think + might be handy in some places (like C++), and might be worth the effort to allow.
Plus + and quote " are frequently asked for. Neither would interfere with wiki syntax at all, though both would require escaping in URLs (as do the ampersand & when used in the query string, and the percent % and question mark ? always, all of which we presently allow).
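
To make the escaping point concrete, here is a rough sketch in Python (not our PHP code, just the standard library's percent-encoder). The helper name escape_title is made up for illustration, and the sample titles are only examples:

```python
from urllib.parse import quote, unquote

def escape_title(title: str) -> str:
    """Percent-encode a page title for use in a URL (hypothetical helper)."""
    # safe="" leaves only letters, digits and "_.-~" unescaped, so
    # + " & % ? are all percent-encoded.
    return quote(title, safe="")

for title in ['C++', '"Heroes"', 'AT&T', '100%', 'Quo vadis?']:
    escaped = escape_title(title)
    # Escaping must be reversible, or the title could not be recovered.
    assert unquote(escaped) == title
    print(f"{title!r:>14} -> {escaped}")
```

The plus sign is the one most likely to bite: in a form-encoded query string a bare + is read back as a space, so "C++" has to travel as "C%2B%2B".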
- Should allow anything Unicode calls a letter, numeral, syllable, or ideograph.
Okay...
- Should not allow Unicode diacriticals, combining forms, display forms (ligatures), controls, and other specials.
Waitaminute... that would seem to exclude accented characters that have no precomposed form, which can only be written as a base letter plus combining marks. This could be seriously detrimental to some languages.
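
A quick sketch of the problem, again in Python rather than PHP. It assumes the proposed rule boils down to "general category starts with L or N", which is my reading and not something Lee spelled out; the function names are hypothetical:

```python
import unicodedata

def is_strictly_allowed(ch: str) -> bool:
    """Assumed reading of the rule: letters (L*) and numbers (N*) only."""
    return unicodedata.category(ch)[0] in ("L", "N")

def rejected_chars(title: str) -> list[str]:
    """Names of characters the strict rule would reject, after composing."""
    # NFC folds base + combining mark into one code point where one exists.
    composed = unicodedata.normalize("NFC", title)
    return [unicodedata.name(ch, "?") for ch in composed
            if not is_strictly_allowed(ch)]

# e + acute has a single-code-point form, so nothing is rejected.
print(rejected_chars("e\u0301"))
# e + dot below + acute (used in Yoruba) has none; the acute stays a
# combining mark even after composition, and the rule rejects it.
print(rejected_chars("e\u0323\u0301"))
```

So even if we compose everything we can first, a blanket ban on combining marks still locks out letters like the second one.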
(In any case, we ought to do a little fancier work with UTF-8 to make sure that canonical forms are used to prevent false non-matches. I don't know if there's a library we can link into PHP to do this or if we'd have to write something.)
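
For what it's worth, this is the sort of thing I mean, sketched in Python, whose standard library happens to ship a normalizer. Choosing NFC as the canonical form is an assumption on my part, and canonical_title and same_page are made-up names:

```python
import unicodedata

def canonical_title(title: str) -> str:
    """Put a title into one canonical form (NFC assumed here)."""
    return unicodedata.normalize("NFC", title)

def same_page(a: str, b: str) -> bool:
    """Compare two titles after normalizing both."""
    return canonical_title(a) == canonical_title(b)

composed = "Caf\u00e9"      # e-acute as a single code point
decomposed = "Cafe\u0301"   # e followed by a combining acute

print(composed == decomposed)           # raw comparison: a false non-match
print(same_page(composed, decomposed))  # normalized comparison
```

Normalizing once on save and once on lookup would be enough; the stored titles would then all be in the same form.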
-- brion vibber (brion @ pobox.com)