On Dec 6, 2004, at 11:11 PM, Nick Triantos wrote:
I've seen a few threads go by about not being able to use "funny"
characters in page names... Everything from non-US characters, to the
plus sign ( + ) and apostrophe ( ' ).
Is there a reason that the page titles aren't just stored in the
database in urlencode( )'d page titles? When a search, etc. is
performed, we could just convert the string to its urlencode( )
equivalent, and do the same for [[ ]] type links.
The storage of titles in the database is *in no way at all* related to
limitations of what characters we allow in titles. Storing them in the
database urlencode()d would make absolutely no difference except to
complicate the code and bloat the database.
What characters we allow in titles is *only* related to the wiki link
syntax and HTML and URL encoding/decoding issues.
It's extremely helpful for titles to be idempotent for URL decoding
(that is, decoding a title should give back the same title); data may
be decoded multiple times, for instance due to mod_rewrite hell, and
for historical reasons we in some places allow URL-encoded text to be
used in wiki links themselves. Interwiki links at least do let %xx hex
codes pass through unmolested.
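
To make the idempotency point concrete, here is a rough sketch in Python rather than the wiki's own PHP. The helper names is_decode_stable and decode_n are made up for illustration and are not part of any wiki code; urllib.parse.unquote stands in for whatever decoding the web server or rewrite rules happen to apply.

```python
from urllib.parse import unquote

def is_decode_stable(title: str) -> bool:
    """True if URL-decoding the title leaves it unchanged (hypothetical helper)."""
    return unquote(title) == title

def decode_n(title: str, times: int) -> str:
    """Apply URL decoding repeatedly, as a chain of rewrites might."""
    for _ in range(times):
        title = unquote(title)
    return title

# A plain title survives any number of decoding passes.
plain = decode_n("Main Page", 3)

# A title containing a literal %xx sequence changes with every extra pass.
once = decode_n("100%2525", 1)
twice = decode_n("100%2525", 2)
```

A title for which is_decode_stable is true comes out the same no matter how many times something along the way decodes it, which is the property wanted here.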
(It's pretty annoying to me that you can't name a page "C++") :-)
The + issue has to do with multiple encoding/decoding issues and
backwards-compatible links (in URL encoding, + represents a space).
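
A small illustration of the + problem, again sketched in Python and not taken from the wiki code. It assumes a form-style decoder that treats + as a space, which is what urllib.parse.unquote_plus does; the variable names are just for the example.

```python
from urllib.parse import quote, unquote_plus

title = "C++"

# Decoding the raw title with a form-style decoder turns each + into a space.
decoded_raw = unquote_plus(title)

# Percent-encoding the plus signs protects them for exactly one decoding pass.
encoded = quote(title, safe="")
decoded_once = unquote_plus(encoded)

# A second pass (a double-decoding bug, say) loses them again.
decoded_twice = unquote_plus(decoded_once)
```

So even a correctly encoded link to "C++" only survives if every layer decodes it exactly once, and old links that use + to mean a space would have to keep working as well.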
-- brion vibber (brion @ pobox.com)