Clutch wrote:
Toby Bartels wrote:
>You can (and we often do, on [[en:]]) using HTML entities,
>such as Č (for "C(", "C" with a hacek, TeX's "\v C").
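For concreteness, here is a minimal sketch of what such an entity looks like and how it maps back to the letter. Python and its standard `html` module are an assumption made for illustration only; nothing in this thread says the wiki software works this way.

```python
import html

# U+010C LATIN CAPITAL LETTER C WITH CARON ("C" with a hacek)
c_caron = "\u010c"

# Numeric character reference built from the code point (decimal 268)
entity = f"&#{ord(c_caron)};"
print(entity)  # &#268;

# Decoding the reference (as a browser would) gives the letter back
decoded = html.unescape(entity)
print(decoded == c_caron)  # True
```

The hexadecimal form `&#x10C;` refers to the same character.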
That approach borks things up. Specifically, it screws up web searches.
How many people are going to enter, or know to enter, the HTML entity
when they type in a search term?
Google is quite capable of finding "Č" when a user enters "C("
(well, not literally "C(", but the actual Czech letter itself).
If Wikipedia's own search engine isn't, then we should fix that anyway.
Related, but slightly different,
it screws up collation. With collation you can find things with
diacritics even when you aren't putting the diacritics in yourself,
and sorting order gets done properly.
I don't see how this is relevant to text.
It *is* relevant to titles, but I already agree with you
that UTF-8 would be nice to have for those!
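To make the search and collation points concrete, here is a rough sketch of diacritic-insensitive matching and sorting. It assumes Python's standard `unicodedata` module, and the titles are hypothetical. Real collation is locale-specific (Czech, for instance, treats "Č" as its own letter after "C"), so this is only a crude approximation, not what any particular search engine or wiki does.

```python
import unicodedata

def fold(text: str) -> str:
    """Strip combining marks and ignore case, so that 'Č' compares as 'c'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Hypothetical article titles, for illustration only
titles = ["Čapek", "Zola", "Capek", "Dvořák"]

# Search: a query typed without diacritics still finds the accented title
matches = [t for t in titles if fold(t) == fold("capek")]

# Naive sort compares code points, so "Č" lands after "Z"
naive = sorted(titles)

# Sort on the folded form, using the original string only to break ties
ordered = sorted(titles, key=lambda t: (fold(t), t))

print(matches)  # ['Čapek', 'Capek']
print(naive)    # ['Capek', 'Dvořák', 'Zola', 'Čapek']
print(ordered)  # ['Capek', 'Čapek', 'Dvořák', 'Zola']
```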
I think UTF-8 is the way to go. It's been out for years, and is now
widely supported.
Not widely enough, if Anthere's evidence is accurate.
(Better evidence would be citations from server logs
for the various Latin-1 wikis that people want to switch over.)
I don't know anybody that would oppose switching everything to Unicode
once it's nearly universally supported -- so that's what it comes down to.
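As a small illustration of the encoding difference behind this (separate from the browser-support question), the sketch below shows that "Č" has no byte in Latin-1 but encodes directly in UTF-8. It uses only Python's built-in codecs, which is an illustrative assumption and says nothing about how the wikis themselves store text.

```python
c_caron = "\u010c"  # "Č"

# UTF-8 can encode any Unicode character; "Č" takes two bytes
utf8_bytes = c_caron.encode("utf-8")
print(utf8_bytes)  # b'\xc4\x8c'

# Latin-1 (ISO 8859-1) has no code for "Č", so direct encoding fails
try:
    c_caron.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print(fits_latin1)  # False

# Python's "xmlcharrefreplace" error handler substitutes a numeric entity,
# which is the workaround available in the body of a Latin-1 page
fallback = c_caron.encode("latin-1", errors="xmlcharrefreplace")
print(fallback)  # b'&#268;'
```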
I don't want to get into this argument too much --
I support switching to UTF-8 if it won't screw things up,
and I oppose switching if it will screw things up.
Other than that, I just have some evidence (from meta)
that it *can* screw things up, so we need to watch for it;
but switching may well still be the right thing to do!
I just wanted to point out that the functionality is there
(but not conveniently) in the body of the article (but not the title).
-- Toby