Petr Kadlec wrote:
OK, although I would imagine that wrapping the parameters in
htmlspecialchars() would be a little bit "more correct", I can
understand that.
So I'll ask a question that seems to be far off this topic, but it is not. :-)
What is the meaning and purpose of the following message that appears
in some LanguageXx.php files?
# This file is encoded in UTF-8, no byte order mark.
# For compatibility with Latin-1 installations, please
# don't add literal characters above U+00ff.
What difference is there between e.g. U+00FF (UTF-8 encoding C3, BF)
and U+0100 (encoded to C4, 80), with regard to Latin-1 installations?
That warning doesn't apply to LanguageCs.php. It only applies to the
language files with that comment. The story is that some wikis (in
particular en, da, nl and sv) have been encoded in latin-1 since the
year dot. The language files for those wikis used to be latin-1, but
that prevented the creation of new utf-8 wikis in those languages. So
Brion converted all the language files to utf-8, and wrote
LanguageLatin1.php, which uses iconv to convert the text to latin-1 at
runtime. Characters above U+00FF can't be represented in latin-1, and
are instead converted to a question mark. This is rarely an issue since
the languages with latin-1 wikis generally only need latin-1 characters.
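
To make the difference concrete, here is a rough sketch in Python. It is not the LanguageLatin1.php code (that uses PHP's iconv); Python's built-in codecs stand in for it, and the helper names to_latin1 and hex_bytes are made up for this illustration. It shows the UTF-8 bytes of a few characters and what is left of each after conversion to latin-1 with replacement.

```python
# Illustration only: Python's codecs stand in for the PHP iconv call,
# and both helper names below are hypothetical.

def to_latin1(text: str) -> bytes:
    """Encode text as Latin-1, replacing unrepresentable characters with '?'."""
    return text.encode("latin-1", errors="replace")


def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


samples = {
    "U+00A0 (no-break space)": "\u00a0",
    "U+00FF": "\u00ff",
    "U+0100": "\u0100",
}

for label, ch in samples.items():
    utf8 = hex_bytes(ch.encode("utf-8"))
    latin1 = hex_bytes(to_latin1(ch))
    print(f"{label}: UTF-8 {utf8} -> Latin-1 {latin1}")
```

U+00A0 and U+00FF each map to a single latin-1 byte (A0 and FF), while U+0100 has no latin-1 equivalent and comes out as 3F, the question mark.
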
So, my final deduction is that the abovementioned message is rather
strange and I should ignore it, write any Unicode character to the
file normally, and generally not use HTML entities. (Which is
unfortunate especially for nbsp, which is normally indistinguishable
from a plain space character.) Am I correct?
Yes, that's correct.
-- Tim Starling