Hi,
I know this is probably the wrong list to ask, but I don't know a
better one and I know we have a very good i18n team so I'm hoping
someone here can help me.
I'm trying to parse the following XML (abridged for brevity):
<?xml version="1.0" encoding="UTF-8"?>
<județ>
<siruta>47</siruta>
<nume>Județul Bacău</nume>
</județ>
Every validator I've tried reports an error on the ț in the tag named
județ. However, the XML spec [1] says this is actually allowed:
document ::= prolog element Misc*
element ::= EmptyElemTag | STag content ETag
STag ::= '<' Name (S Attribute)* S? '>'
Name ::= NameStartChar (NameChar)*
NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
    [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
    [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
    [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
    [#x10000-#xEFFFF]
NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 |
    [#x0300-#x036F] | [#x203F-#x2040]
ț is #x163 [2], so it falls within the interval [#xF8-#x2FF] that
NameStartChar allows.
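To double-check my reading of the grammar, here is a minimal Python
sketch that tests a name against the NameStartChar and NameChar ranges
quoted above. The helper names (in_ranges, is_xml_name, etc.) are my
own and not from the spec. I test both U+0163 (t with cedilla, the
value I quoted) and U+021B (t with comma below), since I am only
assuming which of the two actually ends up in the file. It only checks
the quoted productions; it says nothing about what a particular
validator implements.

```python
# Ranges copied from the NameStartChar production quoted above.
NAME_START_RANGES = [
    (0x3A, 0x3A),  # ":"
    (0x41, 0x5A),  # [A-Z]
    (0x5F, 0x5F),  # "_"
    (0x61, 0x7A),  # [a-z]
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Extra characters that NameChar allows on top of NameStartChar.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2D),  # "-"
    (0x2E, 0x2E),  # "."
    (0x30, 0x39),  # [0-9]
    (0xB7, 0xB7),  # #xB7
    (0x300, 0x36F), (0x203F, 0x2040),
]


def in_ranges(ch, ranges):
    """Return True if the code point of ch lies in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def is_name_start_char(ch):
    return in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or in_ranges(ch, NAME_EXTRA_RANGES)


def is_xml_name(name):
    """Check Name ::= NameStartChar (NameChar)* from the quoted grammar."""
    return (
        bool(name)
        and is_name_start_char(name[0])
        and all(is_name_char(c) for c in name[1:])
    )


# Try both spellings of the tag name: cedilla (U+0163) and comma below (U+021B).
for tag in ("jude\u0163", "jude\u021b"):
    last = tag[-1]
    print(f"U+{ord(last):04X} in tag name: valid Name = {is_xml_name(tag)}")
```

Both code points sit inside [#xF8-#x2FF], so by the quoted productions
either spelling should be a valid Name.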
I got as far as page 10 of the Google results for "can the xml tags
contain utf8 letters" and "xml tags utf-8", but found nothing
relevant. Am I missing something here? Is there any way around this?
Thanks,
Strainu
[1] http://www.w3.org/TR/xml/
[2] http://ro.wikipedia.org/wiki/Wikipedia:Diacritice#Date_tehnice