Hi,
I know this is probably the wrong list for this question, but I don't know of a better one, and since we have a very good i18n team, I'm hoping someone here can help me.
I'm trying to parse the following XML (abridged for brevity):
<?xml version="1.0" encoding="UTF-8"?>
<județ>
  <siruta>47</siruta>
  <nume>Județul Bacău</nume>
</județ>
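Since the prolog declares encoding="UTF-8", one thing worth ruling out is a file that isn't actually saved as UTF-8. Here is a minimal Python sketch of that check, assuming the document is available as raw bytes: it decodes strictly as UTF-8 and reports the first invalid byte. In correct UTF-8, ț (U+021B) should appear as the two bytes C8 9B.

```python
from typing import Optional


def check_utf8(data: bytes) -> Optional[str]:
    """Return None if data is valid UTF-8, else a description of the first bad byte."""
    try:
        data.decode("utf-8")  # strict error handling: invalid sequences raise
    except UnicodeDecodeError as err:
        return f"invalid UTF-8 at byte {err.start}: {data[err.start:err.end]!r}"
    return None


# The sample document from above, encoded as UTF-8.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<jude\u021b>\n"
    "  <siruta>47</siruta>\n"
    "  <nume>Jude\u021bul Bac\u0103u</nume>\n"
    "</jude\u021b>\n"
)
data = doc.encode("utf-8")

print(check_utf8(data))               # None
print("\u021b".encode("utf-8").hex())  # c89b
```

To check the real file instead of the inline sample, pass it the output of `pathlib.Path("judet.xml").read_bytes()` (judet.xml is a hypothetical file name).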
Every validator I've tried reports an error on the ț in the tag name județ. However, according to the XML spec [1], this is actually valid:
document      ::= prolog element Misc*
element       ::= EmptyElemTag | STag content ETag
STag          ::= '<' Name (S Attribute)* S? '>'
Name          ::= NameStartChar (NameChar)*
NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6]
                  | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF]
                  | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                  | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
                  | [#x10000-#xEFFFF]
NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7
                  | [#x0300-#x036F] | [#x203F-#x2040]
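The Name production quoted above is simple enough to check mechanically. This is a minimal Python sketch that transcribes only the NameStartChar/NameChar ranges from that production (XML 1.0 Fifth Edition); it validates a single tag name, not the rest of the document, and the helper names are made up for illustration.

```python
# NameStartChar ranges, transcribed from the production above.
NAME_START_RANGES = [
    (ord(":"), ord(":")), (ord("A"), ord("Z")), (ord("_"), ord("_")),
    (ord("a"), ord("z")), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]
# Extra ranges that NameChar allows after the first character.
NAME_EXTRA_RANGES = [
    (ord("-"), ord("-")), (ord("."), ord(".")), (ord("0"), ord("9")),
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch: str) -> bool:
    return in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch: str) -> bool:
    return is_name_start_char(ch) or in_ranges(ord(ch), NAME_EXTRA_RANGES)


def is_xml_name(name: str) -> bool:
    """Name ::= NameStartChar (NameChar)*"""
    return (
        len(name) > 0
        and is_name_start_char(name[0])
        and all(is_name_char(c) for c in name[1:])
    )


for tag in ["jude\u021b", "siruta", "nume", "1judet"]:
    print(tag, is_xml_name(tag))
# județ True
# siruta True
# nume True
# 1judet False
```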
ț (t with comma below) is #x21B, and the older cedilla form ţ is #x163 [2]; either way, the character lies in the interval [#xF8-#x2FF].
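To confirm which of the two look-alike characters is actually in the file, Python's standard `unicodedata` module can print the code point and official name. A small sketch; the `describe` helper is made up for illustration, and the range check is just the [#xF8-#x2FF] interval from the spec.

```python
import unicodedata


def describe(ch: str):
    """Return (code point, Unicode name, whether it is in [#xF8-#x2FF])."""
    cp = ord(ch)
    return f"U+{cp:04X}", unicodedata.name(ch), 0xF8 <= cp <= 0x2FF


for ch in ["\u021b", "\u0163"]:  # ț (comma below), ţ (cedilla)
    print(*describe(ch))
# U+021B LATIN SMALL LETTER T WITH COMMA BELOW True
# U+0163 LATIN SMALL LETTER T WITH CEDILLA True
```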
I've gone through 10 pages of Google results for "can the xml tags contain utf8 letters" and "xml tags utf-8" without finding anything relevant. Am I missing something here? Is there any way around this?
Thanks,
Strainu
[1] http://www.w3.org/TR/xml/
[2] http://ro.wikipedia.org/wiki/Wikipedia:Diacritice#Date_tehnice