[Wikitech-l] XML and Unicode chars in tag names

14 Jul 2013


Hi,
I know this is probably the wrong list to ask, but I don't know a
better one and I know we have a very good i18n team so I'm hoping
someone here can help me.
I'm trying to parse the following XML (abridged for brevity):
<?xml version="1.0" encoding="UTF-8"?>
<județ>
  <siruta>47</siruta>
  <nume>Județul Bacău</nume>
</județ>
Every validator I've tried marks an error on the ț in the tag named
județ. However, the XML spec [1] says this is actually correct:
document      ::= prolog element Misc*
element       ::= EmptyElemTag | STag content ETag
STag          ::= '<' Name (S Attribute)* S? '>'
Name          ::= NameStartChar (NameChar)*
NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6]
                | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D]
                | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F]
                | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF]
                | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7
                | [#x0300-#x036F] | [#x203F-#x2040]
ț is #x163 [2], thus should be in the interval [#xF8-#x2FF].
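To double-check that reading of the grammar, here is a minimal Python sketch that tests a string against the Name production quoted above. The function names (is_name_start_char, is_name_char, is_xml_name) are made up for illustration, and the ranges are copied by hand from the quoted productions, so treat this as a sanity check rather than a conformance test.

```python
# Ranges transcribed from the NameStartChar production quoted above.
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Extra characters that NameChar allows on top of NameStartChar.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # 0-9
    (0xB7, 0xB7),
    (0x300, 0x36F),
    (0x203F, 0x2040),
]


def _in_ranges(code_point, ranges):
    return any(low <= code_point <= high for low, high in ranges)


def is_name_start_char(ch):
    """True if ch matches the NameStartChar production."""
    return _in_ranges(ord(ch), NAME_START_RANGES)


def is_name_char(ch):
    """True if ch matches the NameChar production."""
    code_point = ord(ch)
    return (_in_ranges(code_point, NAME_START_RANGES)
            or _in_ranges(code_point, NAME_EXTRA_RANGES))


def is_xml_name(name):
    """True if name matches Name ::= NameStartChar (NameChar)*."""
    return (bool(name)
            and is_name_start_char(name[0])
            and all(is_name_char(ch) for ch in name[1:]))


if __name__ == "__main__":
    # "\u0163" is the code point cited in this message (#x163).
    tag = "jude\u0163"
    for ch in tag:
        print(f"U+{ord(ch):04X}", is_name_char(ch))
    print(tag, is_xml_name(tag))
```

Printing the code point of each character is deliberate: it shows which code point the file actually contains, which is worth confirming before comparing it against the ranges.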
I have reached page 10 on Google searching for "can the xml tags
contain utf8 letters" and "xml tags utf-8", but I found nothing
relevant. Am I missing something here? Is there any way around this?
Thanks,
   Strainu
[1] http://www.w3.org/TR/xml/
[2] http://ro.wikipedia.org/wiki/Wikipedia:Diacritice#Date_tehnice
