Re: [Wikitech-l] XML and Unicode chars in tag names

14 Jul 2013


2013/7/14 MZMcBride <z@mzmcbride.com>:
...
Strainu wrote:
...
I'm trying to parse the following XML (abridged for brevity):
<?xml version="1.0" encoding="UTF-8"?>
<județ>
 <siruta>47</siruta>
 <nume>Județul Bacău</nume>
</județ>
Every validator I've tried marks an error on the ț in the tag named
județ.
Hi.
This list is a fine place to ask. :-)
Hi,
...
Are you having trouble with validation or parsing? Validators can simply
be wrong. Which validators are you using? And which parsers are you using?
I'm having trouble with both. I used the W3C validator [1], which
wasn't designed for arbitrary XML files but can still find a good number
of errors, and xmlvalidation.com [2]. On the parsing side, I tried
Python's lxml; the output is available at [3].
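When a validator or parser complains about one character in a tag name, a useful first step is to confirm exactly which code point is in the file. The sketch below is only a diagnostic aid, assuming the tag uses the comma-below form of the letter (U+021B) rather than the cedilla form (U+0163); the helper name describe_name is made up for this example. One possible explanation, not verified here against these particular tools, is that editions of XML 1.0 before the fifth tied name characters to an older Unicode version that predates U+021B.

```python
import unicodedata


def describe_name(name):
    """Return (char, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in name
        if ord(ch) > 127
    ]


# Assumption: the tag is spelled with the comma-below letter (U+021B).
tag = "jude\u021b"
for ch, code_point, uname in describe_name(tag):
    print(ch, code_point, uname)
```

Running the same helper on the cedilla spelling ("jude\u0163") reports a different code point, which makes it easy to tell the two forms apart before blaming the parser.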
...
Can you be more specific about what you're trying to do (feel free to link
to or include sample code) and the tools you're trying to do it with?
Well, I have a PHP website which gathers public data about Romania's
administrative units, which I then try to export in
programming-friendly formats (CSV, JSON, XML). The workflow is:
extract the data from the database, put it in a PHP array, then use
this array to generate all the output formats. You have an example of
such an array at [4] (since my initial email I've worked around the
diacritics problem, but I'm still searching for a solution). For
converting to XML I have a custom array_walk function [5].
I know that some potential reusers are heavy XML fans, so I wanted to
give them an easy way to reuse the data. Having the XML tags/JSON keys
with diacritics is not a must-have, but it is definitely a very nice
feature, because those keys could be used directly as labels when
printing the data somewhere.
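One possible way to keep printable labels while avoiding non-ASCII tag names is to use ASCII tags and carry the spelling with diacritics in an attribute. The following is a minimal sketch in Python rather than PHP, and it is not necessarily the workaround used in [5]; the LABELS mapping, the "label" attribute, and the to_xml helper are all hypothetical names chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping from ASCII tag names to human-readable labels.
LABELS = {"judet": "județ", "siruta": "siruta", "nume": "nume"}


def to_xml(tag, value):
    """Build an element for `tag`, recursing into nested dicts."""
    elem = ET.Element(tag)
    label = LABELS.get(tag)
    if label and label != tag:
        # Keep the spelling with diacritics as data, not as a tag name.
        elem.set("label", label)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(key, child))
    else:
        elem.text = str(value)
    return elem


record = {"siruta": 47, "nume": "Județul Bacău"}
root = to_xml("judet", record)

# XML: ASCII tag names, diacritics only in attribute values and text.
xml_text = ET.tostring(root, encoding="unicode")

# JSON: keys may keep the diacritics directly.
json_text = json.dumps({"județ": record}, ensure_ascii=False)

print(xml_text)
print(json_text)
```

A consumer can then read the "label" attribute when printing the data, while parsers only ever see ASCII element names. JSON has no such restriction on keys, so the diacritics can stay there.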
Regards,
   Strainu
[1] http://validator.w3.org/
[2] http://www.xmlvalidation.com/
[3] https://gist.github.com/mgax/f6a3edc5b4883b3377e8
[4] https://github.com/strainu/despresate/blob/master/include/sat_functions.php#...
[5] https://github.com/strainu/despresate/blob/master/include/common.php#L57
