[Wikitech-l] Re: XML dump not well-formed because of unicode

13 Sep 2005


Jakob Voss wrote:
> ...
> Hi again,
> I wrote:
> > ...
> > When I tried to parse the current German XML dump I discovered the
> > following malformed sequence (in [[de:India]]):
> You can remove the errors with a little perl script - only
> a workaround for the current dump:
For me this worked fine: replace every "&#" with "&amp;#" so the XML
parser won't interpret the character reference (at first I used sed; now
my program does the replacement before passing the stream to the parser).
Of course, the program using the data then has to deal with these
escaped references itself.
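A minimal sketch of the "replace before giving the stream to the parser" step, in Python. The post does not say what language the program was written in, so this is only an illustration. The function names `escape_char_refs` and `parse_dump` are hypothetical. The only subtle point is that a "&" at the end of one chunk may be followed by "#" at the start of the next, so a trailing "&" is held back until the next chunk arrives.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def escape_char_refs(chunks: Iterable[str]) -> Iterator[str]:
    """Replace every "&#" with "&amp;#", even when "&" and "#" fall in different chunks."""
    pending = ""
    for chunk in chunks:
        text = pending + chunk
        # Hold back a trailing "&": the "#" may arrive with the next chunk.
        if text.endswith("&"):
            pending, text = "&", text[:-1]
        else:
            pending = ""
        yield text.replace("&#", "&amp;#")
    if pending:
        # Stream ended on a bare "&"; pass it through unchanged.
        yield pending


def parse_dump(chunks: Iterable[str]) -> ET.Element:
    """Feed the escaped stream to an incremental parser (hypothetical helper)."""
    parser = ET.XMLParser()
    for piece in escape_char_refs(chunks):
        parser.feed(piece)
    return parser.close()
```

After parsing, the element text contains the references literally (for example `A&#65;`), which is the trade-off mentioned above: the consuming program has to decide how to resolve or discard them.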
de:SirJective
