Jakob Voss wrote:
Hi again,
I wrote:
When I tried to parse the current German XML dump I discovered the following malformed sequence (in [[de:India]]):
You can remove the errors with a little perl script - only a workaround for the current dump:
For me this worked fine: replace every "&#" with "&amp;#" so the XML parser won't see the entity (first I used sed, now my program does the replacement before handing the stream to the parser). Of course, the program using the data then has to handle these escaped sequences itself.
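A minimal Python sketch of that workaround, assuming the intended replacement target is "&amp;#" and that the dump can be read line by line. The function names and the sample input are hypothetical, and "&#1;" only stands in for whatever the malformed sequence in the dump was:

```python
import io
import xml.etree.ElementTree as ET

def escape_numeric_refs(lines):
    """Yield each line with "&#" rewritten to "&amp;#"."""
    for line in lines:
        yield line.replace("&#", "&amp;#")

def parse_escaped(lines):
    """Feed the escaped lines to an incremental parser and return the root element."""
    parser = ET.XMLParser()
    for line in escape_numeric_refs(lines):
        parser.feed(line)
    return parser.close()

# Hypothetical sample: "&#1;" is not a legal XML 1.0 character reference.
sample = io.StringIO("<page><text>bad &#1; ok &#228;</text></page>\n")
root = parse_escaped(sample)
print(root.find("text").text)
```

Note that valid references such as "&#228;" are escaped too, so they reach the consuming program as literal text rather than as decoded characters.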
de:SirJective