Hi again,
I wrote:
When I tried to parse the current German XML dump I discovered the
following malformed sequence (in [[de:India]]):
You can remove the errors with a small Perl script. It is only
a workaround for the current dump:
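#!/usr/bin/perl
# Reads the dump from STDIN (or from files named as arguments), writes to STDOUT.
while (<>) {
    # Drop the line that starts with "[[got:" followed by the malformed sequence.
    next if /^\[\[got:�/;
    if (/^\[\[zh:�/) {
        # Replace the malformed "[[zh:" line with the closing tag.
        print "</text>";
    } else {
        print $_;
    }
}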