Hi,
I've been downloading pages from Wiktionary in a Java application. Most pages work fine (e.g. http://en.wiktionary.org/wiki/-a): I can load them in Java, and if I fetch them with wget, the resulting file contains what I'd expect.
However, some pages do not work (e.g. http://en.wiktionary.org/wiki/absolute_instrument). In Java I get a codec exception, and with wget the downloaded file is garbled. I suspect that these pages claim to be UTF-8 encoded but are not. They display fine in my browser, but the browser doesn't tell me which charset it uses to decode the text.
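For anyone who wants to test the "not really UTF-8" guess against a saved copy of a page, here is a minimal sketch of a strict check. It only uses the standard java.nio.charset classes. The class name Utf8Check and the idea of passing the wget output file as the first argument are just illustrative assumptions, not part of my actual application.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a byte sequence is well-formed UTF-8.
class Utf8Check {

    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw on bad input instead of
        // silently substituting U+FFFD replacement characters.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java Utf8Check <downloaded-file>");
            return;
        }
        // Assumption: args[0] is the file saved by wget.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

If this prints "not valid UTF-8" for the garbled file, that at least confirms the bytes on disk are not plain UTF-8 text. It does not say what they are instead.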
Is this a known issue? Is there any workaround for this? Can it be fixed server-side?
Thanks,
Matthew