On 3/22/09 7:49 AM, Jeffrey Barish wrote:
- Some characters are not rendered correctly (e.g., IPA: [ˈvɔlfgaŋ
amaˈdeus ˈmoËtsart]).
You're showing the text as windows-1252, but it is UTF-8.
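The mismatch can be reproduced in a few lines. This is only a sketch: it assumes Python 3 and uses its `cp1252` codec as a stand-in for whatever your viewer does when it treats the bytes as windows-1252. The variable names are illustrative.

```python
# The first word of the IPA transcription, as correct Unicode text.
original = "ˈvɔlfgaŋ"

# What is actually delivered: the text encoded as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# What a consumer sees if it wrongly decodes those bytes as windows-1252.
garbled = utf8_bytes.decode("cp1252")
print(garbled)   # ˈvɔlfgaŋ

# Reversing the wrong decode step recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # ˈvɔlfgaŋ
```

Each non-ASCII character becomes two windows-1252 characters because its UTF-8 form is two bytes long.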
It seems that the HTML lacks the meta field that specifies the character encoding. The original page includes it, of course. Is there a parameter that causes action=render to include the metadata? Am I using the wrong action?
You're getting an HTML fragment here, not a full HTML document. As the data consumer, it's your responsibility to ensure you're sending correct Content-Type headers or wrapping things in <html><head>blah blah</head></html> as necessary.
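One way to do the wrapping on the consumer side is sketched below, assuming Python 3. The function name `wrap_fragment` and the `title` parameter are made up for illustration; they are not part of MediaWiki.

```python
import html


def wrap_fragment(fragment: str, title: str = "Untitled") -> bytes:
    """Wrap an HTML fragment in a minimal document that declares UTF-8.

    `fragment` is assumed to be already-decoded text, such as the body
    returned by action=render. Returns the full document as UTF-8 bytes.
    """
    document = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        # Tells the browser how to decode the bytes that follow.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head><body>\n"
        f"{fragment}\n"
        "</body></html>\n"
    )
    return document.encode("utf-8")


page = wrap_fragment("<p>[ˈvɔlfgaŋ]</p>", title="Mozart")
```

If you serve the result over HTTP instead of saving it to a file, sending a `Content-Type: text/html; charset=utf-8` header does the same job as the meta element.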
Can I safely assume that all Wikipedia pages use UTF-8?
Yes.
-- brion