On Apr 6, 2005, at 2:59pm, John Blumel wrote:
Has anyone had success with submitting data that contains non-English characters via a bot? I'm currently working on some Perl scripts to extract and upload data... but I haven't had much success with uploading articles that contain non-standard characters.
Well, out of sheer stubbornness, and after a couple of days in character encoding hell, I finally figured out how to get this to work. It wasn't exactly intuitive, but if I ensure that the data files containing the GftP article are Unicode (UTF-8) encoded and then encode their contents as Latin-1 (ISO-8859-1) in the bot script, the submissions to the wiki, which is configured for UTF-8, go through without any data corruption. This also works for the page titles, which had not been getting escaped properly.
I'll leave it for someone else to explain why this works this way.
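For what it's worth, one possible explanation (a guess, not something verified against the bot script) is that the UTF-8 bytes from the data files were being treated as one character per byte, and then encoded to UTF-8 a second time on submission. Encoding as Latin-1 would undo the first step, because Latin-1 maps each character below 256 back to the identical byte. The sketch below shows that round trip in Python rather than Perl, purely for illustration; the sample word and the variable names are made up.

```python
# Sample text with one non-ASCII character (hypothetical example data).
original = "Caf\u00e9"

# What a UTF-8 encoded data file would contain on disk.
utf8_bytes = original.encode("utf-8")

# Assumed failure mode: each byte is read as a single Latin-1 character,
# so the two-byte sequence for the accented letter becomes two characters.
misread = utf8_bytes.decode("latin-1")

# Encoding that string as UTF-8 on submission double-encodes it,
# which is what shows up as corrupted text on the wiki.
double_encoded = misread.encode("utf-8")

# Encoding as Latin-1 instead turns each character back into its
# original byte, so the wiki receives the untouched UTF-8 data.
restored_bytes = misread.encode("latin-1")

print(utf8_bytes)
print(double_encoded)
print(restored_bytes == utf8_bytes)
```

If this reading is right, the Latin-1 step is not really converting anything; it is just a way of handing the original bytes through unchanged.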
John Blumel