A few messages back (on WikiEN-l) I sent some accented characters that are found in Unicode/UTF-n but not in Latin-1 using Yahoo! Mail. These apparently appeared on Timwi's client as HTML entities, but on my client they displayed properly. Is this just a bug with Y!Mail?
Most other special characters cannot be easily displayed in my browser. I get the digested form, and some messages use different character sets than others - and Mozilla autoguesses the character set as a third, in most cases. In one digest with links to the Hindi Wikipedia and a UTF-8 apostrophe, the default character set was Chinese Simplified.
Is there something the software can do about this? We can't assume everyone'll send in UTF-8 (which would have been nice), and for those like me who receive digests, it's impossible to get the browser to display two character sets on the same page. Could the digesting software automatically convert all e-mails to the same character set, preferably UTF-8?
--[[User:Geoffrey|]] Thomas
Geoffrey Thomas wrote:
> A few messages back (on WikiEN-l) I sent some accented characters that are found in Unicode/UTF-n but not in Latin-1 using Yahoo! Mail. These apparently appeared on Timwi's client as HTML entities, but on my client they displayed properly. Is this just a bug with Y!Mail?
> Most other special characters cannot be easily displayed in my browser. I get the digested form, and some messages use different character sets than others - and Mozilla autoguesses the character set as a third, in most cases. In one digest with links to the Hindi Wikipedia and a UTF-8 apostrophe, the default character set was Chinese Simplified.
Links should always be given in the encoded form, e.g.: http://hi.wikipedia.org/wiki/%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A5%8D%E0%A4%AF_%...
rather than raw: http://hi.wikipedia.org/wiki/मुख्य_...
The downside is that the links are ugly, illegible, and can get very long when dealing with non-Latin writing systems: each byte is written as %XX, so a character that takes 2 or 3 bytes in UTF-8 becomes 6 or 9 ASCII characters. The upside is that they are pure ASCII and (modulo line break issues) will work transparently.
In most browsers, the encoded form is what is shown in the URL bar and what you get when you copy a link via, e.g., right-click "copy this link".
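As a rough illustration of the encoded form described above, here is a minimal Python sketch using the standard library's urllib.parse.quote. The helper name encode_wiki_link is hypothetical, and it assumes the title is encoded as UTF-8 before percent-encoding (which is what quote does by default).

```python
from urllib.parse import quote, unquote


def encode_wiki_link(base: str, title: str) -> str:
    """Return base + title with every non-ASCII byte written as %XX.

    Hypothetical helper for illustration only. quote() encodes the
    title as UTF-8 by default and leaves ASCII letters, digits and
    "_.-~" untouched.
    """
    return base + quote(title)


if __name__ == "__main__":
    link = encode_wiki_link("http://hi.wikipedia.org/wiki/", "मुख्य")
    print(link)           # pure-ASCII form, safe to paste into mail
    print(unquote(link))  # back to the readable raw form
    # One Devanagari letter is 3 UTF-8 bytes, so 9 ASCII characters.
    print(len(quote("म")))
```

The encoded link can be pasted into a message in any character set, since it contains only ASCII; unquote recovers the readable form on the other side.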
> Is there something the software can do about this? We can't assume everyone'll send in UTF-8 (which would have been nice), and for those like me who receive digests, it's impossible to get the browser to display two character sets on the same page. Could the digesting software automatically convert all e-mails to the same character set, preferably UTF-8?
Go to the list page, e.g.: http://mail.wikipedia.org/mailman/listinfo/wikien-l
and put your address in under "edit options" down at the bottom.
Change your options to use MIME digests instead of plain text digests: this should make the messages individual attachments instead of dumping everything in one text chunk, and is less likely to break things like encodings and messages with their own attachments. Of course, I don't know if your mail client will work with that...
-- brion vibber (brion @ pobox.com)
Geoffrey Thomas wrote in part:
> Most other special characters cannot be easily displayed in my browser. I get the digested form, and some messages use different character sets than others - and Mozilla autoguesses the character set as a third, in most cases. In one digest with links to the Hindi Wikipedia and a UTF-8 apostrophe, the default character set was Chinese Simplified.
Try setting your options for the mailing list so that the digest comes with MIME enabled. This is off by default in case of broken readers, but there's no reason that anybody should leave it off if they have a decent mail reader -- like Mozilla. Then if an individual message specifies its charset, your mail reader will know and won't have to guess.
No guarantees that this will work, but try it.
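To illustrate why per-message MIME parts help: when each part declares its own charset, a reader (or, in principle, the digesting software) can decode every part separately and re-encode everything as UTF-8. The following is only a sketch with Python's standard email package. The names build_demo_digest and parts_as_utf8 are hypothetical, and the demo digest is a simplified stand-in, not what Mailman actually emits.

```python
from email import message_from_bytes
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import List


def build_demo_digest() -> bytes:
    """Build a tiny multipart/digest whose two messages use different charsets.

    Simplified stand-in for a real MIME digest (assumption for the demo).
    """
    digest = MIMEMultipart("digest")
    digest.attach(MIMEMessage(MIMEText("café", "plain", "iso-8859-1")))
    digest.attach(MIMEMessage(MIMEText("मुख्य", "plain", "utf-8")))
    return digest.as_bytes()


def parts_as_utf8(raw: bytes) -> List[bytes]:
    """Decode each text part with its declared charset, return UTF-8 bytes."""
    msg = message_from_bytes(raw)
    converted = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip the multipart and message/rfc822 containers
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        charset = part.get_content_charset() or "us-ascii"
        text = payload.decode(charset, errors="replace")
        converted.append(text.encode("utf-8"))
    return converted


if __name__ == "__main__":
    for body in parts_as_utf8(build_demo_digest()):
        print(body.decode("utf-8").strip())
```

With a plain-text digest there is only one charset label for the whole blob, so this per-part decoding is not possible and the reader has to guess.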
-- Toby
wikitech-l@lists.wikimedia.org