[WikiEN-l] Re: IPA pronounciation Unicode symbols, old browsers and Wikipedia

David Friedland david at nohat.net
Sat Apr 3 16:32:55 UTC 2004


(I'm cross-posting this to wikitech-l, please reply there)

David Gerard wrote:
> Please read [[Fucking, Austria]] and [[Talk:Fucking, Austria]].
> There's considerable discussion concerning IPA pronounciation
> symbols being unreadable in a stock install of Internet Explorer.
> 
> Is there any policy or relevant past discussion on the matter?
> We make many concessions to aging browsers on Wikipedia, but
> these are perfectly legitimate Unicode characters we're talking
> about here ...

I've brought this up before (at least a couple of times), and this is 
the situation as I see it:

* Windows users who have the font "Arial Unicode MS" installed can 
select it as their default font in IE to see the IPA symbols. I've 
tested this and it works. This also works for other fonts that have IPA 
symbols, such as Gentium and Code2000, although those fonts have to be 
downloaded and installed by the user. On pages that include IPA symbols, 
we might suggest that users select "Arial Unicode MS" as their default 
font, or link to a page that suggests this.
* If IPA symbols were surrounded by <span style="font-family:gentium, 
code2000, arial unicode ms, lucida grande"> tags, more people would be 
able to read them, because this works in any recent Windows IE that has 
one of those fonts installed, regardless of the browser's default font 
setting. However, despite my previous requests, the current MediaWiki 
software does not allow <span> tags. It is possible to use <font> tags 
to achieve the same result, but <font> tags have been deprecated in the 
HTML standard since HTML 4.0 (1997), and so should be avoided.
* The LaTeX package TIPA supports IPA and could render the symbols much 
like the current <math> tags do, if such support were installed. Such 
renderings would be visible in any browser.
* The WikiTeX system supports TIPA 
[http://wikisophia.org/wiki/Wikitex#Tipa], but it isn't yet complete or 
supported on Wikipedia, although there is hope that it will be.
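
As a rough illustration of the <span> approach in the second point 
above, here is a minimal Python sketch of what the generated HTML could 
look like. The helper name wrap_ipa and the exact font stack are 
assumptions for illustration only; this is not how MediaWiki does or 
would do it.

```python
from html import escape

# Font stack based on the fonts named above. "Arial Unicode MS" is assumed
# to be the family name the browser matches against.
IPA_FONTS = ["Gentium", "Code2000", "Arial Unicode MS", "Lucida Grande"]


def wrap_ipa(text: str) -> str:
    """Wrap IPA text in a <span> that asks for IPA-capable fonts.

    Hypothetical helper: the browser uses the first listed font that is
    installed, whatever the user's default font setting is.
    """
    # Family names containing spaces are quoted, as CSS recommends.
    families = ", ".join(f"'{f}'" if " " in f else f for f in IPA_FONTS)
    # escape() keeps markup characters in the input from breaking the page.
    return f'<span style="font-family: {families}">{escape(text)}</span>'
```

Calling wrap_ipa on a transcription returns the same text inside a 
single <span>, so it can be dropped inline into a sentence.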

My recommended solution is:
* Implement some kind of wiki markup for indicating IPA text in the wiki 
source, ideally as part of the WikiTeX system.
* Mark up all IPA symbols on Wikipedia using that markup!
* Let logged-in users set their preference to Unicode IPA, WikiTeX TIPA 
images, and/or (X-)SAMPA.
* This system would put <span style="font-family:..."> tags around 
Unicode IPA in the HTML output so Windows IE users who have set their 
preference to Unicode will be able to see the symbols regardless of 
their browser's font setting.

The following items also have to be considered:
* What to do for anonymous users: should we do a browser detect and 
serve a page with Unicode IPA or a WikiTeX-ed image of the IPA, 
depending on the browser, or just send WikiTeX images to all anonymous 
users? It is hard to guarantee via a browser detect that the user has 
the proper Unicode fonts, although the Unicode will certainly render 
correctly for all Safari users, because Mac OS X has fonts with the 
necessary Unicode characters installed by default. We need to do a 
browser/platform survey to determine what combinations are likely to 
support Unicode IPA.
* What format to use for the input?
**Unicode is a standard method of representing IPA symbols, but can be 
tricky to edit, and UTF-8 isn't supported on the English Wikipedia, so 
the symbols would have to be entered as numeric character entities.
**The TIPA input method isn't standard, and for some rare characters is 
(IMHO unnecessarily) verbose, but it supports a wider range of 
characters than Unicode, and is ASCII compatible.
**SAMPA or X-SAMPA is more of a standard than TIPA, and is also ASCII 
compatible, so it might be the best solution.
**The International Phonetic Association has assigned a 3-digit number 
to every symbol in the International Phonetic Alphabet. The numbers are 
listed in Appendix 2 of the Handbook of the International Phonetic 
Association, so this is a standard endorsed by the creators of the IPA. 
It's not particularly accessible, though, as I haven't seen a listing 
of IPA numbers on the web anywhere.
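
To make the two open questions above a bit more concrete, here is a 
small Python sketch. Both function names are hypothetical, and the 
browser check is deliberately conservative: it only treats Safari on 
Mac OS X as safe for Unicode, since that is the one case described 
above as certain. Anything else gets an image until a real 
browser/platform survey says otherwise.

```python
def choose_ipa_format(user_agent: str) -> str:
    """Pick "unicode" or "image" for an anonymous user (hypothetical helper).

    Assumption: only Safari on Mac OS X is known to ship suitable fonts.
    A User-Agent string cannot tell us which fonts are installed, so every
    other browser falls back to a rendered image.
    """
    ua = user_agent.lower()
    if "safari" in ua and "mac os x" in ua:
        return "unicode"
    return "image"


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal HTML entity.

    This is one way to store Unicode IPA in a wiki whose page text is not
    UTF-8. ASCII characters pass through unchanged.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, to_entities applied to the single character U+0283 (the 
"esh" symbol) gives the ASCII string "&#643;", which an editor can type 
and any browser will decode.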

One might be tempted to just use TIPA TeX rendering for IPA symbols, 
thereby reducing the complexity and the amount of development required, 
but there are several disadvantages to this method:
* TeX renderings don't look good when printed, as they are rendered at 
72 dpi.
* TeX renderings can't be enlarged by changing the font size, which is 
something people with poor vision or low quality displays often do, and 
which would be especially helpful for unfamiliar symbols like IPA 
symbols.
* Inline TeX renderings won't be in the same font and may not be 
properly aligned with the surrounding text, and so may not be attractive.

Therefore, where it is possible to use Unicode IPA, we should.

Whatever method is chosen, at least one translator will have to be 
created to convert from one input method to another. I've already 
created such a system using lex to convert SAMPA input to HTML entities 
of Unicode symbols, so I can attest that it's not particularly 
difficult, although it's a bit tedious.
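
This is not the lex translator mentioned above, just a minimal Python 
sketch of the same idea, to show roughly what such a converter involves. 
The table covers only a handful of (X-)SAMPA symbols and is an 
illustrative subset; the function name is hypothetical. Longer symbols 
are tried first so that a two-character X-SAMPA symbol such as r\ is 
not split.

```python
# Illustrative subset of (X-)SAMPA symbols and their Unicode code points.
SAMPA_TO_CODEPOINT = {
    '"': 0x02C8,    # primary stress mark
    "@": 0x0259,    # schwa
    "I": 0x026A,    # small capital I
    "U": 0x028A,    # upsilon
    "S": 0x0283,    # esh
    "Z": 0x0292,    # ezh
    "N": 0x014B,    # eng
    "T": 0x03B8,    # theta
    "D": 0x00F0,    # eth
    ":": 0x02D0,    # length mark
    "r\\": 0x0279,  # X-SAMPA r\ (turned r), a two-character symbol
}

# Try longer symbols first so "r\" wins over a plain "r".
_KEYS = sorted(SAMPA_TO_CODEPOINT, key=len, reverse=True)


def sampa_to_entities(sampa: str) -> str:
    """Convert a SAMPA string to decimal HTML entities (hypothetical helper).

    Characters with no table entry (plain ASCII letters such as "f" or
    "k") are copied through unchanged.
    """
    out = []
    i = 0
    while i < len(sampa):
        for key in _KEYS:
            if sampa.startswith(key, i):
                out.append(f"&#{SAMPA_TO_CODEPOINT[key]};")
                i += len(key)
                break
        else:
            # No symbol matched at this position: keep the character as-is.
            out.append(sampa[i])
            i += 1
    return "".join(out)
```

With this table, the SAMPA input "fUkIN (including the leading stress 
mark) comes out as &#712;f&#650;k&#618;&#331;. A real converter would 
need the full symbol table and would also have to escape characters 
such as & in the pass-through text.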

Last, although pronunciations are somewhat rare on Wikipedia, some 
pages rely on them, like 
[http://en.wikipedia.org/wiki/List_of_words_of_disputed_pronunciation]. 
Good support for IPA symbols would also be a valuable component of 
MediaWiki, as it would obviously be widely used on Wiktionary.

- David



