Pronunciations and IPA/SAMPA - WikiEN-l

4 Sep 2003


      There was some talk a while back about deciding on a standard method of 
indicating pronunciations on Wikipedia. Of course some people said 
pronunciations belong on Wiktionary, but that's beside the point: there 
are many articles where a discussion of the pronunciation of certain 
words is necessary, and there ought to be a standard way of notating that.
In fact, there is. The International Phonetic Alphabet is ideally suited 
to marking pronunciations of words, and is flexible enough to describe 
broad transcriptions that represent how a word is pronounced in multiple 
dialects to minute phonetic details. This wisdom, of course, has been 
lost on the makers of most American dictionaries, who each insist upon 
using their own ad-hoc pronunciation scheme (one of my personal pet 
peeves). The _Cambridge Dictionary of American English_ is a notable, if 
perhaps not well-known, exception. The foremost dictionary of (mostly) 
British English, the _Oxford English Dictionary_ uses IPA, as does the 
major Australian English dictionary,  _The Macquarie Dictionary_.
But I digress. There are several pages on the Wikipedia that deal 
specifically with pronunciations, for example [[List of words of 
disputed pronunciation]]. And the way that the pronunciations are listed 
on that page is the worst possible mix of ad-hoc pronunciation schemes. 
In fact, some of the ad-hac pronunciations given I couldn't even figure 
what they meant. (does AHSK rhyme with American _task_ or _mosque_?). 
Clearly some kind of standard scheme is needed.
I spent several hours today revamping that page, using IPA 
transcriptions and doing some serious research about which 
pronunciations are listed in what dictionaries. I put that page on 
[[List of words of disputed pronunciation/IPA]]. However, I later 
discovered to my tremendous dismay that the IPA letters simply do not 
display in IE. The scheme for encoding IPA in ASCII, called SAMPA, is 
capable of encoding anything in IPA, but it is not particularly readable 
(although some might argue the same about IPA). It was designed to be 
machine-readable, and it doesn't really seem like an adequate solution. 
It uses lots of non-alphabetic characters to represent sounds (the 'a' 
in _cat_ is '{' in SAMPA), and as a result SAMPA-ized pronunciations are 
frankly ugly.
Anyhow, it seems that just using the HTML entities for the Unicode IPA 
extensions is not an acceptable solution because it leaves IE users with 
lovely but useless rectangles where there ought to be IPA characters. 
There is a LaTeX extension called TIPA that allows the complete set of 
IPA characters and diacritics. If this were installed into the TeX math 
extensions, then a similar syntax could be used to generate images of 
the IPA from LaTeX input.
I see the following possible solutions (in the order that I think is good):
1.) Auto-detect the browser and send IPA Unicode to browsers that 
support it and TIPA LaTeX images to those that don't. (Pros: attractive 
display of IPA for all users. Cons: lots of  programming)
2.) Just send TIPA LaTeX images (Pros: attractive display of IPA. Cons: 
Uses images in text when for some users embedded IPA Unicode would look 
better)
3.) Store the IPA in a special format or in a special tag, auto-detect 
the browser and send IPA Unicode to browsers that support it and SAMPA 
to the rest. (Pros: doesn't require inserting images or using TeX. Cons: 
SAMPA is ugly and hard to read)
4.) Render IPA into GIFs or PNGs and just insert them as images. (Pros: 
compatible with everything. Cons: time-consuming, and difficult to change)
5.) Devise a Wikipedia-specific pronunciation scheme and just use that 
(blech!) (Pros: no coding required. Cons: YAAHPS (Yet Another Ad Hoc 
Pronunciation Scheme))
6.) Do nothing and continue to allow people to use ad-hoc pronunciation 
schemes (BLECH!!) (Pros: no action required. Cons: maintains status quo 
harms as described above)
Of course, no. 1 requires doing some coding and testing for what may end 
up being a feature used on just a few pages. On the other hand, such 
code could possibly be extremely useful for the Wiktionary. In the 
meantime, I'm going to leave [[List of words of disputed 
pronunciation/IPA]] as it is, and wait for suggestions.
Now of course there will be opponents of the IPA, because it's too 
technical or whatever reason. To those people I say the IPA for the 
purposes of representing English is really no more complicated than the 
pronunciation schemes used in American dictionaries, like the 
_Merriam-Webster Dictionary_, and the _Cambridge Dictionary of American 
English_, which is designed for learners of English, seems to do just 
fine with it.
- David [[User:Nohat]]
* 
http://www.wikipedia.org/wiki/List_of_words_of_disputed_pronunciation/IPA*