There was some talk a while back about deciding on a standard method of indicating pronunciations on Wikipedia. Of course some people said pronunciations belong on Wiktionary, but that's beside the point: there are many articles where a discussion of the pronunciation of certain words is necessary, and there ought to be a standard way of notating that.
In fact, there is. The International Phonetic Alphabet is ideally suited to marking the pronunciations of words, and is flexible enough to describe everything from broad transcriptions that represent how a word is pronounced across multiple dialects to minute phonetic details. This wisdom, of course, has been lost on the makers of most American dictionaries, who each insist upon using their own ad-hoc pronunciation scheme (one of my personal pet peeves). The _Cambridge Dictionary of American English_ is a notable, if perhaps not well-known, exception. The foremost dictionary of (mostly) British English, the _Oxford English Dictionary_, uses IPA, as does the major Australian English dictionary, _The Macquarie Dictionary_.
But I digress. There are several pages on the Wikipedia that deal specifically with pronunciations, for example [[List of words of disputed pronunciation]]. And the way that the pronunciations are listed on that page is the worst possible mix of ad-hoc pronunciation schemes. In fact, for some of the ad-hoc pronunciations given, I couldn't even figure out what they meant. (Does AHSK rhyme with American _task_ or _mosque_?) Clearly some kind of standard scheme is needed.
I spent several hours today revamping that page, using IPA transcriptions and doing some serious research about which pronunciations are listed in what dictionaries. I put that page on [[List of words of disputed pronunciation/IPA]]. However, I later discovered to my tremendous dismay that the IPA letters simply do not display in IE. The scheme for encoding IPA in ASCII, called SAMPA, is capable of encoding anything in IPA, but it is not particularly readable (although some might argue the same about IPA). It was designed to be machine-readable, and it doesn't really seem like an adequate solution. It uses lots of non-alphabetic characters to represent sounds (the 'a' in _cat_ is '{' in SAMPA), and as a result SAMPA-ized pronunciations are frankly ugly.
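To make the readability complaint concrete, here is a toy sketch of a SAMPA-to-IPA substitution table in Python; the handful of mappings shown is my own illustrative subset, not the full SAMPA standard:

```python
# Toy subset of SAMPA-to-IPA correspondences (illustrative, not complete).
SAMPA_TO_IPA = {
    "{": "\u00e6",  # 'a' in "cat" -> ae ligature (æ)
    "@": "\u0259",  # schwa (ə)
    "N": "\u014b",  # 'ng' in "sing" (ŋ)
    "T": "\u03b8",  # 'th' in "thin" (θ)
    "D": "\u00f0",  # 'th' in "this" (ð)
    "S": "\u0283",  # 'sh' in "ship" (ʃ)
}

def sampa_to_ipa(text: str) -> str:
    """Replace known SAMPA symbols with their IPA equivalents; pass the rest through."""
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in text)

print(sampa_to_ipa("k{t"))  # -> kæt
```

Even this tiny table shows the tradeoff: the SAMPA input is pure ASCII but ugly, while the IPA output is readable but needs font support.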
Anyhow, it seems that just using the HTML entities for the Unicode IPA extensions is not an acceptable solution because it leaves IE users with lovely but useless rectangles where there ought to be IPA characters. There is a LaTeX extension called TIPA that allows the complete set of IPA characters and diacritics. If this were installed into the TeX math extensions, then a similar syntax could be used to generate images of the IPA from LaTeX input. I see the following possible solutions (in the order that I think is good):
1.) Auto-detect the browser and send IPA Unicode to browsers that support it and TIPA LaTeX images to those that don't. (Pros: attractive display of IPA for all users. Cons: lots of programming)
2.) Just send TIPA LaTeX images (Pros: attractive display of IPA. Cons: Uses images in text when for some users embedded IPA Unicode would look better)
3.) Store the IPA in a special format or in a special tag, auto-detect the browser and send IPA Unicode to browsers that support it and SAMPA to the rest. (Pros: doesn't require inserting images or using TeX. Cons: SAMPA is ugly and hard to read)
4.) Render IPA into GIFs or PNGs and just insert them as images. (Pros: compatible with everything. Cons: time-consuming, and difficult to change)
5.) Devise a Wikipedia-specific pronunciation scheme and just use that (blech!) (Pros: no coding required. Cons: YAAHPS (Yet Another Ad Hoc Pronunciation Scheme))
6.) Do nothing and continue to allow people to use ad-hoc pronunciation schemes (BLECH!!) (Pros: no action required. Cons: maintains status quo harms as described above)
Of course, no. 1 requires doing some coding and testing for what may end up being a feature used on just a few pages. On the other hand, such code could possibly be extremely useful for the Wiktionary. In the meantime, I'm going to leave [[List of words of disputed pronunciation/IPA]] as it is, and wait for suggestions.
Now of course there will be opponents of the IPA, for being too technical or for whatever other reason. To those people I say: for the purposes of representing English, the IPA is really no more complicated than the pronunciation schemes used in American dictionaries like the _Merriam-Webster Dictionary_; and the _Cambridge Dictionary of American English_, which is designed for learners of English, seems to do just fine with it.
- David [[User:Nohat]]
http://www.wikipedia.org/wiki/List_of_words_of_disputed_pronunciation/IPA
This is probably the most well-thought-out treatment of this issue ever done on wp. I must say this is impressive and in line with the consensus of
No Unicode IPA on IE?? Hmm. Well, considering the expensive workarounds you listed -- as necessary to accommodate IE users -- for a problem that is entirely in Microsoft's domain, I would lean toward calling Unicode IPA the "standard" anyway, and let the ?? or Xboxes be the problem of the IE end user. This is already the case for any character sets that aren't loaded anyway (I have yet to load a Hindi character set, for example ;). Soon enough someone will write a hack to accommodate IE, no doubt, but there's no reason not to push Unicode IPA as the standard right now.
But that still doesn't deal with the problem of easy input via a Roman character set. A little conversion hack from the pseudo-values (/s/) to their IPA equivalents should be a first priority, and I would do it myself if I had the time, or could program a little better (late bloomer, OK..)
As always with apologies to the hackers, -S-
--- David Friedland david@nohat.net wrote:
There was some talk a while back about deciding on a standard method of indicating pronunciations on Wikipedia. Of course some people said pronunciations belong on Wiktionary, but that's beside the point: there are many articles where a discussion of the pronunciation of certain words is necessary, and there ought to be a standard way of notating that.
In fact, there is. The International Phonetic Alphabet is ideally suited to marking pronunciations of words, and is flexible enough to describe broad transcriptions that represent how a word is pronounced in multiple dialects to minute phonetic details. This wisdom, of course, has been lost on the makers of most American dictionaries, who each insist upon using their own ad-hoc pronunciation scheme (one of my personal pet peeves). The _Cambridge Dictionary of American English_ is a notable, if perhaps not well-known, exception. The foremost dictionary of (mostly) British English, the _Oxford English Dictionary_ uses IPA, as does the major Australian English dictionary, _The Macquarie Dictionary_.
But I digress. There are several pages on the Wikipedia that deal specifically with pronunciations, for example [[List of words of disputed pronunciation]]. And the way that the pronunciations are listed on that page is the worst possible mix of ad-hoc pronunciation schemes. In fact, for some of the ad-hoc pronunciations given, I couldn't even figure out what they meant. (Does AHSK rhyme with American _task_ or _mosque_?) Clearly some kind of standard scheme is needed.
I spent several hours today revamping that page, using IPA transcriptions and doing some serious research about which pronunciations are listed in what dictionaries. I put that page on [[List of words of disputed pronunciation/IPA]]. However, I later discovered to my tremendous dismay that the IPA letters simply do not display in IE. The scheme for encoding IPA in ASCII, called SAMPA, is capable of encoding anything in IPA, but it is not particularly readable (although some might argue the same about IPA). It was designed to be machine-readable, and it doesn't really seem like an adequate solution. It uses lots of non-alphabetic characters to represent sounds (the 'a' in _cat_ is '{' in SAMPA), and as a result SAMPA-ized pronunciations are frankly ugly.
Anyhow, it seems that just using the HTML entities for the Unicode IPA extensions is not an acceptable solution because it leaves IE users with lovely but useless rectangles where there ought to be IPA characters. There is a LaTeX extension called TIPA that allows the complete set of IPA characters and diacritics. If this were installed into the TeX math extensions, then a similar syntax could be used to generate images of the IPA from LaTeX input. I see the following possible solutions (in the order that I think is good):
1.) Auto-detect the browser and send IPA Unicode to browsers that support it and TIPA LaTeX images to those that don't. (Pros: attractive display of IPA for all users. Cons: lots of programming)
2.) Just send TIPA LaTeX images (Pros: attractive display of IPA. Cons: Uses images in text when for some users embedded IPA Unicode would look better)
3.) Store the IPA in a special format or in a special tag, auto-detect the browser and send IPA Unicode to browsers that support it and SAMPA to the rest. (Pros: doesn't require inserting images or using TeX. Cons: SAMPA is ugly and hard to read)
4.) Render IPA into GIFs or PNGs and just insert them as images. (Pros: compatible with everything. Cons: time-consuming, and difficult to change)
5.) Devise a Wikipedia-specific pronunciation scheme and just use that (blech!) (Pros: no coding required. Cons: YAAHPS (Yet Another Ad Hoc Pronunciation Scheme))
6.) Do nothing and continue to allow people to use ad-hoc pronunciation schemes (BLECH!!) (Pros: no action required. Cons: maintains status quo harms as described above)
Of course, no. 1 requires doing some coding and testing for what may end up being a feature used on just a few pages. On the other hand, such code could possibly be extremely useful for the Wiktionary. In the meantime, I'm going to leave [[List of words of disputed pronunciation/IPA]] as it is, and wait for suggestions.
Now of course there will be opponents of the IPA, because it's too technical or whatever reason. To those people I say the IPA for the purposes of representing English is really no more complicated than the pronunciation schemes used in American dictionaries, like the _Merriam-Webster Dictionary_, and the _Cambridge Dictionary of American English_, which is designed for learners of English, seems to do just fine with it.
- David [[User:Nohat]]
http://www.wikipedia.org/wiki/List_of_words_of_disputed_pronunciation/IPA
WikiEN-l mailing list WikiEN-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wikien-l
__________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com
I've done some testing now at home on my Mac, and neither Mac Phoenix nor Mac Internet Explorer correctly display the Unicode IPA extensions. Safari displays most of them, but is missing some critical symbols, like the 'er' sound in 'her'.
It seems that a solution that works entirely correctly for the majority of browser users is really the only acceptable one. Since most browser users use IE, just using Unicode IPA isn't really going to cut it.
Since we cannot rely on browsers to correctly render IPA, we'll have to render it ourselves, on the server side. Now that I've done my testing, I really think option 1 from my first message is the best:
1.) Auto-detect the browser and send IPA Unicode to browsers that support it and TIPA LaTeX images to those that don't. (Pros: attractive display of IPA for all users. Cons: lots of programming)
This way, as certain browser/OS combinations come to be known to reliably reproduce IPA, we can let them get the Unicode IPA, and everyone else gets LaTeX'ed IPA or, if necessary, SAMPA.
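A minimal sketch of what that per-request dispatch might look like, assuming User-Agent sniffing; the capability tokens below are illustrative placeholders, not a vetted browser list:

```python
# Illustrative capability table: which User-Agent tokens we (hypothetically)
# trust to render Unicode IPA. "MSIE" is excluded per the testing above.
IPA_CAPABLE_TOKENS = ("Mozilla/5", "Opera")

def ipa_strategy(user_agent: str) -> str:
    """Pick a rendering strategy for one request: 'unicode' or 'image'."""
    if "MSIE" not in user_agent and any(t in user_agent for t in IPA_CAPABLE_TOKENS):
        return "unicode"  # send raw IPA characters
    return "image"        # fall back to a LaTeX/TIPA-rendered PNG

print(ipa_strategy("Mozilla/5.0 (X11; Linux) Gecko"))      # -> unicode
print(ipa_strategy("Mozilla/4.0 (compatible; MSIE 6.0)"))  # -> image
```

The real table would have to be built up from testing reports, exactly as described above.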
I guess I should put my code where my mouth is and learn more about how the math TeX extensions work with the Wikipedia back end and make it go myself. In my copious free time.
On a mostly unrelated note, perhaps explaining my obsession with the topic: I work for a company that makes TTS (speech synthesis) software, so I work with phonetic representations of words all day long. Something that might be cool for some pages on Wikipedia, and definitely for all of Wiktionary, would be to have TTS-generated samples of how things are pronounced. And before you complain about how robotic and wobbly TTS sounds, you should listen to some of the most modern voices out there. They sound very natural. Check out [[Speech synthesis]] for a list of good voices with free web demos. We could probably negotiate a deal with one of the companies wherein we include their TTS samples in the Wikipedia in exchange for clearly marking where the TTS samples came from. Since it costs virtually nothing to generate the samples, it would be essentially free advertising for the company. And for all the people who see pronunciation schemes as indecipherable Greek, a good sound sample clarifies any phonetic confusion, and doesn't force poor Wikipedia users to listen to crappy home recordings of our geeky voices.
Cheers! - David [[User:Nohat]]
Steve Vertigum wrote:
This is probably the most well-thought-out treatment of this issue ever done on wp. I must say this is impressive and in line with the consensus of
No Unicode IPA on IE?? Hmm. Well, considering the expensive workarounds you listed -- as necessary to accommodate IE users -- for a problem that is entirely in Microsoft's domain, I would lean toward calling Unicode IPA the "standard" anyway, and let the ?? or Xboxes be the problem of the IE end user. This is already the case for any character sets that aren't loaded anyway (I have yet to load a Hindi character set, for example ;). Soon enough someone will write a hack to accommodate IE, no doubt, but there's no reason not to push Unicode IPA as the standard right now.
But that still doesn't deal with the problem of easy input via a Roman character set. A little conversion hack from the pseudo-values (/s/) to their IPA equivalents should be a first priority, and I would do it myself if I had the time, or could program a little better (late bloomer, OK..)
As always with apologies to the hackers, -S-
On Thu, 2003-09-04 at 20:22, David Friedland wrote:
I've done some testing now at home on my Mac, and neither Mac Phoenix nor Mac Internet Explorer correctly display the Unicode IPA extensions. Safari displays most of them, but is missing some critical symbols, like the 'er' sound in 'her'.
Is there a problem with rendering those characters, or is it just that standard system fonts don't include them? If the latter, are there free fonts we could recommend to people?
-- brion vibber (brion @ pobox.com)
On Fri, 05 Sep 2003 00:06:28 -0700, Brion Vibber brion@pobox.com gave utterance to the following:
On Thu, 2003-09-04 at 20:22, David Friedland wrote:
I've done some testing now at home on my Mac, and neither Mac Phoenix nor Mac Internet Explorer correctly display the Unicode IPA extensions. Safari displays most of them, but is missing some critical symbols, like the 'er' sound in 'her'.
Is there a problem with rendering those characters, or is it just that standard system fonts don't include them? If the latter, are there free fonts we could recommend to people?
The following information comes from Alan Wood's extensive unicode information site: http://www.alanwood.net/unicode/fonts_windows.html#ipa
IPA Fonts
---------
SILDoulosUnicodeIPA – 532 glyphs in version 4.0a4
Ranges: Basic Latin; Latin-1 Supplement; Latin Extended-A (few); IPA Extensions; Spacing Modifier Letters; Combining Diacritical Marks; General Punctuation; Mathematical Operators
OpenType layout tables: Latin
Family: Serif
Styles: Regular
Availability: Free download from SIL Unicode IPA Font beta. Includes a keyboard layout produced with Keyman.
ALPHABETUM Unicode, Arial Unicode MS, Bitstream CyberBit, Bitstream CyberCJK, Cardo, Caslon, Code2000, Free Monospaced, Gentium, GentiumAlt, Junicode, Lucida Sans, Lucida Sans Unicode, Monospace, MS Mincho, Naqsh, SImPL, Thryomanes and TITUS Cyberbit Basic can also display IPA Extensions.

(Back to my comments.) Of the above, several are proprietary (cost unknown). MS Arial Unicode is 23MB and requires an MS Office or Publisher license -- Microsoft removed the free download page last year. Code2000 is shareware, and about 2MB. Unlike regular users, I have a wide range of Unicode fonts for use in testing browsers. In Opera I can see all but the last two characters from Alan's test page: http://www.alanwood.net/unicode/ipa_extensions.html
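As a quick way to see which symbols are at risk on any given setup, one could flag the characters of a transcription that fall in the Unicode IPA Extensions block (U+0250–U+02AF), the ones most likely to render as empty rectangles without one of the fonts above; this is just an illustrative Python sketch:

```python
# Flag characters from the Unicode IPA Extensions block (U+0250-U+02AF),
# i.e. the symbols most likely to show as empty boxes without a suitable font.
def risky_chars(transcription: str) -> list[str]:
    return [f"{ch} (U+{ord(ch):04X})" for ch in transcription
            if 0x0250 <= ord(ch) <= 0x02AF]

print(risky_chars("h\u025dr"))  # the 'er' vowel in "her" -> ['ɝ (U+025D)']
```

Characters outside that block (plain Latin letters, stress marks in other blocks) generally display fine everywhere, which is why only some transcriptions break.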
David Friedland wrote:
In fact, there is. The International Phonetic Alphabet is ideally suited to marking pronunciations of words, and is flexible enough to describe broad transcriptions that represent how a word is pronounced in multiple dialects to minute phonetic details. This wisdom, of course, has been lost on the makers of most American dictionaries, who each insist upon using their own ad-hoc pronunciation scheme (one of my personal pet peeves). The _Cambridge Dictionary of American English_ is a notable, if perhaps not well-known, exception. The foremost dictionary of (mostly) British English, the _Oxford English Dictionary_ uses IPA, as does the major Australian English dictionary, _The Macquarie Dictionary_.
I see the following possible solutions (in the order that I think is good):
1.) Auto-detect the browser and send IPA Unicode to browsers that support it and TIPA LaTeX images to those that don't. (Pros: attractive display of IPA for all users. Cons: lots of programming)
2.) Just send TIPA LaTeX images (Pros: attractive display of IPA. Cons: Uses images in text when for some users embedded IPA Unicode would look better)
3.) Store the IPA in a special format or in a special tag, auto-detect the browser and send IPA Unicode to browsers that support it and SAMPA to the rest. (Pros: doesn't require inserting images or using TeX. Cons: SAMPA is ugly and hard to read)
4.) Render IPA into GIFs or PNGs and just insert them as images. (Pros: compatible with everything. Cons: time-consuming, and difficult to change)
5.) Devise a Wikipedia-specific pronunciation scheme and just use that (blech!) (Pros: no coding required. Cons: YAAHPS (Yet Another Ad Hoc Pronunciation Scheme))
6.) Do nothing and continue to allow people to use ad-hoc pronunciation schemes (BLECH!!) (Pros: no action required. Cons: maintains status quo harms as described above)
I've snipped your message, but it was all extremely well put :) Ad-hoc schemes are a peeve of mine too.
I'm opposed to options 5 & 6 :) My opinion on the matter so far has been to stick with SAMPA until we can do something about using IPA. We could stick to SAMPA in the wiki source text, since everybody can edit it, and tag it. Then browser-detect and send either IPA text or IPA in PNG form, depending.
A smart thing to do, if the renderer knows about SAMPA, would be to automagically provide a link to the SAMPA or IPA key. E.g., /sampa/k{t/ turns into some IPA which, if clicked, takes the reader to the explanation of the symbols.
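A rough sketch of that conversion-plus-linking idea, assuming a /sampa/.../ markup syntax and a hypothetical key page name (both made up for illustration), with a deliberately tiny SAMPA table:

```python
import re

# Tiny illustrative SAMPA table; a real converter would need the full set.
SAMPA_TO_IPA = {"{": "\u00e6", "@": "\u0259", "N": "\u014b"}

def render_sampa(wikitext: str) -> str:
    """Turn /sampa/.../ spans into IPA wrapped in a link to a key page."""
    def repl(m):
        ipa = "".join(SAMPA_TO_IPA.get(ch, ch) for ch in m.group(1))
        return f'<a href="/wiki/IPA_key">/{ipa}/</a>'
    return re.sub(r"/sampa/([^/]+)/", repl, wikitext)

print(render_sampa("pronounced /sampa/k{t/"))
# -> pronounced <a href="/wiki/IPA_key">/kæt/</a>
```

The markup token and the target page are placeholders; the point is that the renderer can do the SAMPA-to-IPA conversion and the key link in one pass.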
David Friedland wrote:
Anyhow, it seems that just using the HTML entities for the Unicode IPA extensions is not an acceptable solution because it leaves IE users with lovely but useless rectangles where there ought to be IPA characters. There is a LaTeX extension called TIPA that allows the complete set of IPA characters and diacritics. If this were installed into the TeX math extensions, then a similar syntax could be used to generate images of the IPA from LaTeX input. I see the following possible solutions (in the order that I think is good):
1.) Auto-detect the browser and send IPA Unicode to browsers that support it and TIPA LaTeX images to those that don't. (Pros: attractive display of IPA for all users. Cons: lots of programming)
2.) Just send TIPA LaTeX images (Pros: attractive display of IPA. Cons: Uses images in text when for some users embedded IPA Unicode would look better)
3.) Store the IPA in a special format or in a special tag, auto-detect the browser and send IPA Unicode to browsers that support it and SAMPA to the rest. (Pros: doesn't require inserting images or using TeX. Cons: SAMPA is ugly and hard to read)
4.) Render IPA into GIFs or PNGs and just insert them as images. (Pros: compatible with everything. Cons: time-consuming, and difficult to change)
5.) Devise a Wikipedia-specific pronunciation scheme and just use that (blech!) (Pros: no coding required. Cons: YAAHPS (Yet Another Ad Hoc Pronunciation Scheme))
6.) Do nothing and continue to allow people to use ad-hoc pronunciation schemes (BLECH!!) (Pros: no action required. Cons: maintains status quo harms as described above)
I was just thinking of this problem, and the idea I came up with was to have an option in user preferences of something like "Display pronunciations in: o Unicode IPA o SAMPA" and then anything in an article which begins with "SAMPA " would be detected and displayed correctly (converting SAMPA to IPA if necessary), similarly to the idea with the magic ISBNs. I think this is probably the simplest solution to get working quickly, and it can be easily expanded to include additional ASCII IPA schemes (there are several) or auto-generated IPA images if someone implements that. Also, someone using IE but who has the correct fonts installed would be able to see IPA.
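A toy sketch of that preference-driven rendering, with made-up preference names and a deliberately tiny SAMPA table: store the transcription once, and pick the display form per user setting.

```python
SAMPA_TO_IPA = {"{": "\u00e6", "@": "\u0259"}  # toy subset for illustration

def render_pron(sampa: str, pref: str = "ipa") -> str:
    """Render a stored SAMPA transcription per the user's display preference."""
    if pref == "sampa":
        return sampa
    return "".join(SAMPA_TO_IPA.get(ch, ch) for ch in sampa)

print(render_pron("k{t", pref="ipa"))    # -> kæt
print(render_pron("k{t", pref="sampa"))  # -> k{t
```

Storing SAMPA as the canonical form keeps the wiki source editable in ASCII, which is the main attraction of this approach.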
You malign ad hoc pronunciation schemes, but they do have *some* redeeming value. You can use a single ad-hoc system to represent different dialects more easily than you can use IPA for the same purpose, since users will read their own dialect into the pronunciation guide for the ad-hoc system. Still, I can't imagine making up an ad-hoc scheme for wikipedia; IPA is probably best for us.
I'm digging around the code to see how this could be done (and learning PHP), but in the meantime, any comments?
(Anything more on this should probably go to wikitech-l.)
-- Adam Raizen
Adam Raizen wrote in part:
You malign ad hoc pronunciation schemes, but they do have *some* redeeming value. You can use a single ad-hoc system to represent different dialects more easily than you can use IPA for the same purpose, since users will read their own dialect into the pronunciation guide for the ad-hoc system. Still, I can't imagine making up an ad-hoc scheme for wikipedia; IPA is probably best for us.
This is what morphophones are all about -- a scheme where all dialects read in their own sound. We don't have to invent our own ad-hoc scheme, since linguists have been studying morphophones, and quite often in the context of English, since 1962. (IPA, in contrast, does phonemes, or even lower-level structures.)
The "Webster's Dictionary" systems often seen in US dictionaries are roughly morphophonic, but not very sophisticated linguistically. (But Merriam-Webster's current system is phonemic, despite its old-fashioned non-IPA, Webster's-ish look. Therefore the worst of them all, IMO.)
-- Toby
Adam Raizen wrote in part:
You malign ad hoc pronunciation schemes, but they do have *some* redeeming value. You can use a single ad-hoc system to represent different dialects more easily than you can use IPA for the same purpose, since users will read their own dialect into the pronunciation guide for the ad-hoc system. Still, I can't imagine making up an ad-hoc scheme for wikipedia; IPA is probably best for us.
I agree with this criticism of IPA -- how can IPA even be remotely useful for us, given that there is no one correct phoneme mapping for nearly *any* word in the English language? Are we going to have dozens of different IPA entries for each word, representing the full range of pronunciation in the English of England (including many dialects), Scotland, Wales, Ireland, Australia, South Africa, India, the United States (including many dialects), etc.? And how about for the range of pronunciation of Chinese words within different parts of China, or countries outside China that also have significant Chinese-speaking populations? The whole thing just seems pretty useless.
-Mark
Delirium wrote:
Adam Raizen wrote in part:
You malign ad hoc pronunciation schemes, but they do have *some* redeeming value. You can use a single ad-hoc system to represent different dialects more easily than you can use IPA for the same purpose, since users will read their own dialect into the pronunciation guide for the ad-hoc system. Still, I can't imagine making up an ad-hoc scheme for wikipedia; IPA is probably best for us.
I agree with this criticism of IPA -- how can IPA even be remotely useful for us, given that there is no one correct phoneme mapping for nearly *any* word in the English language? Are we going to have dozens of different IPA entries for each word, representing the full range of pronunciation in the English of England (including many dialects), Scotland, Wales, Ireland, Australia, South Africa, India, the United States (including many dialects), etc.? And how about for the range of pronunciation of Chinese words within different parts of China, or countries outside China that also have significant Chinese-speaking populations? The whole thing just seems pretty useless.
-Mark
The nice thing about IPA is that it allows you to have a range of phonetic detail. You can specify exactly where a vowel is with respect to, for example, Daniel Jones' cardinal vowels, or you can just use the plain vowel symbol, meaning it's somewhere near that vowel.
The problem is fundamentally that dialects _do_ sound different and using the system "this sound sounds like this sound in another word" breaks down eventually.
There are, however, standard dialects, and other dialects can be described in terms of those standards. Likewise, pronunciations should be presented in the standards, and speakers who are unsure how their dialect differs from the standard can view the page on their dialect.
In the cases where a word is pronounced in a dialect in a way that is not predicted by the regular differences between the dialect and the standard, it seems only reasonable to present that dialect's idiosyncratic pronunciation along with the standards.
- David [User:Nohat]
I tend to agree. But at some point, the effort put into correcting this might overshoot the effort of simply adding Vorbis audio ("speex" codec, whenever it comes out) to each entry.
"What about China...all those dialects?"
Pinyin covers those -- not in a linguistic way, but in a political way. There are far more positives for using IPA -- namely that it's compatible with SAMPA, and that it might someday be used on WP to machine-read text -- which would be velly nice.
~S~
--- Delirium delirium@rufus.d2g.com wrote:
Adam Raizen wrote in part:
You malign ad hoc pronunciation schemes, but they do have *some* redeeming value. You can use a single ad-hoc system to represent different dialects more easily than you can use IPA for the same purpose, since users will read their own dialect into the pronunciation guide for the ad-hoc system. Still, I can't imagine making up an ad-hoc scheme for wikipedia; IPA is probably best for us.
I agree with this criticism of IPA -- how can IPA even be remotely useful for us, given that there is no one correct phoneme mapping for nearly *any* word in the English language? Are we going to have dozens of different IPA entries for each word, representing the full range of pronunciation in the English of England (including many dialects), Scotland, Wales, Ireland, Australia, South Africa, India, the United States (including many dialects), etc.? And how about for the range of pronunciation of Chinese words within different parts of China, or countries outside China that also have significant Chinese-speaking populations? The whole thing just seems pretty useless.
-Mark
Toby Bartels wrote:
This is what morphophones are all about -- a scheme where all dialects read in their own sound. We don't have to invent our own ad-hoc scheme, since linguists have been studying morphophones, and quite often in the context of English, since 1962. (IPA, in contrast, does phonemes, or even lower-level structures.)
The "Webster's Dictionary" systems often seen in US dictionaries are roughly morphophonic, but not very sophisticated linguistically. (But Merriam-Webster's current system is phonemic, despite its old-fashioned non-IPA, Webster's-ish look. Therefore the worst of them all, IMO.)
The American Heritage Dictionary gives the following explanation of their pronunciation scheme:
"For most words a single set of symbols can represent the pronunciation found in each regional variety of American English. You will supply those features of your own regional speech that are called forth by the pronunciation key in this Dictionary"
And it seems like a panacea for the pronunciation problem. But it's not, because some words simply have different underlying representations in different dialects, and the system only works for dialects that are roughly the same except for a few sound changes. It fails for wildly or even mildly divergent dialects. The American Heritage Dictionary system sweeps this problem under the rug by saying "The pronunciations are exclusively those of educated speech", which, to my mind, is a cop-out, and not a satisfactory solution for Wikipedia.
However, the question of dialect remains. Obviously listing pronunciations in all possible dialects is not a reasonable solution, and indeed, nor are any of the systems used in American dictionaries. I recognize that the general task of specifying a pronunciation that speakers of any dialect will automatically speak in their dialect is not ideally handled by IPA. However, I do not know of any system advocated by linguists other than what phonologists call "broad transcription" using IPA. Can you point me to a book or paper, written by linguists, that specifies such a system for English, and advocates its use by and for general (non-academic) readers?
I have never encountered such a system, and I doubt that one exists. Barring the existence of a standard system, I don't really see that Wikipedia has any other options besides IPA for specifying pronunciations. Certainly I hope no one thinks Wikipedia should invent its own system. When it comes to standards, it should be our job to follow them and describe them, not create them.
So I advocate having IPA transcriptions for standard dialects (like Standard American English and Received Pronunciation), and having special pages describing how the various nonstandard dialects differ both phonetically and phonemically from the standards. I don't know much about morphophones and I'm not sure it's a concept widely accepted by linguists.
PS: I have made a page on meta called [[Pronunciations]] and am going through the list archives and posting links to relevant discussions there. I'm not sure what the policy should be regarding where further discussion should occur, so if you want to respond, do so either here or on the list.
-- David [[User:Nohat]]
It's worth noting I've never considered any of the pronunciation schemes to be worth anything. If it's an English word, I go to m-w.com and listen to the pronunciation wav files.
-- Jake
--- David Friedland david@nohat.net wrote:
The American Heritage Dictionary gives the following explanation of their pronunciation scheme:
"For most words a single set of symbols can represent the pronunciation found in each regional variety of American English. You will supply those features of your own regional speech that are called forth by the pronunciation key in this Dictionary"
And it seems like a panacea for the pronunciation problem. But it's not, because some words simply have different underlying representations in different dialects, and the system only works for dialects that are roughly the same except for a few sound changes. It fails for wildly or even mildly divergent dialects. The American Heritage Dictionary system sweeps this problem under the rug by saying "The pronunciations are exclusively those of educated speech", which, to my mind, is a cop-out, and not a satisfactory solution for Wikipedia.
However, the question of dialect remains. Obviously listing pronunciations in all possible dialects is not a reasonable solution, and neither are any of the systems used in American dictionaries. I recognize that the general task of specifying a pronunciation that speakers of any dialect will automatically speak in their dialect is not ideally handled by IPA. However, I do not know of any system advocated by linguists other than what phonologists call "broad transcription" using IPA. Can you point me to a book or paper, written by linguists, that specifies such a system for English, and advocates its use by and for general (non-academic) readers?
I have never encountered such a system, and I doubt that one exists. Barring the existence of a standard system, I don't really see that Wikipedia has any other options besides IPA for specifying pronunciations. Certainly I hope no one thinks Wikipedia should invent its own system. When it comes to standards, it should be our job to follow them and describe them, not create them.
So I advocate having IPA transcriptions for standard dialects (like Standard American English and Received Pronunciation), and having special pages describing how the various nonstandard dialects differ both phonetically and phonemically from the standards. I don't know much about morphophones and I'm not sure it's a concept widely accepted by linguists.
PS: I have made a page on meta called [[Pronunciations]] and am going through the list archives and posting links to relevant discussions there. I'm not sure what the policy should be regarding where further discussion should occur, so if you want to respond, do so either here or on the list.
-- David [[User:Nohat]]
What about the system Nupedia uses? LDan
__________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com
David Friedland wrote about morphophones:
And it seems like a panacea for the pronunciation problem. But it's not, because some words simply have different underlying representations in different dialects, and the system only works for dialects that are roughly the same except for a few sound changes. It fails for wildly or even mildly divergent dialects. The American Heritage Dictionary system sweeps this problem under the rug by saying "The pronunciations are exclusively those of educated speech", which, to my mind, is a cop-out, and not a satisfactory solution for Wikipedia.
How do you mean that morphophones fail for mildly divergent dialects? What is your reason for thinking such a thing? Surely it's not just that the American Heritage Dictionary didn't take much effort? I already said that these dictionaries have unsophisticated systems. The AHD states its limitations: educated American speech only. This allows them to cut corners on their implementation.
However, I do not know of any system advocated by linguists other than what phonologists call "broad transcription" using IPA. Can you point me to a book or paper, written by linguists, that specifies such a system for English, and advocates its use by and for general (non-academic) readers?
I've cited the original 1962 paper introducing morphophones before; I'd have to look up the citation in the archives to repeat it, but you're already going through those so I'll refrain for now. But that was an academic paper; what I should do now is try to track down a more recent (1980s) book that I've read, written by linguists, which advocates its use outside academic settings.
I have never encountered such a system, and I doubt that one exists. Barring the existence of a standard system, I don't really see that Wikipedia has any other options besides IPA for specifying pronunciations. Certainly I hope no one thinks Wikipedia should invent its own system. When it comes to standards, it should be our job to follow them and describe them, not create them.
I'm not sure to what extent there is a /single/ standard system. There certainly is at least one system in use by linguists. Probably with variations due to improved understanding over time, but whether these are coordinated by a single standards body I don't know. I will try to track this down too.
PS: I have made a page on meta called [[Pronunciations]] and am going through the list archives and posting links to relevant discussions there. I'm not sure what the policy should be regarding where further discussion should occur, so if you want to respond, do so either here or on the list.
OK, I'll watch it.
-- Toby
Toby Bartels wrote:
David Friedland wrote about morphophones:
And it seems like a panacea for the pronunciation problem. But it's not, because some words simply have different underlying representations in different dialects, and the system only works for dialects that are roughly the same except for a few sound changes. It fails for wildly or even mildly divergent dialects. The American Heritage Dictionary system sweeps this problem under the rug by saying "The pronunciations are exclusively those of educated speech", which, to my mind, is a cop-out, and not a satisfactory solution for Wikipedia.
How do you mean that morphophones fail for mildly divergent dialects? What is your reason for thinking such a thing? Surely it's not just that the American Heritage Dictionary didn't take much effort? I already said that these dictionaries have unsophisticated systems. The AHD states its limitations: educated American speech only. This allows them to cut corners on their implementation.
The reasoning behind morphophones is that even though people speak with different regional dialects, how the pronunciations are stored in each person's internal lexicon in their brain is the same, or can be represented symbolically in ways that are equivalent. The morphophonic system taps into this internal consistency between different dialects, and thus a single symbolic form can represent the different (but equivalent) pronunciations for speakers of different dialects.
For example, in such a system we would have a single symbol for the sound represented by the final "er" in the word "runner". A speaker of a non-rhotic Boston dialect, for example, would then always produce this sound as a plain schwa, and a speaker of, say, standard American would produce it as a rhoticized schwa. In the morphophonic system, only a single transcription would be needed to specify the two different resulting pronunciations.
The problem with this system is that the fundamental assumption that internal representations of pronunciations are equivalent is false. This is what I meant by "mildly divergent" dialects. Besides regular sound change, dialects also differ in some cases in how pronunciations are represented in the lexicon. It is simply the case that some dialects have fundamentally different internal representations for the pronunciations of some words.
If you don't agree, then how would you specify a single pronunciation using a morphophonic system for the words "almond", "apricot", "aunt", "controversy", "clerk", "creek", "Florida", "garage", "greasy", "lieutenant", "mayonnaise", "mischievous", "pecan", and "tour", just for starters? I just don't see how a simple system could capture all these variants with a single representation. You're not advocating a system that has a symbol that corresponds to /u/ in AmE and /Ef/ in BrE so that "lieutenant" is represented with one set of symbols, are you?
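To make the disagreement concrete, here is a minimal sketch of how such a per-dialect realization table might work; the symbol names, dialect labels, and lexicon entries are all invented for illustration and come from no published morphophone system. It handles "runner", where the dialects differ only by one regular correspondence, but there is no symbol to put in an entry for "lieutenant", where AmE /luː-/ vs. BrE /lɛf-/ is not a regular correspondence at all:

```python
# Sketch of a morphophone-style lexicon.  All names here are invented
# for illustration, not taken from any published system.

# Each abstract symbol maps to its realization in each dialect.
MORPHOPHONES = {
    "ER#": {"GenAm": "ɚ", "Boston": "ə"},  # word-final -er: rhotic vs. non-rhotic
}

# Entries are lists of units: plain IPA strings shared by all dialects,
# or abstract symbols looked up per dialect.
LEXICON = {
    "runner": ["ɹ", "ʌ", "n", "ER#"],
    # "lieutenant": no single symbol covers AmE /luː-/ vs. BrE /lɛf-/,
    # so the scheme collapses into separate per-dialect entries anyway.
}

def realize(word, dialect):
    """Expand a morphophonic entry into one dialect's pronunciation."""
    return "".join(
        MORPHOPHONES[unit][dialect] if unit in MORPHOPHONES else unit
        for unit in LEXICON[word]
    )

print(realize("runner", "GenAm"))   # ɹʌnɚ
print(realize("runner", "Boston"))  # ɹʌnə
```

The sketch shows the appeal of the idea (one entry, many dialects) and also the failure mode: every word on the list above would force either an ad-hoc one-off symbol or parallel per-dialect entries.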
However, I do not know of any system advocated by linguists other than what phonologists call "broad transcription" using IPA. Can you point me to a book or paper, written by linguists, that specifies such a system for English, and advocates its use by and for general (non-academic) readers?
I've cited the original 1962 paper introducing morphophones before; I'd have to look up the citation in the archives to repeat it, but you're already going through those so I'll refrain for now. But that was an academic paper; what I should do now is try to track down a more recent (1980s) book that I've read, written by linguists, which advocates its use outside academic settings.
OK. I'd be really interested to learn how the above problem is solved.
I have never encountered such a system, and I doubt that one exists. Barring the existence of a standard system, I don't really see that Wikipedia has any other options besides IPA for specifying pronunciations. Certainly I hope no one thinks Wikipedia should invent its own system. When it comes to standards, it should be our job to follow them and describe them, not create them.
I'm not sure to what extent there is a /single/ standard system. There certainly is at least one system in use by linguists. Probably with variations due to improved understanding over time, but whether these are coordinated by a single standards body I don't know. I will try to track this down too.
- David [[User:Nohat]]
--- David Friedland david@nohat.net wrote:
The reasoning behind morphophones is that even though people speak with different regional dialects, how the pronunciations are stored in each person's internal lexicon in their brain is the same, or can be represented symbolically in ways that are equivalent. The morphophonic system taps into this internal consistency between different dialects, and thus a single symbolic form can represent the different (but equivalent) pronunciations for speakers of different dialects.
For example, in such a system we would have a single symbol for the sound represented by the final "er" in the word "runner". A speaker of a non-rhotic Boston dialect, for example, would then always produce this sound as a plain schwa, and a speaker of, say, standard American would produce it as a rhoticized schwa. In the morphophonic system, only a single transcription would be needed to specify the two different resulting pronunciations.
The problem with this system is that the fundamental assumption that internal representations of pronunciations are equivalent is false. This is what I meant by "mildly divergent" dialects. Besides regular sound change, dialects also differ in some cases in how pronunciations are represented in the lexicon. It is simply the case that some dialects have fundamentally different internal representations for the pronunciations of some words.
If you don't agree, then how would you specify a single pronunciation using a morphophonic system for the words "almond", "apricot", "aunt", "controversy", "clerk", "creek", "Florida", "garage", "greasy", "lieutenant", "mayonnaise", "mischievous", "pecan", and "tour", just for starters? I just don't see how a simple system could capture all these variants with a single representation. You're not advocating a system that has a symbol that corresponds to /u/ in AmE and /Ef/ in BrE so that "lieutenant" is represented with one set of symbols, are you?
- David [[User:Nohat]]
I'd advocate such a system. I created one that can do just that by writing (oo|ayf). If you wanted to do "almond", you'd write a-|lmi|und. This could be made slightly less verbose by using accent marks. Speakers of accents other than US and UK English could just infer which sound to make. I think such a system (although not mine) would work well. I would like to know what linguists use, though. LDan
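For what it's worth, the parenthesized alternation above could be expanded mechanically. This is a sketch based on my own guess at the notation (first alternative = US, second = UK; the function name and conventions are not something LDan specified):

```python
import re

def expand(spelling, dialect):
    """Expand a '(us|uk)' alternation spelling for one dialect.
    dialect 0 picks the first alternative (US), 1 the second (UK).
    The notation itself is a guess at the proposal above."""
    return re.sub(
        r"\(([^)]*)\)",                           # each (...) group
        lambda m: m.group(1).split("|")[dialect],  # keep one alternative
        spelling,
    )

print(expand("l(oo|ayf)tenant", 0))  # lootenant
print(expand("l(oo|ayf)tenant", 1))  # layftenant
```

Note that this inherits David's objection: every word with an irregular cross-dialect difference needs its alternation spelled out by hand, so it is really per-dialect listing in compressed form.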
BTW, a google search on morphophone yielded 63 results, and morphophonic yielded 2 wikipedia mail posts. However, morphophonemic had 3120 results, and apparently is very similar to what we're talking about. LDan