Conference report from South Africa

Angela

23 Apr 2005

8:47 p.m.

This is a brief report from the FLOSS Conference in South Africa that Erik and I attended this week. A more detailed version is on Meta at http://meta.wikimedia.org/wiki/Conference_reports/FLOSS%2C_South_Africa_2005, so please read that one instead if you have time.

I was invited to give a presentation about the Wikimedia projects at the international "Free/Libre and Open Source Software" (FLOSS) and Free Knowledge workshop in Pretoria, South Africa. Erik was given the opportunity to hold a workshop there about wiki technology. The byline for the conference was "Knowledge for all, Education for all", so the Wikimedia projects fitted in perfectly.

The first day was made up of formal presentations. A list of these is on Meta. My talk was part of a "Digital Commons" panel. Much of the second day was divided into two workshops, including Erik's. The theme of Free Knowledge Communities was discussed on day 3, and there were many areas in which Wikimedia projects could collaborate with existing initiatives, and new ideas for using Wikimedia content:

* Spoken Wikipedia by cell phone. Many areas of Africa have high cell phone coverage with access to SMS. Teemu Leinonen of the University of Art and Design Helsinki is working on a project to allow a user to send an SMS with the article title to a phone number. A few seconds later, they get a call on their cell phone with a (usually machine-generated) spoken version of the article they requested.

* Wikipedia in schools. Static HTML dumps on DVD, offline applications that allow editing, and update feeds like rsync to maintain offline copies were all requested by people working on getting Wikipedia into schools. Where people were interested in print projects, they wanted to focus on printing out particular topics, rather than having a copy of the entire encyclopedia.

* Wiktionary. There is a need for a repository of legal terminology in the 11 official languages of South Africa, since courts often rely on untrained interpreters who need a reference guide for dealing with unfamiliar terminology from any of the languages they are not native speakers of.

* Wikibooks/Wikiversity/E-learning. With the price of textbooks much higher in South Africa than in developed countries, free textbooks are of extreme importance, and Wikibooks could provide the content needed for initiatives to deliver this. We discussed our existing and potential future projects at length and talked to proponents of various e-learning initiatives.

Again, see http://meta.wikimedia.org/wiki/Conference_reports/FLOSS%2C_South_Africa_2005 for details on any of these issues.

==Meetup==

On the evening of the third day, the first African meetup was held in "Cafe 41". Four Wikipedians from South Africa participated: Laurens, Alias, Renier Maritz and Andy Rabagliati. Renier's wife also joined us, along with some people from the conference. We discussed ways to promote the Afrikaans Wikipedia, methods to distribute Wikipedia to Africa, localization of the interface, and possibilities for e-learning.

==Future conferences==

Several upcoming conferences were mentioned as being of possible interest to Wikimedia. Most notable of these are WSIS (http://www.itu.int/wsis/), which I believe Jimmy and Yann may be attending, and the World Conference on Computers in Education (http://www.sbs.co.za/wcce2005/), for which no Wikimedia attendance is currently planned.

Unfortunately, we did not see much of South Africa beyond the conference centre. Nevertheless, the visit was very productive and led to many new contacts and insights. We aim to follow up on the discussions, and turn some of the ideas above into reality soon.

Angela.

-- http://en.wikipedia.org/wiki/User:Angela


Anthere

24 Apr

5:52 a.m.

Thanks a lot to both of you for the good report and spreading the word ;-)

Now, good luck with cleaning one week mail :-)

ant

Angela wrote:

...

This is a brief report from the FLOSS Conference in South Africa that Erik and I attended this week. A more detailed version is on Meta at http://meta.wikimedia.org/wiki/Conference_reports/FLOSS%2C_South_Africa_2005, so please read that one instead if you have time.

Timwi

4:09 p.m.

Angela wrote:

...

* Spoken Wikipedia by cell phone. Many areas of Africa have high cell phone coverage with access to SMS. Teemu Leinonen of the University of Art and Design Helsinki is working on a project to allow a user to send an SMS with the article title to a phone number. A few seconds later, they get a call on their cell phone with a (usually machine-generated) spoken version of the article they requested.

Heh! All the more incentive to get more people to participate in [[Wikipedia:WikiProject Spoken Wikipedia]] (shameless plug)! Nobody really wants machine-generated spoken versions when a real human-spoken version is available. :)

...

* Wikibooks/Wikiversity/E-learning. With the price of textbooks much higher in South Africa than in developed countries, free textbooks are of extreme importance, and Wikibooks could provide the content needed for initiatives to deliver this.

This is assuming that the larger part of the cost of textbooks is copyright licenses. I would imagine that it is instead the production costs of actual books, and obviously free content won't help that.

Timwi

Jimmy Wales

25 Apr

6:51 a.m.

Timwi wrote:

...

This is assuming that the larger part of the cost of textbooks is copyright licenses. I would imagine that it is instead the production costs of actual books, and obviously free content won't help that.

I agree that it would be very helpful to us to have a better understanding of the tradeoffs but I also wanted to point out that it's a bit more complex than just copyright license + cost of production here.

Our work is free-as-in-beer but also free-as-in-speech. So the point is not _just_ that a potential producer of paper texts saves on the cost of copyright licensing but also....

1. They don't need to get permission from anyone at all, they can just get started in any small and tiny way they see fit (or in any large and mass-produced way they see fit)

2. There can be many competitors in a market ecosystem of provision of content, rather than a single licensee attempting to capture some monopoly rents

--Jimbo

Ray Saintonge

11:21 a.m.

Jimmy Wales wrote:

...

Timwi wrote:

...
This is assuming that the larger part of the cost of textbooks is copyright licenses. I would imagine that it is instead the production costs of actual books, and obviously free content won't help that.

I agree that it would be very helpful to us to have a better understanding of the tradeoffs but I also wanted to point out that it's a bit more complex than just copyright license + cost of production here.

Our work is free-as-in-beer but also free-as-in-speech. So the point is not _just_ that a potential producer of paper texts saves on the cost of copyright licensing but also....

1. They don't need to get permission from anyone at all, they can just get started in any small and tiny way they see fit (or in any large and mass-produced way they see fit)

2. There can be many competitors in a market ecosystem of provision of content, rather than a single licensee attempting to capture some monopoly rents

I think that we are already well on the way to dealing with the copyright issues. The free-as-in-speech will probably need to be dealt with one jurisdiction at a time.

Today's off-the-shelf Microsoft product does not come with the huge array of manuals that would have come with its ten-year-old counterpart. Instead we need to make do with electronic files that are nowhere near as useful as a book that you can hold in your hand and leaf through. Reducing production costs has a clear effect on the Microsoft bottom line.

Whether we or a downstream user converts the material to a paper product, the production costs will always be there. Simply clarifying permissions is still a long way from getting the material to those who need it most. Any producer will at least want to see its costs covered.

Andy Rabagliati

7:26 a.m.

On Sun, 24 Apr 2005, Timwi wrote:

...

Angela wrote:

...
* Spoken Wikipedia by cell phone.
Heh! All the more incentive to get more people to participate in [[Wikipedia:WikiProject Spoken Wikipedia]] (shameless plug)! Nobody really wants machine-generated spoken versions when a real human-spoken version is available. :)

There was some discussion of that. Two (very real) problems :-

* Editing. Voice editing sounds clumsy, and would sound like CamelCase :-)

* Accents. If an Indian is trying to understand what a Geordie or someone from Barbados is saying, it might as well be in Afrikaans :-)

Transcription, and machine-generated, please ?

...

This is assuming that the larger part of the cost of textbooks is copyright licenses. I would imagine that it is instead the production costs of actual books, and obviously free content won't help that.

Having had (almost) direct experience of this in Africa, much of the problem is bureaucracy and supply.

Cheers, Andy!

Ronald Chmara

7:46 a.m.

On Apr 25, 2005, at 5:26 AM, Andy Rabagliati wrote:

...

There was some discussion of that. Two (very real) problems :-

* Editing. Voice editing sounds clumsy, and would sound like CamelCase :-)

* Accents. If an Indian is trying to understand what a Geordie or someone from Barbados is saying, it might as well be in Afrikaans :-)

Wikipedia is not paper. Or one tape deck.

Thus, one word, say "aluminium" could have several pronunciation entries.

-Bop

Gerard Meijssen

8:20 a.m.

Ronald Chmara wrote:

...

On Apr 25, 2005, at 5:26 AM, Andy Rabagliati wrote:

...
There was some discussion of that. Two (very real) problems :-

* Editing. Voice editing sounds clumsy, and would sound like CamelCase :-)

* Accents. If an Indian is trying to understand what a Geordie or someone from Barbados is saying, it might as well be in Afrikaans :-)

Wikipedia is not paper. Or one tape deck.

Thus, one word, say "aluminium" could have several pronunciation entries.

-Bop

Hoi, for Wiktionary we want MANY pronunciations. The word aluminium can be found as http://commons.wikimedia.org/wiki/Image:Nl-aluminium.ogg on Commons. This is the Dutch pronunciation. There are more languages that spell the word "aluminium" ( http://nl.wiktionary.org/wiki/aluminium ). Because of a naming convention, I do welcome all pronunciations for this word. It would take someone from Great Britain to do an en-en-aluminium.ogg file. Sound files for fr, da, no and sv would also be very much appreciated.

If we want to have the pronunciations of all words in Geordie, we only need to establish a naming convention for it and stick to it. Nothing major, no rocket technology. It is not difficult, we have a solution :)

Thanks, GerardM
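
A naming convention of the kind Gerard describes could be sketched roughly as below. This is only an illustration: the function name `pronunciation_filename`, the `variety` parameter and the "geordie" label are hypothetical, and the only patterns taken from the message above are "Nl-aluminium.ogg" and "en-en-aluminium.ogg". The first letter is capitalised to match the "Nl-" seen in the Commons file name.

```python
from typing import Optional


def pronunciation_filename(word: str, language: str, variety: Optional[str] = None) -> str:
    """Build a sound-file name of the form '<language>[-<variety>]-<word>.ogg'.

    Hypothetical helper: the pattern is inferred from the two file names
    mentioned in the thread, not from any documented Commons policy.
    """
    parts = [language.lower()]
    if variety:
        # An optional regional or accent label, e.g. "en" + "geordie".
        parts.append(variety.lower())
    parts.append(word)
    name = "-".join(parts) + ".ogg"
    # Capitalise the first letter, as in "Nl-aluminium.ogg".
    return name[0].upper() + name[1:]


if __name__ == "__main__":
    print(pronunciation_filename("aluminium", "nl"))
    print(pronunciation_filename("aluminium", "en", "geordie"))
```

With one predictable name per language and variety, several readings of the same word can sit side by side without colliding.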

Ray Saintonge

11:34 a.m.

Ronald Chmara wrote:

...

On Apr 25, 2005, at 5:26 AM, Andy Rabagliati wrote:

...
There was some discussion of that. Two (very real) problems :-

* Editing. Voice editing sounds clumsy, and would sound like CamelCase :-)

* Accents. If an Indian is trying to understand what a Geordie or someone from Barbados is saying, it might as well be in Afrikaans :-)

Wikipedia is not paper. Or one tape deck.

Thus, one word, say "aluminium" could have several pronunciation entries.

That only helps the person using the term. It's a one-way feature. The person hearing the term used in another dialect will still have no idea what is being said. How can he transcribe anything which is unintelligible to him?

Mark Williamson

2:45 p.m.

I am guessing that at least a couple of those who attended the conference were native or fluent speakers of some of South Africa's _other_ official languages.

Do these people know that we have Wikipedias in their mother tongue? I somehow doubt it. If they do know, do they care? I know that there are some people who don't care about the Wikipedia in their mother tongue, but most people I have told said they didn't know and went to work on it right away.

It seems to me that African-language localisation is an important part of open-content and open-source in South Africa (see translate.co.za ), yet very few of these people seem to know about these Wikipedias.

In fact, I did a spot check, and it seems that many of the contributors to the Afrikaans Wikipedia are South Africans with Zulu, Xhosa, etc. names and in some cases obviously speak one of these languages (User:Alias, for example, has contributed a little bit to the Zulu Wikipedia; others have heavily edited articles on what might be their mother tongue, or say it on their userpage). I suspect the reason for involvement with the Afrikaans Wikipedia only rather than their mother tongue(s) is that the Afrikaans Wikipedia is still relatively small, but I'm sure at least some of the people at the conference or the Afrikaans Wikipedia would edit these Wikipedias if they knew about them, but they don't.

Mark


Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l

-- SI HOC LEGERE SCIS NIMIVM ERVDITIONIS HABES QVANTVM MATERIAE MATERIETVR MARMOTA MONAX SI MARMOTA MONAX MATERIAM POSSIT MATERIARI ESTNE VOLVMEN IN TOGA AN SOLVM TIBI LIBET ME VIDERE

Andy Rabagliati

26 Apr

4:28 a.m.

On Mon, 25 Apr 2005, Mark Williamson wrote:

...

I am guessing that at least a couple of those who attended the conference were native or fluent speakers of some of South Africa's _other_ official languages.

Afrikaans only.

...

Do these people know that we have Wikipedias in their mother tongue?

Since Angela contacted them by posting on their Talk pages, yes.

...

It seems to me that African-language localisation is an important part of open-content and open-source in South Africa (see translate.co.za ), yet very few of these people seem to know about these Wikipedias.

Knowing a Xhosa man in the IT profession, I asked him about the KDE menus in Xhosa. He said he found them confusing - when programming, using Email, every day interactions, he uses English. It actually slows him down to flip to Xhosa.

My French is quite passable. After staying in France for a while, I find myself thinking in French. It becomes quicker than thinking in English and translating everything.

Not saying it isn't a worthy goal to do the translation, particularly for people who are just starting with computers.

Cheers, Andy!

Alphax

25 Apr

8:18 a.m.

Andy Rabagliati wrote:

...

There was some discussion of that. Two (very real) problems :-

* Editing. Voice editing sounds clumsy, and would sound like CamelCase :-)

* Accents. If an Indian is trying to understand what a Geordie or someone from Barbados is saying, it might as well be in Afrikaans :-)

Transcription, and machine-generated, please ?

Which Apple voice would you like? Or would you prefer Microsoft Bob? :P

-- Alphax http://en.wikipedia.org/wiki/User:Alphax There are two kinds of people: those who say to God, 'Thy will be done,' and those to whom God says, 'All right, then, have it your way.' - C. S. Lewis

Timwi

3:43 p.m.

Andy Rabagliati wrote:

...

On Sun, 24 Apr 2005, Timwi wrote:

...
Heh! All the more incentive to get more people to participate in [[Wikipedia:WikiProject Spoken Wikipedia]] (shameless plug)! Nobody really wants machine-generated spoken versions when a real human-spoken version is available. :)

There was some discussion of that. Two (very real) problems :-

Editing. Voice editing sounds clumsy, and would sound like CamelCase :-)

Of course, you cannot edit a sound file in the same way that you can edit text. But you're not supposed to, anyway; the sound file is not an original article, but a reading of an existing textual version. My hope is that once most featured articles have a recording, the regular participants in the Spoken Wikipedia project will be happy to update their own sound files as the article changes significantly. If someone doesn't, well, then I guess someone else will have to re-read the entire article, but if someone's happy to do that (which isn't unlikely if the recording is significantly out of date) then there's no problem with that.

...

Accents. If an Indian is trying to understand what a Geordie or someone from Barbados is saying, it might as well be in Afrikaans :-)

I'm not sure how large and how representative a sample of listeners you have already surveyed, but I highly doubt this is a real problem. The recordings are obviously supposed to be spoken slowly and clearly.

Are you a native speaker of English? Where are you from? What accents do you tend to have trouble understanding?

Timwi

Mark Williamson

7:22 p.m.

Timwi, if you doubt the accent problem is a real one, you clearly have not heard many different accents in your life.

Even when spoken slowly and clearly, there are some accents that are well-nigh unintelligible to those with a certain different accent, at least without being around them for a while.

And what about accents that some might deem "incorrect"? The typical Singaporean accent might be called incorrect by some, and those of "foreigners" would most likely be called incorrect by quite a few people.

Some people would say that any accent other than the "standard" is incorrect. (With English being official in more than one nation, there is no single "standard", but people often being ignoramuses, we can expect that they will say "Sure, the ____ and the _____ have their 'standard accent', but ours is the only correct one", making little imaginary quote marks around "standard accent".)

If you listen to a sampling of a wide range of accents in English, you will almost certainly find one that, even when spoken "slowly" and "clearly", you have great difficulty understanding.

With some languages this is even worse.

Mark


Mark Williamson

7:23 p.m.

And I should add, why do we need a spoken Wikipedia? Wouldn't it be better to adapt a TTS engine to get articles from Wikipedia, and read them? Remember, TTS engines can be made for any language, and it's becoming increasingly easy to make them even if you're not an expert.

Mark


Andrew Lih

10 p.m.

On 4/26/05, Mark Williamson node.ue@gmail.com wrote:

...

And I should add, why do we need a spoken Wikipedia? Wouldn't it be better to adapt a TTS engine to get articles from Wikipedia, and read them? Remember, TTS engines can be made for any language, and it's becoming increasingly easy to make them even if you're not an expert.

Mark, not sure if you're wondering about the whole idea of audible articles, or just the human-centered approach to it.

If you find a TTS system that is acceptably good (Festival, MBROLA, et al) then feel free to try it and post some samples at [[Wikipedia:WikiProject Spoken Wikipedia]]. But so far the TTS results I've heard have been quite unsatisfactory, and the "ear fatigue" experienced when listening to a herky jerky TTS system doesn't bode well for long articles. Wikipedia might be a great testbed for a TTS system. I'd like to see how it does on mixed language articles like [[Dim sum]].

-User:Fuzheado

Mark Williamson

11:06 p.m.

Those "herky-jerky" TTS systems are considered "low-end". They really do suck.

You can buy "high end" TTS systems for a lot of money, and I think with a little work there could be an open source "high end" engine as well.

I think somebody experimented with using neural networks to improve TTS (to try to get it to mimic a speech sample of a human), and the result was reasonably natural sounding.

Mark


Timwi

26 Apr

10:01 a.m.

Mark Williamson wrote:

...

Timwi, if you doubt the accent problem is a real one, you clearly have not heard many different accents in your life.

Firstly, you are not exactly helping your credibility by asserting to know more about my own experiences than I do.

Secondly, what is stopping speakers of various accents from speaking various articles? Just because an article has already been spoken by a Brit, doesn't exactly make it impossible for a Singaporean to contribute *their* version too, if they think the Brit is oh so incomprehensible.

Lastly, you are *definitely* not helping your credibility by asserting that an automated text-to-speech system would universally be better received than natural recordings! I can easily understand a wide range of English accents when they are spoken slowly, but I am afraid that text-to-speech systems are just painful to listen to.

I therefore highly recommend falling back on automated text-to-speech only when a natural recording is not available.
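
The fallback rule proposed here, combined with the idea of several human readings per article, might look something like the following sketch. Everything in it is an assumption made for illustration: the `Recording` type, the `choose_audio` function, the accent labels and the `TTS_FALLBACK` marker are hypothetical and do not correspond to any existing Wikipedia tool.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical marker meaning "no human reading exists, use text-to-speech".
TTS_FALLBACK = "machine-generated"


@dataclass(frozen=True)
class Recording:
    file_name: str  # e.g. the name of an uploaded sound file
    accent: str     # free-form label chosen by the reader, e.g. "British"


def choose_audio(recordings: Sequence[Recording],
                 preferred_accent: Optional[str] = None) -> str:
    """Pick what to play for an article.

    Order of preference:
    1. a human recording in the listener's preferred accent,
    2. any other human recording,
    3. machine-generated speech, only when no recording exists.
    """
    if preferred_accent is not None:
        for rec in recordings:
            if rec.accent == preferred_accent:
                return rec.file_name
    if recordings:
        return recordings[0].file_name
    return TTS_FALLBACK


if __name__ == "__main__":
    readings = [Recording("Article-british.ogg", "British"),
                Recording("Article-singaporean.ogg", "Singaporean")]
    print(choose_audio(readings, "Singaporean"))
    print(choose_audio([], "British"))
```

The design choice is simply that a human reading always wins over synthesis, and accent preference only decides between human readings.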

Alphax

10:29 a.m.

Timwi wrote:

...

Secondly, what is stopping speakers of various accents from speaking various articles? Just because an article has already been spoken by a Brit, doesn't exactly make it impossible for a Singaporean to contribute *their* version too, if they think the Brit is oh so incomprehensible.

Would you like to hear *me* speak something? Just because someone can write doesn't mean they have a particularly good speaking voice.

...

Lastly, you are *definitely* not helping your credibility by asserting that an automated text-to-speech system would universally be better received than natural recordings! I can easily understand a wide range of English accents when they are spoken slowly, but I am afraid that text-to-speech systems are just painful to listen to.

Yes, they're all disjointed American :)

Jimmy Wales

4 May

6:05 p.m.

Mark Williamson wrote:

...

Timwi, if you doubt the accent problem is a real one, you clearly have not heard many different accents in your life.

Even when spoken slowly and clearly, there are some accents that are well-nigh unintelligible to those with a certain different accent, at least without being around them for a while.

At least for English, and at least in my own experience, the accent problem is not a very big deal, in the context of people speaking slowly and clearly, especially if they are careful to moderate their accent to some widely-held standard and avoid particularly local expressions and accents.

...

Some people would say that any accent other than the "standard" is incorrect. (With English being official in more than one nation, there is no single "standard", but people often being ignoramuses, we can expect that they will say "Sure, the ____ and the _____ have their 'standard accent', but ours is the only correct one", making little imaginary quote marks around "standard accent".)

Well, if people say things like that, then they are wrong. Simple. :-)

There's no one particular way of speaking English which is correct, but there are ways which are easier and harder for most people to understand. This presents an obstacle to spoken wikipedia, but not an insurmountable one.

I talk to a lot of people who speak English with different sorts of accents, and it is very seldom a barrier to communication at all. It might be a bit surprising to listen to an English article read by Anthere, who speaks with a distinct French accent, but she's still very easy to understand.

--Jimbo

Delirium

11:28 p.m.

Jimmy Wales wrote:

...

I talk to a lot of people who speak English with different sorts of accents, and it is very seldom a barrier to communication at all. It might be a bit surprising to listen to an English article read by Anthere, who speaks with a distinct French accent, but she's still very easy to understand.

This depends partly on where you're from, because if you change small bits, and then change small bits again, and so on, the overall difference gets quite large. Most people who speak English with a middle-of-the-road American or British accent are one degree away from most accents you'll encounter. A German who learns English, for example, almost always learns from a UK or US English role model---not from an Indian or Singaporean role model. So the Indian or Singaporean is two degrees of accents away from the German speaking English, while the American or Briton is only one degree away.

I know from personal experience that while most American students can understand nearly all the foreign-born professors I've had, albeit sometimes with some difficulty, many of the international students, even those who were born in English-speaking countries like India, have much more difficulty, especially with some of the European accents.

Although it smacks of a bit of accent imperialism, the most widely understood accents are probably some sort of middle-of-the-road American accent (i.e. not a strong New York, Southern, or Texas accent), and some sort of middle-of-the-road British accent (i.e. not Cockney). Of course, we could always provide multiple readings...

-Mark

David Gerard

5 May

3:53 a.m.

Delirium (delirium@hackish.org) [050505 14:26]:

...

So the Indian or Singaporean is two degrees of accents away from the German speaking English, while the American or Briton is only one degree away.

I remember going to Glasgow and attempting to buy beer from a Swedish shop assistant who spoke in a Swedish/Glaswegian accent.

(Mind you, just trying to buy lunch required my American friend to interpret the waitress's Glaswegian to English for me. Of course, they all understood Australian, because every single person in Britain is required by law to watch 'Neighbours'.)

...

Although it smacks of a bit of accent imperialism, the most widely understood accents are probably some sort of middle-of-the-road American accent (i.e. not a strong New York, Southern, or Texas accent), and some sort of middle-of-the-road British accent (i.e. not Cockney). Of course, we could always provide multiple readings...

There shouldn't be anything in the way of multiple readings, unless people's egos get in the way.

- d.

Alphax

5:15 a.m.

David Gerard wrote:

...

Delirium (delirium@hackish.org) [050505 14:26]:

...
So the Indian or Singaporean is two degrees of accents away from the German speaking English, while the American or Briton is only one degree away.

cf. [[English_language#Classification_and_related_languages]] for a good description of this. Here are the first few paragraphs:

...

English is the primary language in Australia (Australian English), the Bahamas, Barbados (Caribbean English), Bermuda, Dominica, Gibraltar, Grenada, Guyana, Jamaica (Jamaican English), New Zealand (New Zealand English), Antigua, St. Lucia, Saint Kitts and Nevis, Saint Vincent and the Grenadines, Trinidad and Tobago, the United Kingdom (British English) and the United States of America (American English).

English is also one of the primary languages of Belize (with Spanish), Canada (with French), India (with Hindi and 21 other state languages), Ireland (with Irish), Singapore (with Malay, Mandarin, Tamil and other Asian languages) and South Africa (along with Zulu, Xhosa, Afrikaans, and Northern Sotho).

In Hong Kong, English is an official language and is widely used in business activities. It is taught from kindergarten level, and is the medium of instruction for a few primary schools, many secondary schools and all universities. A substantial number of students acquire native-speaker level. It is so widely used and spoken that it is inadequate to say it is merely a second or foreign language.

My guess is that people from countries in the first paragraph (EN-N) could understand each other (EN-N) and English speakers from the other countries (EN-3); people from the countries in the second paragraph (EN-3) could understand the "native" speakers (EN-N), and EN-3 speakers.

EN-N and EN-3 speakers could probably understand EN-2 speakers (people who have learned English as a foreign language) but would need to be careful to make themselves understandable; EN-2 speakers may have some difficulty understanding each other.

...

I remember going to Glasgow and attempting to buy beer from a Swedish shop assistant who spoke in a Swedish/Glaswegian accent.

Regional accents, especially heavy ones, can be difficult even for native speakers to understand.

...

(Mind you, just trying to buy lunch required my American friend to interpret the waitress's Glaswegian to English for me.

It's well known in the non-American English-speaking world that Americans don't actually speak English, they speak American :)

...

Of course, they all understood Australian, because every single person in Britain is required by law to watch 'Neighbours'.)

I feel sorry for them :p

-- Alphax
GnuPG key: 0xF874C613 - http://tinyurl.com/8mpg9
http://en.wikipedia.org/wiki/User:Alphax
There are two kinds of people: those who say to God, 'Thy will be done,' and those to whom God says, 'All right, then, have it your way.' - C. S. Lewis

David Gerard

2:07 p.m.

Alphax (alphasigmax@gmail.com) [050505 20:16]:

...

David Gerard wrote:

...

My guess is that people from countries in the first paragraph (EN-N) could understand each other (EN-N) and English speakers from the other countries (EN-3); people from the countries in the second paragraph (EN-3) could understand the "native" speakers (EN-N), and EN-3 speakers.

...

...
(Mind you, just trying to buy lunch required my American friend to interpret the waitress's Glaswegian to English for me.

Well, no, that's an example where this native speaker couldn't understand that native speaker ;-) More recently, a friend from Dublin, Ireland visited. The Dublin accent isn't outrageous, but we had a lot of trouble understanding each other and both had to consciously speak very clearly.

...

...
Of course, they all understood Australian, because every single person in Britain is required by law to watch 'Neighbours'.)

...

I feel sorry for them :p

There's an election today, perhaps the winning party will declare it cruel and unusual culture.

- d.

Andy Rabagliati

26 Apr

4:37 a.m.

On Mon, 25 Apr 2005, Timwi wrote:

...

Andy Rabagliati wrote:

...
There was some discussion of that. Two (very real) problems :-

Editing. Voice editing sounds clumsy, and would sound like CamelCase :-)

the sound file is not an original article, but a reading of an existing textual version.

At the conference, we discussed original content contribution by phone. This is wikipedia-by-cellphone - you have to be able to contribute, or it is not a wiki !

...

...

Accents. If an Indian is trying to understand what a Geordie or someone from Barbados is saying, it might as well be in Afrikaans :-)

I'm not sure how large and how representative a sample of listeners you have already surveyed, but I highly doubt this is a real problem. The recordings are obviously supposed to be spoken slowly and clearly.

Are you a native speaker of English? Where are you from? What accents do you tend to have trouble understanding?

Yes, I am a native speaker, in a plummy version of the Queen's English. I am also well travelled.

I assure you, this problem is very real. Have you travelled to the North of England ? You may be surprised - sometimes you will have /absolutely no idea/ what they are talking about.

I lived for 10 years in the USA. Do you think an Indian would have any idea what a native of Brooklyn was talking about ? English might be the mother tongue for both of them - but Churchill famously said that America and England were separated by a common language.

I have had to ask someone from Huntsville, Alabama, to repeat themselves three times - and they were only spelling their name.

Cheers, Andy!

Tony Sidaway

5:01 a.m.

Andy Rabagliati said:

...

I assure you, this problem is very real. Have you travelled to the North of England ? You may be surprised - sometimes you will have /absolutely no idea/ what they are talking about.

The dialects of English are deeply entrenched; in other nations dialects as diverse as those of Northumberland, Durham and Wearside might each be classed as separate languages. As a child of five, I recall that a journey of three miles to school took me to the neighborhood of South Shields, where the vernacular had its own words and pronunciations quite distinct from those of the Whitburn area which tended towards the Sunderland dialect. I was literally a foreigner.

Jimmy Wales

4 May

6:15 p.m.

Andy Rabagliati wrote:

...

I assure you, this problem is very real. Have you travelled to the North of England ? You may be surprised - sometimes you will have /absolutely no idea/ what they are talking about.

Although I'm on record (2 minutes ago) as saying I don't think that the accent problem is a very big deal, I should also add that when I was consulting at the BBC (with Angela), I often found myself in meetings struggling to catch up if people started speaking very fast. Angela thought this was funny.

And of course the Brits have their own secret (ha ha) words for many ordinary things, thus leaving outsiders quite perplexed at times.

...

I lived for 10 years in the USA. Do you think an Indian would have any idea what a native of Brooklyn was talking about ?

This point is well taken, but in general both the Indian and the Brooklyn native would know how to slow down their speech and switch as well as they could to a more universally recognized accent in order to make themselves understood to each other.

...

I have had to ask someone from Huntsville, Alabama, to repeat themselves three times - and they were only spelling their name.

Once Angela said something to me like we should meet at "half Eleven". When I didn't know what this meant, I think she thought I must be from another planet, and of course in a way I am, since I'm from Huntsville, Alabama. (Where else could I get such a ridiculous nickname as Jimbo?)

:-)

But as a native of Huntsville, Alabama, albeit one who is educated and watched too much television as a child ;-), I think you'd find it very very easy to understand me, and I'd find it very very easy to understand you. (Although I might get frustrated if you didn't show up at 5:30 when you clearly said you'd be there at half-Eleven ;-))

--Jimbo

Timwi

26 Apr

10:08 a.m.

Timwi wrote:

...

I'm not sure how large and how representative a sample of listeners you have already surveyed, but I highly doubt this is a real problem. The recordings are obviously supposed to be spoken slowly and clearly.

I guess I shouldn't have said this, because now everyone thinks that arguing that accents can be mutually unintelligible, is an argument against natural recordings and for text-to-speech synthesis.

Andy Rabagliati

27 Apr

5:26 a.m.

New subject: Text to speech (Was: Conference report from South Africa)

On Tue, 26 Apr 2005, Timwi wrote:

...

I guess I shouldn't have said this, because now everyone thinks that arguing that accents can be mutually unintelligible, is an argument against natural recordings and for text-to-speech synthesis.

I am glad you brought it up, as I believe it needs to be addressed.

When Atlanta Airport (Georgia) opened, with automated trains taking you out to the terminals, there was a requirement for public announcements telling you which terminal, stand clear, departing now, etc.

They started out with a pleasant local (Georgian) accent.

Complaints - sounded too provincial, International passengers, etc.

They changed it to a female announcer.

Complaints - people didn't pay enough attention, what was wrong with the previous one, etc.

They have ended up with an assertive, metallic, computer voice.

No complaints.

Things pro text to speech :-

* Immediately works with all Wikipedia content.
* No problems with editing.
* Uniformity, even if it is uniformly poor ..

I will leave the pro natural speech argument to others - after preparing my flame-proof underwear :-) I would also welcome input from people for whom English is not a first language.

Cheers, Andy!

David 'DJ' Hedley

10:22 a.m.

New subject: Text to speech (Was: Conference report from SouthAfrica)

I agree bar the major con of name pronunciation. Even taking your own surname - Rabagliati; how many computers could pronounce that correctly? Also, implementing automatic audio conversion would presumably slow down Wikipedia considerably.

Ray Saintonge

12:22 p.m.

New subject: Text to speech (Was: Conference report from SouthAfrica)

David 'DJ' Hedley wrote:

...

I agree bar the major con of name pronunciation. Even taking your own surname - Rabagliati; how many computers could pronounce that correctly? Also, implementing automatic audio conversion would presumably slow down Wikipedia considerably.

When it comes to that surname the Olympic gold medalist with that name made us aware that it is pronounced as though the "ag" were not in the name.

Andy Rabagliati

4:57 p.m.

New subject: Text to speech (Was: Conference report from SouthAfrica)

On Wed, 27 Apr 2005, Ray Saintonge wrote:

...

David 'DJ' Hedley wrote:

...
I agree bar the major con of name pronunciation. Even taking your own surname - Rabagliati; how many computers could pronounce that correctly? Also, implementing automatic audio conversion would presumably slow down Wikipedia considerably.

When it comes to that surname the Olympic gold medalist with that name made us aware that it is pronounced as though the "ag" were not in the name.

Whew - people that know more than me ..

After moving to Scotland (hastily) over 100 years ago we have put the a back in, but the g is still gone.

Which indeed proves your point regarding voice recordings.

Wonderful extra info, but voice deployment needs orders of magnitude more work than text to speech, which surely will improve.

So pragmatism would choose text to speech as well.

Cheers, Andy! [few people in meatspace know my last name]

Timwi

28 Apr

6:25 p.m.

New subject: Text to speech (Was: Conference report from SouthAfrica)

Andy Rabagliati wrote:

...

On Wed, 27 Apr 2005, Ray Saintonge wrote:

...
David 'DJ' Hedley wrote:

When it comes to that surname the Olympic gold medalist with that name made us aware that it is pronounced as though the "ag" were not in the name.

After moving to Scotland (hastily) over 100 years ago we have put the a back in, but the g is still gone.

May I assume that your surname is Italian? In that case, I feel I must correct you. The g is not "gone" (by which I assume you mean "silent", i.e. not pronounced); rather, the letter combination "gl" stands for a distinct sound (similar to how "sh" in English stands for a sound that is different from both "s" and "h"). Unfortunately, the "gl" sound does not occur in English (or indeed many other European languages), and to most people it just sounds like an l.

Timwi

Magnus Manske

4:36 a.m.

New subject: Text to speech (Was: Conference report from South Africa)

At http://magnusmanske.de/wikipedia/speak.html you can hit a button to have "This is a wiki test" read to you as a WAV. It is using the service at http://www.festvox.org/voicedemos.html based on the open source Festival speech generation engine.

The HTML is dead simple. The text of a wikipedia page could be generated on-the-fly within the parser, or from the <body> part of the output.
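As a rough illustration of the second option (taking the text from the <body> part of the rendered output), here is a minimal Python sketch. It is not how the demo page above works; the class name `BodyTextExtractor` and the function `text_for_speech` are hypothetical, and the sketch assumes that skipping script and style content is enough cleanup before the text is handed to a speech engine.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect the text found inside <body>, skipping scripts and styles."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False   # True while we are between <body> and </body>
        self.skip_depth = 0    # > 0 while inside a <script> or <style> element
        self.chunks = []       # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is inside <body> and not in a skipped tag.
        if self.in_body and self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def text_for_speech(html):
    """Return the page's body text as one whitespace-normalised string."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

The resulting string is what would be sent to whatever speech service is used; a real page would likely need further trimming (navigation, edit links, footers) before it reads well aloud.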

Or we could have a <speech>special extension</speech> for important parts. Then again, probably not.

We should, however, ask the people from CMU first before we wikipedia-flood their demo system :-)

Magnus

Jimmy Wales

4 May

6:17 p.m.

New subject: Text to speech (Was: Conference report from South Africa)

Andy Rabagliati wrote:

...

When Atlanta Airport (Georgia) opened, with automated trains taking you out to the terminals, there was a requirement for public announcements telling you which terminal, stand clear, departing now, etc.

They started out with a pleasant local (Georgian) accent.

Complaints - sounded too provincial, International passengers, etc.

They changed it to a female announcer.

Complaints - people didn't pay enough attention, what was wrong with the previous one, etc.

They have ended up with an assertive, metallic, computer voice.

No complaints.

That's a great story, but it occurs to me that maybe the reason there were no complaints is that people just assume that computers speak badly, and nothing can be done about it, and so they just accept it, even if it is annoying.

--Jimbo

Jimmy Wales

4:54 p.m.

Andy Rabagliati wrote:

...

...
This is assuming that the larger part of the cost of textbooks is copyright licenses. I would imagine that it is instead the production costs of actual books, and obviously free content won't help that.

Having had (almost) direct experience of this in Africa, much of the problem is bureaucracy and supply.

Surely free licensing will help with both bureaucracy and supply. Anyone who takes an interest and sees an opportunity can redistribute our work without having to get permission from anyone.

--Jimbo
