Dear Wikimedians,
Some of you might still be recovering from Wikimania fatigue. For those of you who have already recovered, I wanted to pick your brains about something that came up multiple times during discussions, but to which no one really seems to have a clear answer.
Which script (writing system) should a speaker of an oral language use when creating an entry on (gateway [1]) projects like Wiktionary or Wikibooks, or even when uploading a list of words to Commons using a tool like Lingua Libre? Will it be the script used for the official language of the region where the oral language is from?[2] This is a bit controversial, as native speakers of many indigenous languages would see this as a form of colonization. Will it be the w:International Phonetic Alphabet (IPA)? This is probably the least controversial option, but an average user might not be able to read IPA, as it was created by linguists for linguistic and scholarly studies rather than for everyday use.
Wikimedians who are native speakers of languages with little written or recorded documentation, and individuals who work on such languages, are especially encouraged to share their input based on past experience.
1. Gateway project: This is a made-up term for the Wikimedia projects that are more welcoming to newbies and do not require stringent citation, which almost all oral languages would lack. It was fascinating to see Amir demonstrating that it only takes about 30 seconds to add an entry to Wiktionary ( https://commons.wikimedia.org/wiki/File:Amir_Aharoni_demonstrating_how_to_ad... )
Subhashish
Hi Subhashish,
This is a really hard question, and I don't think there's a good, specific general answer. A non-specific general answer is that you have to find something that works reasonably well, takes advantage of any pre-existing resources, is as easy and practical to use as possible, and that people are willing to use. That almost sounds like a description of the problem, more than an answer.
IPA might not actually be a good answer, depending on the variety of dialects and the phonology of the language in question. For example, some speakers of Hawaiian pronounce w more or less like English w, others pronounce it like English v. Do you change the writing based on who's talking? Another example is English t—for many speakers, the t in "top" is different than the t in "stop" and both are different from the t in "bottle"... but in the mind of an English speaker, they are all t. IPA can be *too* detailed.
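To make the English t example concrete, here is a minimal Python sketch. The narrow transcriptions are an assumption based on a common American English pronunciation, and the names `NARROW_T`, `TO_PHONEME`, and `broad` are purely illustrative. It only shows that three phonetically distinct sounds collapse to a single unit once you write what speakers perceive instead of what they pronounce.

```python
# Narrow (phonetic) IPA for the "t" sound in three English words, as
# pronounced by many American English speakers (assumed values).
NARROW_T = {
    "top": "tʰ",     # aspirated
    "stop": "t",     # unaspirated
    "bottle": "ɾ",   # flap
}

# Hypothetical broad mapping: every variant collapses to the phoneme /t/.
TO_PHONEME = {"tʰ": "t", "t": "t", "ɾ": "t"}


def broad(word: str) -> str:
    """Return the phoneme a speaker perceives for the "t" in `word`."""
    return TO_PHONEME[NARROW_T[word]]


distinct_narrow = set(NARROW_T.values())
distinct_broad = {broad(word) for word in NARROW_T}

# Three symbols in a detailed transcription, one in a practical spelling.
print(len(distinct_narrow), len(distinct_broad))  # 3 1
```

A practical orthography would usually aim for the second, smaller set, which is one reason a fully detailed IPA transcription can be more than a writing system needs.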
You can also lose relationships between words if you spell only according to sound (again, depending on the language). English "photograph" and "photography" might be spelled more phonetically as "fotəgræf" and "fətagrəfi", but that decreases the obviousness of the relation between them. Is it worth it? It's a judgement call.
As for using the script of a/the dominant language of the region where the language is found, it depends on a lot of linguistic and cultural factors. Is the relationship between the language groups neutral or positive? Are people already generally literate in the writing system of the dominant language? Are the languages fairly closely related? Yes to any of those makes the writing system of the dominant language a better choice—though it still may not be a *good* choice if the answer to any of them is no.
Are there closely related languages (linguistically, not necessarily geographically) that already have a well-designed writing system that could be borrowed and adapted? For example, a lot of the Turkic languages have adopted fairly similar versions of the Latin alphabet. That way some problems only have to be solved once, and it can also be easier to read a closely related language.
Another technological issue—are input devices (keyboards—either physical or virtual, like on a phone) that cover all the needed symbols readily available? (That's another reason not to use IPA—I think it's actually not too hard to learn the subset relevant to a language you speak, but it is often *really* hard to type.)
Ideally, a writing system should be devised over some time, with heavy input from speakers of the language and guidance from linguists who have experience with related languages, the development (historical or practical) of writing systems, or both if available.
A gateway wiki might be a good place to experiment with a new writing system, but it could also end up generating too much inertia if good changes are proposed that require re-writing almost everything. I really don't know.
It's an interesting and difficult question you have. I hope it generates some fruitful discussion.
—Trey
PS: Also read up on spelling reform in various languages for related ideas and possible problems.
Trey Jones Sr. Software Engineer, Search Platform Wikimedia Foundation UTC-4 / EDT
Languages mailing list Languages@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/languages
Subhashish Panigrahi, 22/08/19 22:25:
Which script (writing system) an oral language speaker would use for creating an entry on (gateway [1]) projects like Wiktionary or Wikibooks
I understand the discussion on oral sources and on languages with little literature and a possibly under-defined orthography, but since when did this conversation shift to a focus on languages which do not even have an established writing system?
Out of 7000 or so languages in ISO 639-3, how many lack a recognised writing system?
Federico
From Ethnologue
https://www.ethnologue.com/enterprise-faq/how-many-languages-world-are-unwritten-0 :
The exact number of unwritten languages is hard to determine. Ethnologue (21st edition) has data to indicate that of the currently listed 7,111 living languages, 3,995 have a developed writing system. We don't always know, however, if the existing writing systems are widely used. That is, while an alphabet may exist there may not be very many people who are literate and actually using the alphabet. The remaining 3,116 are likely unwritten.
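As a quick check of the arithmetic in the figures quoted above, here is a minimal Python sketch. The variable names are illustrative, and the percentage is simply derived from the two quoted numbers; it is not a statistic published by Ethnologue.

```python
# Figures quoted above from Ethnologue (21st edition).
living_languages = 7111
with_writing_system = 3995

# Languages without a developed writing system, by subtraction.
likely_unwritten = living_languages - with_writing_system
share_unwritten = likely_unwritten / living_languages

print(likely_unwritten)          # 3116
print(f"{share_unwritten:.1%}")  # 43.8%
```

The subtraction matches the 3,116 in the quote. As the quote itself notes, having a developed writing system does not mean it is widely used, so the share of languages that are unwritten in practice could be higher.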
On Mon, Aug 26, 2019 at 3:48 AM Federico Leva (Nemo) nemowiki@gmail.com wrote:
Out of 7000 or so languages in ISO 639-3, how many lack a recognised writing system?
Trey Jones, 26/08/19 16:28:
From Ethnologue https://www.ethnologue.com/enterprise-faq/how-many-languages-world-are-unwritten-0:
The exact number of unwritten languages is hard to determine. Ethnologue (21st edition) has data to indicate that of the currently listed 7,111 living languages, 3,995 have a developed writing system. We don't always know, however, if the existing writing systems are widely used.
Indeed, so my question is really whether whoever proposes this focus on languages without an established writing system 1) agrees with the figures above, 2) (if yes) thinks we should prioritise those 3k or so languages over the other 4k or so languages, of which we "only" have about 400 in MediaWiki.
Federico
There's no magic solution. Just hard work, on the ground.
For some languages, industrious people from the ethnic group that speaks that language made their own writing systems, which were either partly based on existing foreign systems or created from scratch. Examples from the last couple of centuries include Cherokee, N'Ko, Santali, Vai, and Ho, and there are others. From what I've read about them, they were created by self-taught people who managed to figure out the phonetics of their own languages with little or no formal training in European-style academic linguistics. The creator of the N'Ko writing system Solomana Kante was subsequently praised by European academics as someone who managed to describe the phonetics of the different regional varieties of his language with a well-matching unified writing system, and there are similar evaluations of the other people who created the alphabets I mentioned above.
For many other languages, the writing systems were created by foreign religious missionaries or political functionaries, who also happened to have some understanding of language. It worked better in some cases, and less well in others. When I say "better", I mean that the people who actually speak the language managed to learn it and establish the use of that writing system for elementary literacy education, recording ancestral stories and local knowledge, publishing newspapers and books, personal writing (emails, shopping lists, greeting cards), government and business, and so on. When I say "less well", I mean that little was produced in that writing system other than a translation of the Bible or the Quran.
What should be done? A brand new writing system, or an orthography that is based on an existing one? There's no one answer. Using a Latin-based alphabet has obvious advantages: it's available in computer keyboards and printing houses everywhere, and a lot of people are familiar with it. But for some languages other alphabets worked better for establishing schools, so it doesn't have to be the end-all, only option. The only real answer is "whatever works". It's a very generic and circular answer, but that's just how it is. Different things worked for different languages in history.
I am not opposed in principle to hosting content on Wikimedia sites in languages that have a completely new writing system, whether based on an existing writing system (such as Latin, Cyrillic, Arabic, or Devanagari) or a brand new one. There are some practical considerations with this, however:
1. If it's a brand new system that is not in Unicode yet, it will be technically difficult.
2. If a Wikimedia project is the first place where a new orthography is used, this may go against the Language committee's existing principle of "not creating new linguistic entities". I am a member of the committee, and I support this principle. However, I'm willing to be flexible whenever the people involved somehow prove that they are qualified and sincere. (I am writing this only on behalf of myself and not the whole committee. Other Langcom members may have a different opinion.)
3. On which project would such content go? Definitely not Wikipedia in any language. Wikisource may work, although Wikisource has until now been a place for hosting already-published works. Perhaps for new languages Wikisource could become more flexible, or a brand new wiki project could be created.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore