Wiktionary-l September 2005

wiktionary-l@lists.wikimedia.org

10 participants
18 discussions

Re: [Wiktionary-l] English orthographies
by Gerard Meijssen 20 Sep '05

Jack & Naree wrote:

> From my point of view as a native English speaker who lives in England (not
> in the American sense of "UK"!), I think it's important that when I type in
> words like "colour", I get "colour" and not "color". I don't see why I need
> to know the spelling of the word in what to me is a foreign language when
> I'm looking up a word in a dictionary of my language.
> ---
> To GerardM
> Basically there are two orthographies for English.
> Some might argue the toss about that. For myself, as a "Scots" speaker, I
> know there are some who make a big deal about it being a separate language,
> but I myself don't know how to spell it properly, and I just think of it as
> a regional dialect of English like Scouse, Yorkshire, Texan, Kiwi and
> whatever else, because I would read normal English and pronounce it the
> same way as Scots. The odd word like "leid" is to me no different from a
> Yorkshireman saying "summat" for "something" or someone from the southeast
> USA saying "y'all"; they're just dialect words.
> I can show you texts written in a Yorkshire orthography, but the practical
> fact is that the overwhelming majority of the text in the modern world is
> either spelt the English way, or the American way.
> The debate is "huge" in terms of its implications, because up until now
> no-one appears to have challenged the idea that American-English has the
> right to be considered the standard form of English. It's patently obvious
> it's a dialect, with its own orthography, and it's simply wrong for the
> headword in English to be written in a dialect of English in a dialectal
> orthography and presented as the standard form, when it's not.

Hoi,
I would not consider either variation of English to be more or less important/relevant. What I consider is the practical side: how does this affect including the content in Ultimate Wiktionary? Here we need to identify a word as either EE or AE or ?E, and the question is how to do this.

It is up to the Wiktionary community how they want to handle this. Either the descriptions in definitions and etymologies are spelled in one of the orthographies in use, or this is considered not too important and they can be in either.

Thanks,
GerardM

Re: [Wiktionary-l] Re: [Wikipedia-l] English orthographies
by Jim Breen 20 Sep '05

[Gerard Meijssen ([Wiktionary-l] Re: [Wikipedia-l] English orthographies) writes:]

>> How many different orthographies / dialects are there for English?

It all depends on definition. I say there is one "orthography" for English (the A-Z "Latin" alphabet). Perhaps you are confusing "orthography" and spelling.

As for dialects, well, many people who work in dialects do not regard "British English" and "American English" as distinct dialects because they are very similar (when compared with real dialects).

>> I would not dare to presume that a word is truly shared between the less
>> well known versions of English. There are also the "true" dialects like
>> Geordie that have to be considered. Creating check boxes assumes in a
>> way that the editors /know/ these different versions of English well
>> enough. Technically it can be done, but the spelling of the text in a
>> meaning, an etymology needs to be adapted anyway. You have to realise
>> that certain meanings do not travel well. It is therefore not only the
>> orthography but also the Meanings of a word that needs to be considered.

Yes, I suspect you have conflated "orthography" and "spelling".

Jim

--
Jim Breen http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology, Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学

Re: [Wiktionary-l] English orthographies
by Jim Breen 20 Sep '05

[Gerard Meijssen ([Wiktionary-l] English orthographies) writes:]

>> 1) English, American English and other orthographies are treated as
>> separate entities.

I think this will be a disaster. Can you explain why "jewellery" and "jewelry" cannot be alternatives within the one entry?

>> This means that all words need to exist for each
>> orthography/dialect. On the plus side it means that descriptions like
>> etymology and meaning will be in this one orthography as well. This is
>> also the easiest method to provide information for a spell checker.

In what way is this easier "to provide information for a spell checker" than having spelling variants within an entry?

>> 2) We treat these variants as belonging to a specific "spelling
>> authority".

I wonder what a "spelling authority" would be for English.

>> ..... It does however provide us with the possibility to be more
>> precise in what makes English different from American, Australian etc.

It's news to me that "English [is] different from American, Australian,.." The versions of English used in the UK, USA, Australia, Canada, South Africa, etc. do vary, although not to a significant extent. The spellings also differ for a very small percentage of words, and meanings differ slightly too, although this happens *within* the UK, USA, Australia, etc. too.

Only a small part of the regional variation in English can be reflected in a dictionary. Much of it is grammar, and choice of words.

Jim

--
Jim Breen http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology, Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学
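
The single-entry alternative raised in this message, with "jewellery" and "jewelry" as variants inside one entry, can be sketched roughly as follows. This is only an illustration: the `Entry` class, the variety codes "en-GB" and "en-US", and the `wordlist_for` helper are assumptions, not part of any Ultimate Wiktionary design. It suggests that a per-variety word list for a spell checker could still be derived when variants share an entry.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One dictionary entry whose spellings are labelled per variety (hypothetical)."""
    gloss: str
    # Maps each spelling to the set of varieties that use it.
    spellings: dict[str, set[str]] = field(default_factory=dict)


def wordlist_for(entries: list[Entry], variety: str) -> set[str]:
    """Collect every spelling labelled for the given variety, e.g. for a spell checker."""
    return {
        spelling
        for entry in entries
        for spelling, varieties in entry.spellings.items()
        if variety in varieties
    }


# Illustrative data only; the variety labels are assumptions.
entries = [
    Entry("ornaments worn on the body",
          {"jewellery": {"en-GB"}, "jewelry": {"en-US"}}),
    Entry("a domesticated canine",
          {"dog": {"en-GB", "en-US"}}),
]

british = wordlist_for(entries, "en-GB")
american = wordlist_for(entries, "en-US")
```

Here the etymology and gloss are stored once per entry, while the word lists are computed per variety on demand.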

English orthographies
by Gerard Meijssen 19 Sep '05

Hoi,
To explain to the people on the Wiktionary mailing list where this comes from: there is a huge debate on the Wikipedia-l mailing list about having a separate English and American English Wikipedia. In the plans for Ultimate Wiktionary there are three ways in which words can be distinguished as being of a particular orthography. I will describe these here and hope to use the energy of this discussion for this question, which needs a resolution at some stage.

1) English, American English and other orthographies are treated as separate entities. This means that all words need to exist for each orthography/dialect. On the plus side it means that descriptions like etymology and meaning will be in this one orthography as well. This is also the easiest method to provide information for a spell checker.

2) We treat these variants as belonging to a specific "spelling authority". This means that a word needs to be in the database only once. It means that the meanings, etymologies etc. can be in any of the orthographies. It means that you cannot record the relations between the words of these different orthographies/dialects. When words are properly identified, it means that we can use the information for a spell checker. It does not clearly help you understand what Meanings exist in a particular variation of English. This is in my opinion the weakest option, as it does not allow you to identify which meaning is true for a particular version of English.

3) We can label Meanings as belonging to one of these particular orthographies. When words are properly identified, it means that we can use the information for a spell checker.

In my opinion option number 1 is technically the best solution. Going for this option is probably less problematic than breaking en.wikipedia.org into pieces. Going for this option seems like a lot of duplication. It does however provide us with the possibility to be more precise in what makes English different from American, Australian etc.

Please let me know what you think and particularly why.

Thanks,
GerardM
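
As a rough illustration of option 3, labelling Meanings per variety, consider the sketch below. The `Meaning` class, the variety codes and the `meanings_in` helper are hypothetical and do not come from the Ultimate Wiktionary data design; the petrol/gasoline pair is borrowed from elsewhere in this thread and its labels are simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    """A single sense of a word, labelled with the varieties where it holds (hypothetical)."""
    word: str
    definition: str
    varieties: frozenset[str]


def meanings_in(meanings: list[Meaning], word: str, variety: str) -> list[str]:
    """Return the definitions of `word` that are labelled for `variety`."""
    return [m.definition for m in meanings
            if m.word == word and variety in m.varieties]


# Only two varieties are modelled here; a real list would be longer.
ALL = frozenset({"en-GB", "en-US"})

meanings = [
    Meaning("dog", "a domesticated canine", ALL),
    Meaning("petrol", "fuel for motor vehicles", frozenset({"en-GB"})),
    Meaning("gasoline", "fuel for motor vehicles", frozenset({"en-US"})),
]
```

Each word is stored once and the variety labels live on the Meaning, so a query such as `meanings_in(meanings, "petrol", "en-US")` comes back empty while the same query for "en-GB" returns the definition.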

Re: [Wikipedia-l] English orthographies
by Gerard Meijssen 19 Sep '05

Paweł Dembowski wrote:

> Can't we just have a list of checkboxes for each dialect of a given
> language for each meaning of a word? For words like "to be" or "dog"
> we could just check boxes for every dialect, while we wouldn't for
> words like petrol or gasoline...

How many different orthographies / dialects are there for English? I would not dare to presume that a word is truly shared between the less well known versions of English. There are also the "true" dialects like Geordie that have to be considered. Creating check boxes assumes in a way that the editors /know/ these different versions of English well enough. Technically it can be done, but the spelling of the text in a meaning, an etymology needs to be adapted anyway. You have to realise that certain meanings do not travel well. It is therefore not only the orthography but also the Meanings of a word that needs to be considered. Consequently, a simple check box is problematic in itself.

Thanks,
GerardM

[Fwd: [Wikitech-l] A small Ultimate Wiktionary demo]
by Gerard Meijssen 15 Sep '05

Hoi,
Erik sent this e-mail to the developers. I think, however, that you will be as interested in it as the developers are. The relevance is that the Commons "tags", as Erik calls them, use meanings defined in Ultimate Wiktionary. It is a preview that is really rough, but it shows the idea of using meanings and translations really well.

Thanks,
GerardM

-------- Original Message --------
Subject: [Wikitech-l] A small Ultimate Wiktionary demo
Date: Sun, 04 Sep 2005 21:57:47 +0200
From: Erik Moeller <erik_moeller(a)gmx.de>
Reply-To: Wikimedia developers <wikitech-l(a)wikimedia.org>
To: Wikimedia developers <wikitech-l(a)wikimedia.org>

A while ago, Gerard posted this on Meta:

http://meta.wikimedia.org/wiki/Using_Ultimate_Wiktionary_for_Commons

It was a short explanation of how UW could be used to internationalize categories on the Wikimedia Commons. I've now hacked together a small mock-up that demonstrates (hopefully) more clearly how this could work in practice:

http://epov.org/uwd/index.php?title=Tag:Dog&action=edit

(Further demos will be posted on http://epov.org/uwd/ in the coming weeks and months.)

It should work in Firefox and IE. The only active components are the radio buttons you can click. Essentially, what this shows is:

1) A new tag for images of dogs is created. (In this demo, I call categories "tags", because I hope this will be what they are eventually called.)
2) The user can choose from the languages they speak to clarify which language this tag name is written in.
3) Based on the tag name and language, a lookup on UW is performed, which fetches all the associated meanings for this word.
4) The user selects one of these meanings.
5) Automagically, another lookup is performed to determine the available translations, if any.

After saving the tag, it is then instantly available under these names in the other languages. In the demo, the first two meanings have translations available, while the other two do not.

Why is this so powerful? Because, if UW itself is successful and contains many words, it almost instantly makes the entire media repository on Commons available to speakers of all languages. (Now, hopefully, you can see why we've been excited about getting millions of translations for free from the Logos project.) No need to create many different tags - just select the right meaning.

Furthermore, it builds bridges from other projects to UW. The language work we are constantly doing will no longer be redundant, but focused on one place. A 14-year-old Italian kid can then use the tag "cane" to look for photos of dogs, while a Maori girl from New Zealand can use "kurii". Moreover, the same category hierarchy can be used to browse in different languages (based on user preferences, a fallback hierarchy would be queried to determine the language that should be used should no translation be available).

We could also automatically make use of synonyms, plurals and inflections (though this requires further changes to the category code beyond internationalization).

Given that we are mapping one of multiple meanings to a single tag, there will be tag collisions -- those will have to be dealt with through disambiguation. But this is not important: try to see the tag name merely as a key to a meaning. What this key is called is secondary.

The key principle of selecting a meaning and then performing automatic translations can be used in many different contexts. For example, in Wikidata, one could use the same principle to internationalize field names such as "Country", "Flag" and "Population".

This application also shows that UW must contain everything from words to names to phrases. There is no limit to the scope of it. This makes it a potentially massively useful tool for both human and machine translation.

The category internationalization functionality will not be part of the first release of Ultimate Wiktionary, but we believe we can get funding to work on this later. I believe that UW, in combination with better tagging features in general, could make our tagging system the most advanced one available. Flickr, for example, has no localization, is unlikely to ever get semi-automatic localization, and apparently supports no synonyms either.

See the demo footnotes for further explanations. Feedback is welcome. (I'll be away until Wednesday.)

Best,
Erik

_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l
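
The lookup flow Erik describes (tag name plus language, then meanings, then translations, with a fallback order of languages) can be sketched as below. Everything here is a hypothetical stand-in: the meaning identifiers, the two dictionaries and both function names are assumptions for illustration, not the demo's or UW's actual code. Only the words "dog", "cane" and "kurii" come from the message.

```python
# Hypothetical in-memory stand-in for Ultimate Wiktionary lookups.
# Each meaning id maps to its translations, keyed by language code.
TRANSLATIONS = {
    "dog-animal": {"en": "dog", "it": "cane", "mi": "kurii"},
    "dog-follow": {"en": "dog"},  # a meaning with no translations yet
}

# Index from (language, word) to the meaning ids that word can have.
MEANINGS_BY_WORD = {
    ("en", "dog"): ["dog-animal", "dog-follow"],
}


def lookup_meanings(language: str, word: str) -> list[str]:
    """Step 3: fetch all meanings associated with a tag name in a language."""
    return MEANINGS_BY_WORD.get((language, word), [])


def tag_label(meaning_id: str, preferred: list[str]) -> str:
    """Step 5: return the tag name in the first preferred language that has one."""
    translations = TRANSLATIONS[meaning_id]
    for language in preferred:
        if language in translations:
            return translations[language]
    raise KeyError(f"no translation of {meaning_id!r} in {preferred}")


# The user picks one meaning (step 4); readers then see it in their own language.
chosen = lookup_meanings("en", "dog")[0]
italian_label = tag_label(chosen, ["it", "en"])
fallback_label = tag_label("dog-follow", ["it", "en"])
```

The `preferred` list plays the role of the fallback hierarchy: a reader who prefers Italian sees "cane" for the first meaning, and falls back to the English name for a meaning that has no Italian translation.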

RE: The data design for the GEMET phase (part of the Ultimate Wiktionary)
by Stefan Jensen 02 Sep '05

Thanks a lot for your mail, Gerard.

As I am not fully on top of your technical discussions, I would like to ask when it will be possible to try out this implementation. My main concern is to expose it widely as part of your open content initiative and to learn how it fares in use, including suggestions to alter terms/concepts or to add new ones, in whichever of the many languages.

Is any action required from our side?

Greetings, and thanks a lot for all the enthusiastic work you are doing here,
Stefan

-----Original Message-----
From: Gerard Meijssen [mailto:gerard.meijssen@gmail.com]
Sent: 01 September 2005 23:50
To: wiktionary-l(a)wikipedia.org; Wikimedia developers
Cc: Anthere; Stefan Jensen
Subject: The data design for the GEMET phase (part of the Ultimate Wiktionary)

Hoi,
We have always planned the implementation of the GEMET thesaurus as one of the steps in the implementation of the Ultimate Wiktionary. In some discussions we had, we considered that it might be a better idea to implement GEMET in a subset of the Ultimate Wiktionary instead of in its own data design. Once it has been implemented, we will be able to learn many of the lessons we need to learn as reality shatters many false dreams.

With this implementation, we implement several of the key functionalities of the UW. It will demonstrate the "eat your own dogfood" idea; the words for a Languages and for a WordRelation need to exist in order to implement their functionality and their localised value. It also demonstrates how a thesaurus can be implemented in UW; it will show how we can have relations between different meanings.

The one thing I like best, however, is that it demonstrates the core functionality of Ultimate Wiktionary much better. The data design for this is much easier to understand. The absolute core functionality, however, does not include the Collection, the CollectionLanguage and the CollectionMeaning tables. These are needed for the implementation of GEMET.

As it is a subset, it does not have many of the features that will exist in the full-blown version of the UW. These can however be added one at a time. This allows for more frequent updates, and this will probably lead to much more excitement, as we will have new features to show more often.

More info can be found here:
http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design

Thanks,
GerardM

The data design for the GEMET phase (part of the Ultimate Wiktionary)
by Gerard Meijssen 01 Sep '05

Hoi,
We have always planned the implementation of the GEMET thesaurus as one of the steps in the implementation of the Ultimate Wiktionary. In some discussions we had, we considered that it might be a better idea to implement GEMET in a subset of the Ultimate Wiktionary instead of in its own data design. Once it has been implemented, we will be able to learn many of the lessons we need to learn as reality shatters many false dreams.

With this implementation, we implement several of the key functionalities of the UW. It will demonstrate the "eat your own dogfood" idea; the words for a Languages and for a WordRelation need to exist in order to implement their functionality and their localised value. It also demonstrates how a thesaurus can be implemented in UW; it will show how we can have relations between different meanings.

The one thing I like best, however, is that it demonstrates the core functionality of Ultimate Wiktionary much better. The data design for this is much easier to understand. The absolute core functionality, however, does not include the Collection, the CollectionLanguage and the CollectionMeaning tables. These are needed for the implementation of GEMET.

As it is a subset, it does not have many of the features that will exist in the full-blown version of the UW. These can however be added one at a time. This allows for more frequent updates, and this will probably lead to much more excitement, as we will have new features to show more often.

More info can be found here:
http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design

Thanks,
GerardM
