Jack & Naree wrote:
>From my point of view as a native English speaker who lives in England (not
>in the American sense of "UK"!)
>I think it's important that when I type in words like:
> "colour"
> That I get "colour" and not "color". I don't see why I need to know the
>spelling of the word in what to me is a foreign language when I'm looking up
>a word in a dictionary of my language.
> ---
> To GerardM
> Basically there are two orthographies for English.
> Some might argue the toss about that, I mean for myself as a "Scots"
>speaker, I know there are some who make a big deal about it being a separate
>language, but I myself don't know how to spell it properly, and I just think
>of it as a regional dialect of English like Scouse, Yorkshire, Texan, Kiwi
>and whatever else - because I would read normal English and pronounce it the
>same way as Scots - the odd word like "leid" to me is no different than a
>Yorkshireman saying "summat" for "something" or someone from the southeast
>USA saying "y'all" - they're just dialect words.
>I can show you texts written in a Yorkshire orthography, but the practical
>fact is that the overwhelming majority of the text in the modern world is
>either spelt the English way, or the American way.
> The debate is "huge" in terms of its implications, because up until now
>no-one appears to have challenged the idea that American-English has the
>right to be considered the standard form of English. It's patently obvious
>it's a dialect, with its own orthography and it's simply wrong for the
>headword in English to be written in a dialect of English in a dialectal
>orthography and presented as the standard form, when it's not.
>
>
Hoi,
I would not consider either variation of English to be more or less
important/relevant. What I consider is practical: how does it impact
including this content in Ultimate Wiktionary? Here we have a need to
identify a word as either EE or AE or ?E and the question is how to do this.
It is up to the Wiktionary community how they want to have this. They can
either have the descriptions in definitions and etymologies spelled
in one of the used orthographies, or it can be considered not to be too
important and it can be either.
Thanks,
GerardM
[Gerard Meijssen ([Wiktionary-l] Re: [Wikipedia-l] English orthographies) writes:]
>> How many different orthographies / dialects are there for English?
It all depends on definition. I say there is one "orthography" for
English (the A-Z "Latin" alphabet). Perhaps you are confusing
"orthography" and spelling.
As for dialects, well many people who work in dialects do not regard
"British English" and "American English" as distinct dialects because
they are very similar (when compared with real dialects.)
>> I
>> would not dare to presume that a word is truly shared between the less
>> well known versions of English. There are also the "true" dialects like
>> Geordie that have to be considered. Creating check boxes assumes in a
>> way that the editors /know/ these different versions of English well
>> enough. Technically it can be done, but the spelling of the text in a
>> meaning, an etymology needs to be adapted anyway. You have to realise
>> that certain meanings do not travel well. It is therefore not only the
>> orthography but also the Meanings of a word that needs to be considered.
Yes, I suspect you have conflated "orthography" and "spelling".
Jim
--
Jim Breen http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology, Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学
[Gerard Meijssen ([Wiktionary-l] English orthographies) writes:]
>>
>> 1) English, American English and other orthographies are treated as
>> separate entities.
I think this will be a disaster.
Can you explain why "jewellery" and "jewelry" cannot be alternatives
within the one entry?
>> This means that all words need to exist for each
>> orthography/dialect. On the plus side it means that descriptions like
>> etymology and meaning will be in this one orthography as well. This is
>> also the most easy method to provide information for a spell checker.
In what way is this easier "to provide information for a spell checker"
than having spelling variants with an entry?
>> 2) We treat these variants as belonging to a specific "spelling
>> authority".
I wonder what a "spelling authority" would be for English.
>> ..... It does however provide us with the possibility to be more
>> precise in what makes English different from American, Australian etc.
It's news to me that "English [is] different from American, Australian,.."
The versions of English used in the UK, USA, Australia, Canada, South
Africa, etc. do vary, although not to a significant extent. The
spellings also differ for a very small percentage of words, and meanings
differ slightly too, although this happens *within* the UK, USA,
Australia, etc. too.
Only a small part of the regional variation in English can be reflected
in a dictionary. Much of it is grammar, and choice of words.
Jim
Hoi,
To explain to the people on the Wiktionary mailing list where this comes
from: there is a huge debate on the Wikipedia-l mailing list about having
a separate English and American English Wikipedia.
In the plans for Ultimate Wiktionary there are three ways in which words
can be distinguished as being of a particular orthography. I will
describe these here and hope to use the energy of this discussion for
this question, which needs a resolution at some stage.
1) English, American English and other orthographies are treated as
separate entities. This means that all words need to exist for each
orthography/dialect. On the plus side it means that descriptions like
etymology and meaning will be in this one orthography as well. This is
also the easiest method to provide information for a spell checker.
2) We treat these variants as belonging to a specific "spelling
authority". This means that one word needs to be in the database only
once. It means that the meanings, etymologies etc. can be in any
of the orthographies. It means that you cannot record the relations
between the words of these different orthographies/dialects. When words
are properly identified, it means that we can use the information for a
spell checker. It does not clearly help you understand what Meanings
exist in a particular variation of English. This is in my opinion the
weakest option as it does not allow you to identify which meaning is
true for a particular version of English.
3) We can label Meanings as belonging to one of these particular
orthographies. When words are properly identified, it means that we can
use the information for a spell checker.
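To make the three options easier to compare, here is a minimal sketch of how each might store the same few words and how a spell checker word list would be pulled out of it. None of the names below (Entry, Meaning, authorities, varieties, the "en-GB"/"en-US" labels) come from the Ultimate Wiktionary data design; they are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Meaning:
    text: str
    varieties: frozenset = frozenset()  # used by option 3 only


@dataclass
class Entry:
    spelling: str
    meanings: list = field(default_factory=list)
    authorities: frozenset = frozenset()  # used by option 2 only


GB, US = "en-GB", "en-US"  # illustrative labels, not from the UW design

# Option 1: one separate word list per variety; shared words are duplicated.
option1 = {
    GB: [Entry("colour", [Meaning("hue")]), Entry("dog", [Meaning("animal")])],
    US: [Entry("color", [Meaning("hue")]), Entry("dog", [Meaning("animal")])],
}

# Option 2: every spelling stored once, tagged with its spelling authorities.
option2 = [
    Entry("colour", [Meaning("hue")], frozenset({GB})),
    Entry("color", [Meaning("hue")], frozenset({US})),
    Entry("dog", [Meaning("animal")], frozenset({GB, US})),
]

# Option 3: every spelling stored once; the label sits on each meaning.
option3 = [
    Entry("colour", [Meaning("hue", frozenset({GB}))]),
    Entry("color", [Meaning("hue", frozenset({US}))]),
    Entry("dog", [Meaning("animal", frozenset({GB, US}))]),
]


def words_option1(variety):
    """Spell checker list: simply everything stored under that variety."""
    return sorted(e.spelling for e in option1[variety])


def words_option2(variety):
    """Spell checker list: spellings whose authority set includes the variety."""
    return sorted(e.spelling for e in option2 if variety in e.authorities)


def words_option3(variety):
    """Spell checker list: spellings with at least one meaning in the variety."""
    return sorted(
        e.spelling
        for e in option3
        if any(variety in m.varieties for m in e.meanings)
    )
```

In this toy data all three functions return the same list for a given variety. The difference is where the label lives: option 1 duplicates "dog", option 2 cannot say which meaning belongs to which variety, and option 3 has to inspect every meaning to answer a spelling question.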
In my opinion the number 1 option is technically the best solution.
Going for this option is probably less problematic than breaking the
en.wikipedia.org into pieces. Going for this option seems like a lot of
duplication. It does however provide us with the possibility to be more
precise in what makes English different from American, Australian etc.
Please let me know what you think and particularly why.
Thanks,
GerardM
Paweł Dembowski wrote:
>Can't we just have a list of checkboxes for each dialect of a given
>language for each meaning of a word? For words like "to be" or "dog"
>we could just check boxes for every dialect, while we wouldn't for
>words like petrol or gasoline...
>
How many different orthographies / dialects are there for English? I
would not dare to presume that a word is truly shared between the less
well known versions of English. There are also the "true" dialects like
Geordie that have to be considered. Creating check boxes assumes in a
way that the editors /know/ these different versions of English well
enough. Technically it can be done, but the spelling of the text in a
meaning, an etymology needs to be adapted anyway. You have to realise
that certain meanings do not travel well. It is therefore not only the
orthography but also the Meanings of a word that need to be considered.
Consequently, a simple check box is problematic in itself.
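One way to picture the concern, as a rough sketch only: a check box has two states, while an editor's knowledge of a variety has three (used, not used, not known). The variety list and the function names below are hypothetical; only Geordie, petrol and gasoline are taken from this thread.

```python
from enum import Enum


class Usage(Enum):
    USED = "used"
    NOT_USED = "not used"
    UNKNOWN = "unknown"


# Hypothetical list of varieties offered to the editor.
VARIETIES = ["British", "American", "Australian", "Geordie"]


def from_checkboxes(checked):
    """Plain check boxes: anything not ticked is recorded as 'not used'."""
    return {v: Usage.USED if v in checked else Usage.NOT_USED for v in VARIETIES}


def from_editor_knowledge(used, not_used):
    """Three-valued record: varieties the editor did not vouch for stay unknown."""
    result = {}
    for v in VARIETIES:
        if v in used:
            result[v] = Usage.USED
        elif v in not_used:
            result[v] = Usage.NOT_USED
        else:
            result[v] = Usage.UNKNOWN
    return result


# An editor who knows only that "petrol" is British and not American:
boxes = from_checkboxes({"British"})
known = from_editor_knowledge({"British"}, {"American"})
```

With plain check boxes the record claims that "petrol" is not used in Geordie or Australian English, although the editor never said so. The three-valued record leaves those as unknown.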
Thanks,
GerardM
Hoi
Erik sent this e-mail to the developers. I think however that you
will be as interested in it as the developers. The relevance is that the
Commons "tags" as Erik calls them use meanings defined in Ultimate
Wiktionary. It is a preview that is really rough, but it shows the idea
of using meaning and translations really well.
Thanks,
GerardM
-------- Original Message --------
Subject: [Wikitech-l] A small Ultimate Wiktionary demo
Date: Sun, 04 Sep 2005 21:57:47 +0200
From: Erik Moeller <erik_moeller(a)gmx.de>
Reply-To: Wikimedia developers <wikitech-l(a)wikimedia.org>
To: Wikimedia developers <wikitech-l(a)wikimedia.org>
A while ago, Gerard posted this on Meta:
http://meta.wikimedia.org/wiki/Using_Ultimate_Wiktionary_for_Commons
It was a short explanation how UW could be used to internationalize
categories on the Wikimedia Commons. I've now hacked together a small
mock-up that demonstrates (hopefully) more clearly how this could work
in practice:
http://epov.org/uwd/index.php?title=Tag:Dog&action=edit
(Further demos will be posted on http://epov.org/uwd/ in the coming
weeks and months.)
It should work in Firefox and IE. The only active components are the
radio buttons you can click.
Essentially, what this shows is:
1) A new tag for images of dogs is created. (In this demo, I call
categories "tags", because I hope this will be what they are eventually
called.)
2) The user can choose from the languages they speak to clarify which
language this tag name is written in.
3) Based on the tag name and language, a lookup on UW is performed,
which fetches all the associated meanings for this word.
4) The user selects one of these meanings.
5) Automagically, another lookup is performed to determine the available
translations, if any. After saving the tag, it is then instantly
available under these names in the other languages.
In the demo, the first two meanings have translations available, while
the other two do not.
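The five steps above can be sketched roughly as follows. This is not the demo's code: the two dictionaries are an in-memory stand-in for Ultimate Wiktionary, and every ID, gloss and function name is made up for illustration. Only the Italian and Maori words come from this mail.

```python
# In-memory stand-in for Ultimate Wiktionary; all IDs and glosses are made up.
MEANINGS = {
    ("en", "dog"): [
        {"id": 1, "gloss": "domesticated canine animal"},
        {"id": 2, "gloss": "mechanical catch or pawl"},
    ],
}
TRANSLATIONS = {
    1: {"it": "cane", "mi": "kurii"},
    # meaning 2 has no translations yet
}


def lookup_meanings(language, tag_name):
    """Step 3: fetch every meaning recorded for this word in this language."""
    return MEANINGS.get((language, tag_name.lower()), [])


def lookup_translations(meaning_id):
    """Step 5: fetch the translations for the selected meaning, if any."""
    return dict(TRANSLATIONS.get(meaning_id, {}))


def create_tag(language, tag_name, choose):
    """Steps 1-5: name a tag, pick a meaning, collect its other-language names."""
    meanings = lookup_meanings(language, tag_name)
    if not meanings:
        raise LookupError(f"no meanings known for {tag_name!r} in {language!r}")
    meaning = choose(meanings)  # Step 4: the user selects one meaning.
    names = {language: tag_name}
    names.update(lookup_translations(meaning["id"]))
    return {"meaning_id": meaning["id"], "names": names}


# The radio buttons are simulated by a function that picks from the list.
animal_tag = create_tag("en", "Dog", lambda ms: ms[0])
pawl_tag = create_tag("en", "Dog", lambda ms: ms[1])
```

Here animal_tag ends up with names in three languages, while pawl_tag has only the English name because no translations are recorded for that meaning.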
Why is this so powerful? Because, if UW itself is successful and
contains many words, it almost instantly makes the entire media
repository on Commons available to speakers of all languages. (Now,
hopefully, you can see why we've been excited about getting millions of
translations for free from the Logos project.) No need to create many
different tags - just select the right meaning. Furthermore, it builds
bridges from other projects to UW. The language work we are constantly
doing will no longer be redundant, but focused on one place.
A 14-year-old Italian kid can then use the tag "cane" to look for photos
of dogs, while a Maori girl from New Zealand can use "kurii". Moreover,
the same category hierarchy can be used to browse in different languages
(based on user preferences, a fallback hierarchy would be queried to
determine the language that should be used should no translation be
available).
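The fallback idea in the parenthesis can be illustrated with a few lines. This is a sketch under assumptions: the function name, the language codes and the choice to return None when nothing matches are mine, not part of any existing category code.

```python
def display_name(names, preferences):
    """Return the tag name in the first preferred language that has one.

    names: mapping of language code to tag name for one meaning.
    preferences: the user's languages in order of preference (the fallback chain).
    """
    for language in preferences:
        if language in names:
            return names[language]
    return None  # no usable translation; the caller decides what to show


full = {"en": "dog", "it": "cane", "mi": "kurii"}
partial = {"en": "dog"}

maori_first = display_name(full, ["mi", "en"])
maori_fallback = display_name(partial, ["mi", "en"])
no_match = display_name(partial, ["it"])
```

A user who prefers Maori and falls back to English sees "kurii" when the translation exists and "dog" when it does not.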
We could also automatically make use of synonyms, plurals and
inflections (though this requires further changes to the category code
beyond internationalization). Given that we are mapping one of multiple
meanings to a single tag, there will be tag collisions -- those will
have to be dealt with through disambiguation. But this is not important:
Try to see the tag name merely as a key to a meaning. What this key is
called is secondary.
The key principle of selecting a meaning and then performing automatic
translations can be used in many different contexts. For example, in
Wikidata, one could use the same principle to internationalize field
names such as "Country", "Flag" and "Population".
This application also shows that UW must contain everything from words
to names to phrases. There is no limit to the scope of it. This makes it
a potentially massively useful tool for both human and machine translation.
The category internationalization functionality will not be part of the
first release of Ultimate Wiktionary, but we believe we can get funding
to work on this later. I believe that UW, in combination with better
tagging features in general, could make our tagging system the most
advanced one available. Flickr, for example, has no localization, is
unlikely to ever get semi-automatic localization, and apparently
supports no synonyms either.
See the demo footnotes for further explanations. Feedback is welcome.
(I'll be away until Wednesday.)
Best,
Erik
Thanks a lot for your mail Gerard,
As I am not so on top of your technological discussions, I would like to ask when it will be possible to try out this implementation.
My main concern is to expose it widely as part of your open content initiative and to learn how it fares in use, including how users suggest changes to terms/concepts or propose new ones, in whichever of the many languages.
Any action required from our side?
Greetings and thanks a lot for all the enthusiastic work you are doing here
Stefan
-----Original Message-----
From: Gerard Meijssen [mailto:gerard.meijssen@gmail.com]
Sent: 01 September 2005 23:50
To: wiktionary-l(a)wikipedia.org; Wikimedia developers
Cc: Anthere; Stefan Jensen
Subject: The data design for the GEMET phase (part of the Ultimate Wiktionary)
Hoi,
We have always planned the implementation of the GEMET thesaurus as one
of the steps in the implementation of the Ultimate Wiktionary. In some
discussions we had, we considered that it might be a better idea to
implement GEMET in a subset of the Ultimate Wiktionary instead of in
its own data design. Once it has been implemented, we will be able to
learn many of the lessons we need to learn as reality shatters many
false dreams. With this implementation, we implement several of the key
functionalities of the UW. It will demonstrate the "eat your own
dogfood" idea; the words for a Language and for a WordRelation need to
exist in order to implement their functionality and their localised
value. It also demonstrates how a thesaurus can be implemented in UW; it
will show off how we can have relations between different meanings.
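A rough sketch of what the "eat your own dogfood" idea could look like in miniature: a language row carries no name of its own, only a pointer to a meaning, and its localised name is looked up like any other word. The structures and names below are assumptions for illustration and are not taken from the data design page.

```python
# Hypothetical miniature; table and field names are assumptions.
# Expressions: (meaning_id, language_code) -> word
EXPRESSIONS = {
    (100, "en"): "English",
    (100, "nl"): "Engels",
    (101, "en"): "Dutch",
    (101, "nl"): "Nederlands",
}

# A language row carries no name of its own, only a pointer to a meaning.
LANGUAGES = {"en": {"meaning_id": 100}, "nl": {"meaning_id": 101}}


def language_name(code, ui_language):
    """Localised name of a language, looked up like any other word."""
    meaning_id = LANGUAGES[code]["meaning_id"]
    return EXPRESSIONS.get((meaning_id, ui_language))
```

For example, language_name("en", "nl") gives the Dutch word for English. A language whose name has not yet been entered in the user's language simply yields None, which is the point of the exercise: the words have to exist first.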
The one thing I like best, however, is that it demonstrates the core
functionality of Ultimate Wiktionary much better. The data design for
this is much easier to understand. The absolute core functionality
however is without the Collection, the CollectionLanguage and the
CollectionMeaning tables. These are needed for the implementation of GEMET.
As it is a subset, it does not have many of the features that will exist
in the full-blown version of the UW. These can however be added one at a
time. This allows for more frequent updates and this will probably lead
to much more excitement as we will more often have new features to show.
More info can be found here:
http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design
Thanks,
GerardM