[Wikipedia-l] Re: [Wiktionary-l] English orthographies

Gerard Meijssen gerard.meijssen at gmail.com
Tue Sep 20 16:51:59 UTC 2005


Jim Breen wrote:

>Greetings,
>
>[Gerard Meijssen (Re: [Wiktionary-l] English orthographies) writes:]
>  
>
>>>Jim Breen wrote:
>>>      
>>>
>>>>[Gerard Meijssen ([Wiktionary-l] English orthographies) writes:]
>>>>        
>>>>
>>>>>>1) English, American English and other orthographies are treated as 
>>>>>>separate entities. 
>>>>>>            
>>>>>>
>>>>I think this will be a disaster.
>>>>
>>>>Can you explain why "jewellery" and "jewelry" cannot be alternatives
>>>>within the one entry?
>>>>        
>>>>
>
>  
>
>>>In the database design,
>>>http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design , an
>>>Expression is a number of characters that make up a valid occurrence in
>>>a language. Therefore every spelling IS a different Expression. 
>>>      
>>>
>
>OK, so the short answer is that the UW database was designed that way.
>
>I predict it will be a mess. It is also at variance with all the
>lexicographical databases I have seen.
>  
>
Yes, but as far as I know there is no database that wants to have ALL
languages and ALL words in one database. This is what Wiktionary does,
and it is what makes the Ultimate Wiktionary markedly different from
what went before.

>  
>
>>>The English used in Britain, the United States, Australia etc is
>>>significantly different. 
>>>      
>>>
>
>Nonsense.
>  
>
That is a matter of opinion. I agree that to some extent the English is
the same. I learned my English at school and I lived in Great Britain
for a few years. I can tell you that I have misunderstood Americans
often enough. The meanings differ, and I have suffered the consequences
of not understanding what was meant.

>  
>
>>>This can be found in the difference in
>>>vocabulary and the difference in orthography. 
>>>      
>>>
>
>Both the spelling and vocabulary differ only to a very small extent.
>  
>
However, when you want the meanings described in a particular
orthography, you need to identify that orthography explicitly.

>  
>
>>>Typically when considering
>>>spelling, the way the English, American, Australian spell differently
>>>makes it a different orthography. This is reflected in there being
>>>English, American etc dead wood dictionaries. In a project like UW where
>>>we collect all words of all languages, it makes sense to reflect this.
>>>      
>>>
>
>By all means collect them and reflect them, but don't foster the
>impression that they are a major issue, because they aren't. Also don't
>fall into the trap of thinking that you can neatly compartmentalize 
>English spellings into strict country groups. Different mixes of spellings
>are used right across the English-speaking world.
>
>BTW, spelling and orthography are different things. Orthography refers
>to the writing system, i.e. "a method of representing the sounds of a
>language by written or printed symbols" (to quote Wordnet.) English
>is written with one orthographical system.
>  
>
To be more "politically correct", Wiktionary gives this definition: "The
study of correct spelling according to established usage". The
established usage in Britain differs from that in the United States, and
therefore it could be considered a different orthography.

>To a large extent I don't really care that much about "jewellery" and
>"jewelry" being in their own entries in English, because they are only a
>few percent of words. Where this approach will be a total disaster is
>with Japanese, where most words can be and are written in two or
>more scripts, and where spelling variations are rife. The idea that the
>meaning, POS, etc. etc. for a word will be replicated again and again
>and again for each writing variant is too awful to contemplate.
>  
>
With Japanese, all these different ways of writing are ALL accepted
Japanese. They are ALL used intermixed and there is no reason why they
should not be. Nothing in the design disallows Japanese Expressions in
multiple character sets. When a Word is included, it only needs to be
connected to the same Meaning through SynTrans to share that meaning.
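
As a rough illustration of that idea, here is a minimal Python sketch. It
is not the actual UW schema: the class layout, the field names and the
language codes ("en-GB", "en-US", "ja") are assumptions made up for this
example. It only shows that several spellings can each be their own
Expression while pointing at one shared Meaning through SynTrans links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """One spelling in one language (hypothetical simplification)."""
    spelling: str
    language: str  # illustrative code, not a UW identifier


@dataclass(frozen=True)
class Meaning:
    """A meaning stored once, independent of how it is written."""
    definition: str
    part_of_speech: str


@dataclass(frozen=True)
class SynTrans:
    """Link record connecting an Expression to a Meaning (assumed shape)."""
    expression: Expression
    meaning: Meaning


def spellings_for(meaning, links):
    """Return the sorted spellings of every Expression linked to `meaning`."""
    return sorted(l.expression.spelling for l in links if l.meaning == meaning)


# The definition and part of speech are stored once per Meaning.
ornaments = Meaning("ornaments such as rings and necklaces", "noun")
cat = Meaning("a small domesticated feline", "noun")

links = [
    SynTrans(Expression("jewellery", "en-GB"), ornaments),
    SynTrans(Expression("jewelry", "en-US"), ornaments),
    # Three written forms of one Japanese word, one shared Meaning.
    SynTrans(Expression("猫", "ja"), cat),
    SynTrans(Expression("ねこ", "ja"), cat),
    SynTrans(Expression("ネコ", "ja"), cat),
]

print(spellings_for(ornaments, links))
print(len(spellings_for(cat, links)))
```

In this sketch, adding another writing variant means adding one Expression
and one link; the Meaning record itself is left untouched.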

I am happy to explain how this works out from within the database
design. So consider asking questions instead of making pronouncements
about how "bad" it will be.

What is a POS? Part of speech?

>To be blunt, it sounds like the UW database design was done with one
>or a few languages in mind, and the others are being told to fall into
>line.
>
I appreciate you being plainspoken, but consider: we cater for sign
languages, and we allow for those great bits of software that show
strokes for Chinese / Japanese characters. We allow for many relations
and labels for Words / Expressions / Meanings. Tell me what your issue
is and I will see how it fits in the database design and explain this
to you.

I will not be too surprised if we find things that need improvement.
But improvements can only be made once the issues that the current
database design does not cater for have been identified. So give me
your issues and let us work toward solutions.

Thanks,
GerardM



