Greetings,
[Gerard Meijssen (Re: [Wiktionary-l] English orthographies) writes:]
> Jim Breen wrote:
> >[Gerard Meijssen ([Wiktionary-l] English orthographies) writes:]
> >>>1) English, American English and other orthographies are treated as
> >>>separate entities.
> >
> >I think this will be a disaster.
> >
> >Can you explain why "jewellery" and "jewelry" cannot be alternatives
> >within the one entry?
> In the database design,
> http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design , an
> Expression is a number of characters that make up a valid occurrence in
> a language. Therefore every spelling IS a different Expression.
OK, so the short answer is that the UW database was designed that way.
I predict it will be a mess. It is also at variance with all the
lexicographical databases I have seen.
> The English used in Britain, the United States, Australia etc is
> significantly different.
Nonsense.
> This can be found in the difference in
> vocabulary and the difference in orthography.
Both the spelling and vocabulary differ only to a very small extent.
> Typically when considering
> spelling, the way the English, American, Australian spell differently
> makes it a different orthography. This is reflected in there being
> English, American etc dead wood dictionaries. In a project like UW where
> we collect all words of all languages, it makes sense to reflect this.
By all means collect them and reflect them, but don't foster the
impression that they are a major issue, because they aren't. Also don't
fall into the trap of thinking that you can neatly compartmentalize
English spellings into strict country groups. Different mixes of spellings
are used right across the English-speaking world.
BTW, spelling and orthography are different things. Orthography refers
to the writing system, i.e. "a method of representing the sounds of a
language by written or printed symbols" (to quote WordNet). English
is written with one orthographical system.
To a large extent I don't really care that much about "jewellery" and
"jewelry" being in their own entries in English, because they are only a
few percent of words. Where this approach will be a total disaster is
with Japanese, where most words can be and are written in two or
more scripts, and where spelling variations are rife. The idea that the
meaning, POS, etc. etc. for a word will be replicated again and again
and again for each writing variant is too awful to contemplate.
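To make the replication concern concrete, here is a minimal Python sketch of the two shapes being contrasted. It is only an illustration: the names Entry, Sense and per_spelling_records are hypothetical and are not taken from the UW data design, and the example words are chosen just to show the effect.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    pos: str    # part of speech
    gloss: str  # short meaning


@dataclass
class Entry:
    """One lexical entry: several written forms share one list of senses."""
    forms: list[str]
    senses: list[Sense] = field(default_factory=list)


def per_spelling_records(entry: Entry) -> list[tuple[str, str, str]]:
    """Flatten an entry into one (form, pos, gloss) row per written form,
    which is roughly what "every spelling is its own Expression" implies
    if the sense data is stored with each spelling."""
    return [(form, s.pos, s.gloss) for form in entry.forms for s in entry.senses]


# Hypothetical sample data: a Japanese word written in kanji, hiragana
# and katakana, and the English pair from this thread.
apple = Entry(forms=["林檎", "りんご", "リンゴ"], senses=[Sense("noun", "apple")])
jewellery = Entry(forms=["jewellery", "jewelry"],
                  senses=[Sense("noun", "personal ornaments")])

for row in per_spelling_records(apple):
    print(row)
```

In the Entry form the sense is stored once however many written forms exist; in the flattened form the same POS and gloss are repeated on every row, so any correction has to be made once per variant.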
To be blunt, it sounds like the UW database design was done with one
or a few languages in mind, and the others are being told to fall into
line.
Cheers
Jim
--
Jim Breen
http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology, Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学