Greetings,
[Gerard Meijssen (Re: [Wiktionary-l] English orthographies) writes:]
Jim Breen wrote:
[Gerard Meijssen ([Wiktionary-l] English orthographies) writes:]
- English, American English and other orthographies are treated as
separate entities.
I think this will be a disaster.
Can you explain why "jewellery" and "jewelry" cannot be alternatives within the one entry?
In the database design, http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design , an Expression is a number of characters that make up a valid occurrence in a language. Therefore every spelling IS a different Expression.
OK, so the short answer is that the UW database was designed that way.
I predict it will be a mess. It is also at variance with all the lexicographical databases I have seen.
The English used in Britain, the United States, Australia, etc. is significantly different.
Nonsense.
This can be found in the difference in vocabulary and the difference in orthography.
Both the spelling and vocabulary differ only to a very small extent.
Typically, when considering spelling, the fact that the English, Americans and Australians spell differently makes each a different orthography. This is reflected in there being English, American etc. dead wood dictionaries. In a project like UW, where we collect all words of all languages, it makes sense to reflect this.
By all means collect them and reflect them, but don't foster the impression that they are a major issue, because they aren't. Also don't fall into the trap of thinking that you can neatly compartmentalize English spellings into strict country groups. Different mixes of spellings are used right across the English-speaking world.
BTW, spelling and orthography are different things. Orthography refers to the writing system, i.e. "a method of representing the sounds of a language by written or printed symbols" (to quote WordNet). English is written with one orthographical system.
To a large extent I don't really care that much about "jewellery" and "jewelry" being in their own entries in English, because they are only a few percent of words. Where this approach will be a total disaster is with Japanese, where most words can be and are written in two or more scripts, and where spelling variations are rife. The idea that the meaning, POS, etc. etc. for a word will be replicated again and again and again for each writing variant is too awful to contemplate.
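To make the replication concern concrete, here is a minimal Python sketch. It is not the UW schema: the class and function names (Sense, Entry, ExpressionRecord, split_per_spelling, stored_sense_count) are hypothetical and invented purely for illustration. It contrasts one entry that lists all its written forms and shares its senses with a layout that keeps one record per written form, each carrying its own copy of the senses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sense:
    pos: str    # part of speech
    gloss: str  # short definition


@dataclass
class Entry:
    """Shared model (hypothetical): one entry, many written forms, one set of senses."""
    language: str
    spellings: List[str]
    senses: List[Sense]


@dataclass
class ExpressionRecord:
    """Per-spelling model (hypothetical): each written form has its own copy of the senses."""
    language: str
    spelling: str
    senses: List[Sense]


def split_per_spelling(entry: Entry) -> List[ExpressionRecord]:
    """Expand a shared entry into one record per spelling, copying the sense list."""
    return [
        ExpressionRecord(entry.language, spelling, list(entry.senses))
        for spelling in entry.spellings
    ]


def stored_sense_count(records: List[ExpressionRecord]) -> int:
    """Count how many sense copies the per-spelling model has to maintain."""
    return sum(len(record.senses) for record in records)


# "apple" in Japanese is commonly written in kanji, hiragana or katakana.
apple = Entry("ja", ["林檎", "りんご", "リンゴ"], [Sense("noun", "apple")])

# The English example from this thread: two spellings, one meaning.
jewellery = Entry(
    "en",
    ["jewellery", "jewelry"],
    [Sense("noun", "personal ornaments such as rings and necklaces")],
)

apple_records = split_per_spelling(apple)
jewellery_records = split_per_spelling(jewellery)

# Shared model: senses stored once. Per-spelling model: once per written form.
print(len(apple.senses), stored_sense_count(apple_records))
print(len(jewellery.senses), stored_sense_count(jewellery_records))
```

In the shared model a correction to a gloss or POS is made once; in the per-spelling model it has to be repeated for every record, and the copies can drift apart.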
To be blunt, it sounds like the UW database design was done with one or a few languages in mind, and the others are being told to fall into line.
Cheers
Jim