Greetings,
[Gerard Meijssen (Re: [Wiktionary-l] English orthographies) writes:]
Jim Breen wrote:
[Gerard Meijssen ([Wiktionary-l] English orthographies) writes:]
- English, American English and other orthographies are treated as
separate entities.
I think this will be a disaster.
Can you explain why "jewellery" and "jewelry" cannot be alternatives within the one entry?
In the database design, http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design , an Expression is a number of characters that make up a valid occurrence in a language. Therefore every spelling IS a different Expression.
OK, so the short answer is that the UW database was designed that way.
I predict it will be a mess. It is also at variance with all the lexicographical databases I have seen.
The English used in Britain, the United States, Australia etc is significantly different.
Nonsense.
This can be found in the difference in vocabulary and the difference in orthography.
Both the spelling and vocabulary differ only to a very small extent.
Typically, when considering spelling, the different ways the English, Americans and Australians spell make these different orthographies. This is reflected in the existence of separate English, American etc. dead wood dictionaries. In a project like UW, where we collect all words of all languages, it makes sense to reflect this.
By all means collect them and reflect them, but don't foster the impression that they are a major issue, because they aren't. Also don't fall into the trap of thinking that you can neatly compartmentalize English spellings into strict country groups. Different mixes of spellings are used right across the English-speaking world.
BTW, spelling and orthography are different things. Orthography refers to the writing system, i.e. "a method of representing the sounds of a language by written or printed symbols" (to quote Wordnet.) English is written with one orthographical system.
To a large extent I don't really care that much about "jewellery" and "jewelry" being in their own entries in English, because they are only a few percent of words. Where this approach will be a total disaster is with Japanese, where most words can be and are written in two or more scripts, and where spelling variations are rife. The idea that the meaning, POS, etc. etc. for a word will be replicated again and again and again for each writing variant is too awful to contemplate.
To be blunt, it sounds like the UW database design was done with one or a few languages in mind, and the others are being told to fall into line.
Cheers
Jim
Jim Breen wrote:
Greetings,
[Gerard Meijssen (Re: [Wiktionary-l] English orthographies) writes:]
Jim Breen wrote:
[Gerard Meijssen ([Wiktionary-l] English orthographies) writes:]
- English, American English and other orthographies are treated as
separate entities.
I think this will be a disaster.
Can you explain why "jewellery" and "jewelry" cannot be alternatives within the one entry?
In the database design, http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design , an Expression is a number of characters that make up a valid occurrence in a language. Therefore every spelling IS a different Expression.
OK, so the short answer is that the UW database was designed that way.
I predict it will be a mess. It is also at variance with all the lexicographical databases I have seen.
Yes, but there is, as far as I know, no database that wants to have ALL languages and ALL words in one database. This is what Wiktionary does, and that is what makes the Ultimate Wiktionary markedly different from what went before.
The English used in Britain, the United States, Australia etc is significantly different.
Nonsense.
That is a matter of opinion. I agree that to some extent the English is the same. I learned my English at school and I lived in Great Britain for a few years. I can tell you that I misunderstand Americans often enough. The meaning is different and I have suffered the consequences of not understanding well what was meant.
This can be found in the difference in vocabulary and the difference in orthography.
Both the spelling and vocabulary differ only to a very small extent.
However, when you want the meanings described in a particular orthography, that orthography needs to be identified explicitly.
Typically, when considering spelling, the different ways the English, Americans and Australians spell make these different orthographies. This is reflected in the existence of separate English, American etc. dead wood dictionaries. In a project like UW, where we collect all words of all languages, it makes sense to reflect this.
By all means collect them and reflect them, but don't foster the impression that they are a major issue, because they aren't. Also don't fall into the trap of thinking that you can neatly compartmentalize English spellings into strict country groups. Different mixes of spellings are used right across the English-speaking world.
BTW, spelling and orthography are different things. Orthography refers to the writing system, i.e. "a method of representing the sounds of a language by written or printed symbols" (to quote Wordnet.) English is written with one orthographical system.
To be more "politically correct" Wiktionary gives as a definition: "The study of correct spelling according to established usage". The established usage in Britain is different from the United States and therefore it could be considered a different orthography.
To a large extent I don't really care that much about "jewellery" and "jewelry" being in their own entries in English, because they are only a few percent of words. Where this approach will be a total disaster is with Japanese, where most words can be and are written in two or more scripts, and where spelling variations are rife. The idea that the meaning, POS, etc. etc. for a word will be replicated again and again and again for each writing variant is too awful to contemplate.
With Japanese, all these different ways of writing are ALL accepted Japanese. They are ALL used intermixed and there is no reason why they should not be. Nothing disallows Japanese expressions in multiple character sets. When a Word is included, it only needs to be connected to the same Meaning through SynTrans to share that meaning.
I am happy to explain how this works out from within the database design. So consider asking questions instead of making pronouncements about how "bad" it will be.
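The linkage described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Ultimate Wiktionary schema: the names Expression, Meaning and SynTrans come from this thread, while the field names, the `Lexicon` helper, its methods and the language labels are hypothetical. The point it shows is that each spelling is its own Expression, yet the definition and part of speech are stored once on the Meaning and shared through SynTrans links rather than copied per spelling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """One spelling in one language (field names are assumptions)."""
    spelling: str
    language: str


@dataclass
class Meaning:
    """Definition and part of speech, stored once (fields are assumptions)."""
    definition: str
    part_of_speech: str


@dataclass
class SynTrans:
    """A link saying that an Expression carries a given Meaning."""
    expression: Expression
    meaning: Meaning


class Lexicon:
    """Hypothetical in-memory stand-in for the database."""

    def __init__(self):
        self.links = []

    def link(self, spelling, language, meaning):
        # Every spelling becomes its own Expression; only the link is new.
        expression = Expression(spelling, language)
        self.links.append(SynTrans(expression, meaning))
        return expression

    def meanings_of(self, spelling, language):
        target = Expression(spelling, language)
        return [l.meaning for l in self.links if l.expression == target]

    def expressions_for(self, meaning):
        return [l.expression for l in self.links if l.meaning is meaning]


lexicon = Lexicon()

# Two English spellings, one shared Meaning object.
jewel = Meaning("personal ornaments such as rings and necklaces", "noun")
lexicon.link("jewellery", "English (British)", jewel)
lexicon.link("jewelry", "English (American)", jewel)

# Three Japanese writings of the same word, again one shared Meaning.
cat = Meaning("a small domesticated feline", "noun")
for writing in ("猫", "ねこ", "ネコ"):
    lexicon.link(writing, "Japanese", cat)

for expression in lexicon.expressions_for(cat):
    print(expression.spelling, "->", cat.definition)
```

In this sketch, adding another writing variant costs one Expression and one SynTrans link; the Meaning is not replicated. Whether the real design behaves this way for readings, usage notes and other per-variant data is exactly the kind of question still open in this discussion.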
What is a POS? Part of speech?
To be blunt, it sounds like the UW database design was done with one or a few languages in mind, and the others are being told to fall into line.
I like you to be plainspoken, but consider: we cater for sign languages, and we allow for these great bits of software that show strokes for Chinese / Japanese characters. We allow for many relations and labels for Words / Expressions / Meanings. Tell me what your issue is and I will see how it fits in the database design and explain this to you.
I will not be too surprised if we find things that need improvement. But improvements can only be made once the issues that are not catered for in the current database design have been identified. So give me your issues and let us work toward solutions.
Thanks, GerardM
Jim Breen wrote:
Greetings,
[Gerard Meijssen (Re: [Wiktionary-l] English orthographies) writes:]
In the database design, http://meta.wikimedia.org/wiki/Ultimate_Wiktionary_data_design , an Expression is a number of characters that make up a valid occurrence in a language. Therefore every spelling IS a different Expression.
OK, so the short answer is that the UW database was designed that way.
I predict it will be a mess. It is also at variance with all the lexicographical databases I have seen.
I largely share your prediction. It's trying to be too many things for too many people. That has some merit in an ideal world. At this point the UW software has not been made public, and we have not yet had the opportunity to poke at it. Seeking to solve these problems before it is made public is likely to make the result brittle.
The aikido master lets the dance evolve.
The English used in Britain, the United States, Australia etc is significantly different.
Nonsense.
The dedicated nationalists would disagree.
To be blunt, it sounds like the UW database design was done with one or a few languages in mind, and the others are being told to fall into line.
This is reminiscent of the classics movement of a few centuries ago that considered Greek and Latin as the standard for evaluating all languages.
Ec
wiktionary-l@lists.wikimedia.org