[Andrew Dunbar ([Wiktionary-l] Re: English orthographies) writes:]
On 9/22/05, Jim Breen Jim.Breen@infotech.monash.edu.au wrote:
Yes, in fact it is the frowning on redirects that led me to look at the UW proposals. I was looking at the Wiktionary structure to see if it would be a suitable environment for my Japanese-Multilingual dictionary database. I ran into a number of problems, one of which was the "no redirects" policy, and someone suggested I look at UW.
The "frowning on redirects" policy is largely due to the fact that we have many languages in one "namespace". When a particular English spelling variant, or even a plural, happens to coincide with the spelling of a word in another language, we have to have two pages anyway. This is not uncommon. We then decided it was better to try for some consistency rather than having some shared pages and some redirects. The other major issue was what to do when a dictionary is created for both the British (colour, centre) market and the American (color, center) market: redirects force us to declare one spelling the "standard" and the other the "variant", and we did not want to impose that choice on anyone.
This second problem goes away if a search for an entry can be made on more than one "headword". In fact the single headword is a limitation of paper dictionaries that never needed to be propagated into electronic dictionaries.
Provided: (a) the essential information (senses, POS, etymology, etc.) only has to be entered once, and remains the same for all the spelling and orthographical variants;
Sometimes some of these will be different. In British and Commonwealth English, except Canadian, "tire" only means "become tired". In US and Canadian English it also means "tyre", the rubber ring on the outside of a wheel. But these are homonyms rather than senses, though many people who are not lexicography-savvy don't realise the difference.
Of course they are homonyms. With a relatively small set of phonemes, Japanese is riddled with homonyms; there are cases of more than 20 different words with the same pronunciation. You'd go (and be) crazy if you tried to treat them as the one "word".
(b) the user, on entering either form, gets the one collection of information which shows all the alternative forms of the word, then I really have no objection. I can't understand why they are in different database records, and in the case of my own JMdict (XML) they aren't, but then I don't use SynTrans, etc.
Basically it's an arbitrary database design issue. UW is going for more granularity. In this way it's probably more object-oriented, since it breaks things down into more, smaller objects. There is nothing intrinsically right or wrong about either approach.
Provided the design doesn't intrude into the operation (creation, maintenance, lookup, etc.)
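To make conditions (a) and (b) concrete, here is a minimal sketch in Python of an entry that is stored once and indexed under every one of its spellings. The names `Entry`, `Lexicon`, `add` and `lookup` are invented for illustration; this is not how Wiktionary, UW or JMdict actually store their data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    """One lexical entry; its content is entered once for all spellings."""
    headwords: List[str]   # every spelling variant, none marked as "standard"
    pos: str
    senses: List[str]
    etymology: str = ""


class Lexicon:
    """Hypothetical index mapping each written form to the entries it belongs to."""

    def __init__(self) -> None:
        self._index: Dict[str, List[Entry]] = {}

    def add(self, entry: Entry) -> Entry:
        # Register the same Entry object under each of its headwords.
        for form in entry.headwords:
            self._index.setdefault(form, []).append(entry)
        return entry

    def lookup(self, form: str) -> List[Entry]:
        # Homonyms sharing a spelling come back as separate entries.
        return self._index.get(form, [])


lexicon = Lexicon()
lexicon.add(Entry(["colour", "color"], "noun", ["hue"]))
lexicon.add(Entry(["tire"], "verb", ["become tired"]))
lexicon.add(Entry(["tyre", "tire"], "noun",
                  ["the rubber ring on the outside of a wheel"]))

for entry in lexicon.lookup("tire"):
    print(entry.pos, entry.headwords)
```

Looking up "color" or "colour" returns the same record, while "tire" returns two distinct records, so the homonyms stay separate without any redirect.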
Not really. I don't know about the languages I don't speak (i.e. everything apart from English, Japanese, French and a little Latin), but in general the spelling has little or nothing to do with the etymology.
Sometimes one spelling is definitely known to be derived from another and both remain in use in various places. For instance the Spanish word for "peanut" was borrowed from Nahuatl in Mexico as "cacahuate", but when it was later borrowed into Spain itself it became "cacahuete". It would be a shame not to have a way to record such things in the cases where we do know them.
I was really referring to the centre/center, colour/color situations. I should have said "minor spelling differences".
Cheers
Jim
wiktionary-l@lists.wikimedia.org