[Wiktionary-l] GNU-FDL and Ultimate Wiktionary

cookfire cookfire at softhome.net
Mon May 30 14:34:11 UTC 2005


Jimmy Wales wrote:

>Ray Saintonge wrote:
>  
>
>>Not really, since the existing wiktionaries would continue as they have
>>all along.
>>    
>>
>
>I am not agreeing or disagreeing or taking any position on this at all.
> Please consider my questions about this to be merely data gathering. :-)
>
>Why should we have such a formal break?  Why shouldn't we instead
>transition from one to the other in a seamless way?  I worry a lot about
>duplication of effort, etc.
>
>Are there objections from old-school wiktionarians to the ultimate
>wiktionary plans?  Is it possible that we could address these objections
>through code?
>  
>
Hi Jimbo et al,

What GerardM proposes as the Ultimate Wiktionary is what I have been
considering doing all along: a dictionary in a relational database. It's
a pity I can't seem to find more time to contribute to its genesis. What
it would mean is basically that we would all be working on the same set
of data, instead of having each language on its own balkanized
Wiktionary. People who speak French get a French interface to the data.
People who speak Swahili get an interface in Swahili. Part of the
interface could show that content exists in other languages for the
description of a word, for an etymology, etc.
It is a pity that there is such an enormous duplication of effort and
so much room for errors. An error may get corrected in one Wiktionary
but remain in the others. The UW aims to solve these problems. Also, if
somebody in the Dutch Wiktionary knows the translation of a word into
Frisian, this knowledge doesn't propagate to the other language
Wiktionaries, at least not in an efficient way. With the UW, if somebody
adds a translation, it becomes available to everybody.

Of course, new problems will crop up. For instance, translations are
transitive only for the simplest of terms and concepts. We need to find
a solution to that problem. I was considering taking a step back and
putting meanings in one table and concepts in another to try to solve
it. The issue with that is that it adds a layer of abstraction which
might obscure things a bit. (It would become possible to create a
'meaning' where one already existed, resulting in two entries for the
same 'meaning'.) Such things have to be discovered and resolved. I'm
sure we will find a way to accomplish that the Wiki way.
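
To make the transitivity problem concrete, here is a rough Python
sketch. The word pairs, the meaning labels and the function names are
all made up for illustration; none of them come from an actual
Wiktionary or from the UW design. It compares naive word-to-word
chaining with a lookup that goes through explicit meanings.

```python
# Hypothetical word-to-word translation pairs, keyed by (language, word).
# Illustrative data only, not taken from any Wiktionary.
WORD_PAIRS = {
    ("nl", "bank"): {("en", "bank"), ("en", "couch")},
    ("en", "bank"): {("fr", "banque"), ("fr", "rive")},
    ("en", "couch"): {("fr", "canapé")},
}


def chain(source, target_lang):
    """Naively follow translations through one intermediate word."""
    results = set()
    for middle in WORD_PAIRS.get(source, set()):
        for lang, word in WORD_PAIRS.get(middle, set()):
            if lang == target_lang:
                results.add(word)
    return results


# The same data regrouped by meaning (hypothetical labels): each meaning
# lists the expressions that carry it.
MEANINGS = {
    "financial institution": {("nl", "bank"), ("en", "bank"), ("fr", "banque")},
    "edge of a river": {("nl", "oever"), ("en", "bank"), ("fr", "rive")},
    "seat for several people": {("nl", "bank"), ("en", "couch"), ("fr", "canapé")},
}


def translate_via_meanings(source, target_lang):
    """Collect target-language expressions that share a meaning with source."""
    results = set()
    for members in MEANINGS.values():
        if source in members:
            results |= {word for lang, word in members if lang == target_lang}
    return results


naive = chain(("nl", "bank"), "fr")
by_meaning = translate_via_meanings(("nl", "bank"), "fr")
print(sorted(naive))
print(sorted(by_meaning))
```

Chaining through English 'bank' drags in 'rive' (a river bank), which
Dutch 'bank' does not mean. Going through the meanings keeps the senses
apart, at the price of the extra layer mentioned above.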

One would need quite a few layers of information:

words or lemmas with a certain spelling

    language (this is the only one that has to be defined)
    part of speech
    many more optional properties, such as whether it is normally
    capitalized, whether it's a conjugated/inflected form,
    singular/plural, etc.

These words can be grouped to form expressions (and an expression can
also be only one word)

    bathroom could be one entry, salle de bains another

This is the 'dictionary' layer, where the words/expressions can be
described, etymologies can be added, etc. This can be done in several
languages. The user gets to see the language(s) he wants to see.

The next layer points to these expressions and combines all the ones 
that have the same meaning.

The way I conceived it, I added yet another layer, 'concepts'. Here I
grouped the meanings 'mum' and 'mother'. They are the same concept, but
the meanings differ, so the translations have to differ as well:
(maman, mama, mamá) versus (mère, moeder, madre).
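
As a rough sketch of how these layers could sit in a relational
database, here is a small example using Python's built-in sqlite3
module. All table and column names, the language codes and the helper
functions are my own illustrative assumptions, not the names from the
design that was posted earlier. The separate word layer and most of the
lookup tables are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE meaning (
    id INTEGER PRIMARY KEY,
    concept_id INTEGER NOT NULL REFERENCES concept(id),
    label TEXT NOT NULL
);
CREATE TABLE expression (
    id INTEGER PRIMARY KEY,
    language TEXT NOT NULL,  -- the one property that must be defined
    spelling TEXT NOT NULL
);
-- Many-to-many link: an expression may carry several meanings.
CREATE TABLE meaning_expression (
    meaning_id INTEGER NOT NULL REFERENCES meaning(id),
    expression_id INTEGER NOT NULL REFERENCES expression(id),
    PRIMARY KEY (meaning_id, expression_id)
);
""")

# One concept with two meanings, as in the mum/mother example.
conn.execute("INSERT INTO concept (id, label) VALUES (1, 'female parent')")
conn.executemany(
    "INSERT INTO meaning (id, concept_id, label) VALUES (?, 1, ?)",
    [(1, "mum"), (2, "mother")],
)

# (meaning_id, language, spelling); the language codes are assumptions.
entries = [
    (1, "en", "mum"), (1, "fr", "maman"), (1, "nl", "mama"), (1, "es", "mamá"),
    (2, "en", "mother"), (2, "fr", "mère"), (2, "nl", "moeder"), (2, "es", "madre"),
]
for meaning_id, language, spelling in entries:
    cur = conn.execute(
        "INSERT INTO expression (language, spelling) VALUES (?, ?)",
        (language, spelling),
    )
    conn.execute(
        "INSERT INTO meaning_expression VALUES (?, ?)",
        (meaning_id, cur.lastrowid),
    )


def translations(spelling, language, target_language):
    """Expressions in target_language that share a meaning with the source."""
    rows = conn.execute("""
        SELECT DISTINCT target.spelling
        FROM expression AS source
        JOIN meaning_expression AS src_link ON src_link.expression_id = source.id
        JOIN meaning_expression AS tgt_link ON tgt_link.meaning_id = src_link.meaning_id
        JOIN expression AS target ON target.id = tgt_link.expression_id
        WHERE source.spelling = ? AND source.language = ? AND target.language = ?
    """, (spelling, language, target_language)).fetchall()
    return {row[0] for row in rows}


def related_by_concept(spelling, language, target_language):
    """Expressions in target_language that share a concept with the source."""
    rows = conn.execute("""
        SELECT DISTINCT target.spelling
        FROM expression AS source
        JOIN meaning_expression AS src_link ON src_link.expression_id = source.id
        JOIN meaning AS src_meaning ON src_meaning.id = src_link.meaning_id
        JOIN meaning AS tgt_meaning ON tgt_meaning.concept_id = src_meaning.concept_id
        JOIN meaning_expression AS tgt_link ON tgt_link.meaning_id = tgt_meaning.id
        JOIN expression AS target ON target.id = tgt_link.expression_id
        WHERE source.spelling = ? AND source.language = ? AND target.language = ?
    """, (spelling, language, target_language)).fetchall()
    return {row[0] for row in rows}


print(translations("mum", "en", "fr"))
print(related_by_concept("mum", "en", "fr"))
```

Looking up 'mum' at the meaning level gives only 'maman', while the
concept level also brings in 'mère', which an interface could show as a
related term instead of a direct translation.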


I used some more lookup tables, and tables to create many-to-many
relations, which I left out of the picture here. The concepts can be
illustrated with graphical material, sounds and videos, and the
pronunciations can be given in something like IPA and point to actual
sound files so one can listen to them.

I posted the design of all this before, but apparently my choice to
name the tables in a neutral language like Esperanto made it
inaccessible to developers.

Polyglot


