Jimmy Wales wrote:
Ray Saintonge wrote:
Not really, since the existing wiktionaries would continue as they have all along.
I am not agreeing or disagreeing or taking any position on this at all. Please consider my questions about this to be merely data gathering. :-)
Why should we have such a formal break? Why shouldn't we instead transition from one to the other in a seamless way? I worry a lot about duplication of effort, etc.
Are there objections from old-school wiktionarians to the ultimate wiktionary plans? Is it possible that we could address these objections through code?
Hi Jimbo et al,
What GerardM proposes as the ultimate Wiktionary (UW) is what I have been considering doing all along: a dictionary in a relational database. It's a pity I can't seem to find more time to contribute to its genesis. Basically, it would mean that we would all be working on the same set of data, instead of each language having its own balkanized Wiktionary. People who speak French get a French interface to the data. People who speak Swahili get an interface in Swahili. Part of the interface could show that content exists in other languages for the description of a word, for an etymology, etc.
At the moment there is an enormous duplication of effort, and with it plenty of room for errors. An error may get corrected in one Wiktionary while it stays uncorrected in the others. The UW wants to solve these problems. Likewise, if somebody in the Dutch Wiktionary knows the translation of a word into Frisian, this knowledge doesn't propagate to the other language Wiktionaries, or at least not in an efficient way. With the UW, if somebody adds a translation, it becomes available to everybody.
Of course, new problems will crop up. For instance, translations are not transitive for all but the simplest of terms and concepts. We need to find a solution to that problem. I was considering taking a step back and grouping meanings into one table and concepts into another to try to solve it. The issue with that is that it adds a layer of abstraction which might obscure things a bit. (It would become possible to create a 'meaning' where one already existed, resulting in two entries for the same 'meaning'.) Such things have to be discovered and resolved. I'm sure we will find a way to accomplish that the Wiki way.
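To make the transitivity problem concrete, here is a minimal Python sketch. The word pairs and the function name `chained_translations` are only illustrations I made up for this mail, not part of any existing design. It chains word-level translation links naively and ends up linking two French words that are not equivalents at all.

```python
from collections import defaultdict

# Toy word-level translation pairs (illustrative assumption, not real data).
PAIRS = [
    (("en", "bank"), ("fr", "banque")),  # financial institution
    (("en", "bank"), ("fr", "rive")),    # side of a river
    (("fr", "banque"), ("nl", "bank")),
    (("fr", "rive"), ("nl", "oever")),
]


def chained_translations(start, pairs):
    """Follow translation links transitively from `start` (naive closure)."""
    graph = defaultdict(set)
    for a, b in pairs:
        # Treat every translation pair as a two-way link.
        graph[a].add(b)
        graph[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    seen.discard(start)
    return seen


reachable = chained_translations(("fr", "rive"), PAIRS)
# Chaining through English "bank" wrongly connects "rive" with "banque".
print(("fr", "banque") in reachable)
```

Linking translations to a meaning rather than to a bare spelling is one way to avoid this kind of leak, which is what the extra layers are for.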
One would need quite a few layers of information:
words or lemmas with a certain spelling
  language (this is the only one that has to be defined)
  part of speech
  many more properties that are optional, such as whether it is normally capitalized, whether it's a conjugated/inflected form, singular/plural, etc.
These words can be grouped to form expressions (and an expression can also be only one word).
  'bathroom' could be one entry, 'salle de bains' another
This is the 'dictionary' layer, where the words/expressions can be described, etymologies can be added, etc. This can be done in several languages. The user gets to see the language(s) he wants to see.
The next layer points to these expressions and combines all the ones that have the same meaning.
The way I conceived it, I added yet another layer: 'concepts'. Here I grouped the meanings 'mum' and 'mother'. They are the same concept, but the meanings are different, so the translations have to be different too: (maman, mama, mamá) for one and (mère, moeder, madre) for the other.
I used some more lookup tables, and tables to create many-to-many relations, which I left out of the picture here. The concepts can be illustrated with graphical material, sounds and videos, and the pronunciations can be added with something like IPA and point to actual sound files so one can actually listen to them.
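The layers described above can be sketched roughly as follows, using Python's built-in sqlite3 module. This is only an illustration under my own assumptions: the English table and column names, the helper functions `add_expression` and `translations`, and the labels 'female parent', 'informal' and 'formal' are invented for this mail and are not the names from the design I posted before.

```python
import sqlite3

SCHEMA = """
CREATE TABLE word (
    id INTEGER PRIMARY KEY,
    spelling TEXT NOT NULL,
    language TEXT NOT NULL,          -- the only mandatory property
    part_of_speech TEXT              -- optional, like the other properties
);
CREATE TABLE expression (
    id INTEGER PRIMARY KEY,
    language TEXT NOT NULL,
    text TEXT NOT NULL               -- denormalized copy, kept for easy lookup
);
CREATE TABLE expression_word (       -- which words make up an expression
    expression_id INTEGER NOT NULL REFERENCES expression(id),
    word_id INTEGER NOT NULL REFERENCES word(id),
    position INTEGER NOT NULL,
    PRIMARY KEY (expression_id, position)
);
CREATE TABLE concept (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE meaning (
    id INTEGER PRIMARY KEY,
    concept_id INTEGER NOT NULL REFERENCES concept(id),
    label TEXT NOT NULL
);
CREATE TABLE meaning_expression (    -- many-to-many: meaning <-> expression
    meaning_id INTEGER NOT NULL REFERENCES meaning(id),
    expression_id INTEGER NOT NULL REFERENCES expression(id),
    PRIMARY KEY (meaning_id, expression_id)
);
"""


def add_expression(conn, language, text, meaning_id):
    """Store an expression, its words, and its link to one meaning."""
    expr_id = conn.execute(
        "INSERT INTO expression (language, text) VALUES (?, ?)", (language, text)
    ).lastrowid
    # For brevity every word is inserted anew; a real system would reuse rows.
    for position, spelling in enumerate(text.split()):
        word_id = conn.execute(
            "INSERT INTO word (spelling, language) VALUES (?, ?)",
            (spelling, language),
        ).lastrowid
        conn.execute(
            "INSERT INTO expression_word VALUES (?, ?, ?)",
            (expr_id, word_id, position),
        )
    conn.execute("INSERT INTO meaning_expression VALUES (?, ?)", (meaning_id, expr_id))
    return expr_id


def translations(conn, text, source_language, target_language):
    """Return expressions in the target language that share a meaning."""
    rows = conn.execute(
        """
        SELECT DISTINCT target.text
        FROM expression AS source
        JOIN meaning_expression AS me1 ON me1.expression_id = source.id
        JOIN meaning_expression AS me2 ON me2.meaning_id = me1.meaning_id
        JOIN expression AS target ON target.id = me2.expression_id
        WHERE source.text = ? AND source.language = ? AND target.language = ?
        ORDER BY target.text
        """,
        (text, source_language, target_language),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One concept with two meanings, as in the mum/mother example.
concept_id = conn.execute(
    "INSERT INTO concept (label) VALUES ('female parent')"
).lastrowid
informal = conn.execute(
    "INSERT INTO meaning (concept_id, label) VALUES (?, 'informal')", (concept_id,)
).lastrowid
formal = conn.execute(
    "INSERT INTO meaning (concept_id, label) VALUES (?, 'formal')", (concept_id,)
).lastrowid

for language, text in [("en", "mum"), ("fr", "maman"), ("nl", "mama"), ("es", "mamá")]:
    add_expression(conn, language, text, informal)
for language, text in [("en", "mother"), ("fr", "mère"), ("nl", "moeder"), ("es", "madre")]:
    add_expression(conn, language, text, formal)

print(translations(conn, "mum", "en", "fr"))
print(translations(conn, "mother", "en", "nl"))
```

Because translations are looked up through the meaning and not through the concept, 'mum' and 'mother' stay apart even though both hang off the same concept. A translation added once to a meaning is then visible from every language.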
I posted the design of all this before, but apparently the fact that I chose to name my tables with a neutral language like Esperanto managed to make it inaccessible to developers.
Polyglot