Jimmy Wales wrote:
Ray Saintonge wrote:
Not really, since the existing Wiktionaries would continue as they have all along.
I am not agreeing or disagreeing or taking any position on this at all.
Please consider my questions about this to be merely data gathering. :-)
Why should we have such a formal break? Why shouldn't we instead
transition from one to the other in a seamless way? I worry a lot about
duplication of effort, etc.
Are there objections from old-school Wiktionarians to the Ultimate
Wiktionary plans? Is it possible that we could address these objections
through code?
Hi Jimbo et al,
What GerardM proposes as the Ultimate Wiktionary is what I have been
considering doing all along: a dictionary in a relational database. It's
a pity I can't seem to find more time to contribute to its genesis. What
it would mean is basically that we would all be working on the same set
of data, instead of having each language in its own balkanized Wiktionary.
People who speak French get a French interface to the data. People who
speak Swahili get an interface in Swahili. Part of the interface could
be to show that content exists in other languages for the description of
a word, for an etymology, etc.
It is a pity that there is so much duplication of effort and so much
room for errors. An error may get corrected in one Wiktionary but
remain in the others. The UW aims to solve these problems. Also, if
somebody in the Dutch Wiktionary knows the translation of a word into
Frisian, this knowledge doesn't propagate to the other language
Wiktionaries, at least not efficiently. With the UW, if somebody adds a
translation, this translation becomes available to everybody.
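To make the contrast concrete, here is a minimal Python sketch, assuming a hypothetical in-memory store rather than anything actually built: in the per-language setup each Wiktionary keeps its own translation table, while in the shared setup every interface reads the same table, so a translation added once is visible everywhere. The class names `SeparateWiktionaries` and `SharedStore` and the (source language, word, target language) key are illustrative only.

```python
from collections import defaultdict

# Hypothetical key: (source language, word, target language).
Key = tuple[str, str, str]


class SeparateWiktionaries:
    """Each language edition keeps its own, independent translation table."""

    def __init__(self, editions: list[str]) -> None:
        self.tables: dict[str, dict[Key, set[str]]] = {
            ed: defaultdict(set) for ed in editions
        }

    def add(self, edition: str, key: Key, translation: str) -> None:
        self.tables[edition][key].add(translation)

    def lookup(self, edition: str, key: Key) -> set[str]:
        return set(self.tables[edition].get(key, set()))


class SharedStore:
    """All interfaces read and write one table (the shared-data idea)."""

    def __init__(self) -> None:
        self.table: dict[Key, set[str]] = defaultdict(set)

    def add(self, key: Key, translation: str) -> None:
        self.table[key].add(translation)

    def lookup(self, interface_language: str, key: Key) -> set[str]:
        # The interface language would only change how results are shown,
        # not which data is available, so it is not used here.
        return set(self.table.get(key, set()))


key = ("nl", "huis", "fy")  # Dutch "huis" translated into Frisian

separate = SeparateWiktionaries(["nl", "fr", "sw"])
separate.add("nl", key, "hûs")
print(separate.lookup("nl", key), separate.lookup("fr", key))  # {'hûs'} set()

shared = SharedStore()
shared.add(key, "hûs")
print(shared.lookup("nl", key), shared.lookup("fr", key))  # {'hûs'} {'hûs'}
```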
Of course, new problems will crop up. For instance, translations are not
transitive for all but the simplest of terms and concepts. We need to
find a solution to that problem. I was considering taking a step back
and grouping meanings in one table and concepts in another to try to
solve the problem. The issue with that is that it adds a layer of
abstraction which might obscure things a bit. (It would become possible
to create a 'meaning' where one already existed, resulting in two
entries for the same 'meaning'.) Such things have to be discovered and
resolved. I'm sure we will find a way to accomplish that the Wiki way.
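A small illustration of why chaining translations is risky, reusing the 'mum'/'mother' words that come up later in this post. It assumes, purely for the example, that word-to-word translation links are stored as an undirected graph and that an editor has added one loose link from 'maman' to 'mother'. Following links transitively then pulls 'mère' and 'moeder' into the translations of 'mum', which is the kind of error grouping by meaning is meant to avoid.

```python
from collections import defaultdict, deque

# Word-level translation links (undirected). The last link is a loose
# translation that is acceptable on its own but harmful when chained.
links = [
    ("en:mum", "fr:maman"),
    ("en:mother", "fr:mère"),
    ("en:mother", "nl:moeder"),
    ("fr:maman", "en:mother"),  # hypothetical loose link added by an editor
]

graph: dict[str, set[str]] = defaultdict(set)
for a, b in links:
    graph[a].add(b)
    graph[b].add(a)


def transitive_translations(word: str) -> set[str]:
    """Everything reachable from `word` by following links (breadth-first)."""
    seen = {word}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(word)
    return seen


print(sorted(graph["en:mum"]))  # ['fr:maman']
print(sorted(transitive_translations("en:mum")))
# ['en:mother', 'fr:maman', 'fr:mère', 'nl:moeder']
```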
One would need quite a few layers of information:
- words or lemmas with a certain spelling, each with:
  - a language (this is the only one that has to be defined)
  - a part of speech
  - many more optional properties, such as whether it is normally
    capitalized, whether it's a conjugated/inflected form,
    singular/plural, etc.
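One way the word layer above might look as a record, sketched in Python under the assumption that optional properties are kept in an open-ended key/value map. The class `Word`, its field names and the property keys (`capitalized`, `number`, `inflection_of`) are hypothetical, not taken from the original design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Word:
    """A word or lemma with a given spelling (field names are hypothetical)."""

    spelling: str
    language: str                         # required: the one mandatory property
    part_of_speech: Optional[str] = None  # optional, e.g. "noun" or "verb"
    # Further optional properties: capitalization, inflected form, number...
    properties: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.language:
            raise ValueError("every word must have a language")


mutter = Word("Mutter", "de", "noun", {"capitalized": "always", "number": "singular"})
mothers = Word("mothers", "en", "noun", {"number": "plural", "inflection_of": "mother"})
bare = Word("salle", "fr")  # only spelling and language given
print(bare.part_of_speech, bare.properties)  # None {}
```

Keeping the optional properties in a map means new ones can be added without changing the record's shape, at the cost of less validation.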
These words can be grouped to form expressions (an expression can
also consist of only one word); for example, 'bathroom' could be one
entry and 'salle de bains' another.
This is the 'dictionary' layer, where the words/expressions can be
described, etymologies can be added, etc. This can be done in several
languages, and users see the language(s) they choose.
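A minimal Python sketch of the dictionary layer, assuming an expression is stored as an ordered tuple of word spellings plus descriptions keyed by the language they are written in. The `Expression` class, its `describe` method and the sample description texts are illustrative inventions; the point is only that one entry can carry text in several languages and each reader is shown the languages they asked for.

```python
from dataclasses import dataclass, field


@dataclass
class Expression:
    """One or more words forming a dictionary entry (hypothetical structure)."""

    words: tuple[str, ...]  # spellings of the member words, in order
    language: str
    # Descriptions (or etymologies) keyed by the language they are written in.
    descriptions: dict[str, str] = field(default_factory=dict)

    @property
    def text(self) -> str:
        return " ".join(self.words)

    def describe(self, preferred: list[str]) -> dict[str, str]:
        """Return only the descriptions in the reader's chosen languages."""
        return {
            lang: self.descriptions[lang]
            for lang in preferred
            if lang in self.descriptions
        }


bathroom = Expression(("bathroom",), "en", {"en": "A room containing a bath or shower."})
salle_de_bains = Expression(
    ("salle", "de", "bains"),
    "fr",
    {
        "fr": "Pièce où l'on prend un bain ou une douche.",
        "en": "A room for bathing.",
    },
)
print(salle_de_bains.text)                    # salle de bains
print(salle_de_bains.describe(["nl", "en"]))  # {'en': 'A room for bathing.'}
```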
The next layer points to these expressions and combines all the ones
that have the same meaning.
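The meaning layer can be sketched as a mapping from a meaning identifier to the set of (language, expression) pairs that share it; a translation is then any other expression in the same meaning, filtered by target language. This is a hypothetical in-memory version with made-up identifiers such as `bathroom-room`. It also shows the propagation point from earlier: an expression added once to a meaning is reachable from every entry in that meaning.

```python
# Meaning layer: each meaning id points at the expressions (language, text)
# that share it. Ids and data are illustrative.
meanings: dict[str, set[tuple[str, str]]] = {
    "bathroom-room": {("en", "bathroom"), ("fr", "salle de bains")},
}


def add_to_meaning(meaning_id: str, language: str, text: str) -> None:
    meanings.setdefault(meaning_id, set()).add((language, text))


def translations(language: str, text: str, target: str) -> set[str]:
    """Expressions in `target` that share at least one meaning with the source."""
    result: set[str] = set()
    for members in meanings.values():
        if (language, text) in members:
            result.update(t for lang, t in members if lang == target)
    return result


# One contributor adds the Dutch expression once...
add_to_meaning("bathroom-room", "nl", "badkamer")
# ...and it is reachable from both the English and the French entry.
print(translations("en", "bathroom", "nl"))        # {'badkamer'}
print(translations("fr", "salle de bains", "nl"))  # {'badkamer'}
```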
The way I conceived it, I added yet another layer, 'concepts'. Here I
grouped the meanings 'mum' and 'mother'. They are the same concept, but
the meanings are different, so the translations have to be different:
(maman, mama, mamá) versus (mère, moeder, madre).
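Using exactly the words above, a minimal sketch of the three levels (concept, meanings, expressions), assuming plain dictionaries and a hypothetical concept label `female parent`. Translations are taken from the meaning, so 'mum' stays with (maman, mama, mamá); the concept only links related meanings, e.g. to show 'mère' as a related but different word.

```python
# Hypothetical three-level structure: concept -> meanings -> expressions.
meanings = {
    "mum": {("en", "mum"), ("fr", "maman"), ("nl", "mama"), ("es", "mamá")},
    "mother": {("en", "mother"), ("fr", "mère"), ("nl", "moeder"), ("es", "madre")},
}
concepts = {"female parent": {"mum", "mother"}}


def translate(meaning_id: str, target: str) -> set[str]:
    """Translations come from the meaning, so the distinction is preserved."""
    return {text for lang, text in meanings[meaning_id] if lang == target}


def related(meaning_id: str, target: str) -> set[str]:
    """Words in `target` for every meaning under the same concept(s)."""
    result: set[str] = set()
    for members in concepts.values():
        if meaning_id in members:
            for other in members:
                result |= translate(other, target)
    return result


print(translate("mum", "fr"))        # {'maman'}
print(sorted(related("mum", "fr")))  # ['maman', 'mère']
```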
I used some more lookup tables, and junction tables to create many-to-many
relations, which I have left out of the picture here. The concepts can be
illustrated with graphics, sounds and videos, and pronunciations can be
added in a notation like IPA, pointing to actual sound files so one can
listen to them.
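For readers who prefer SQL, a hedged sketch of how the layers, the many-to-many junction tables and IPA pronunciations might look in a relational database, using Python's built-in `sqlite3` with an in-memory database. The table and column names are English placeholders invented for this example; they are not the original (Esperanto-named) design. The query finds translations as other expressions sharing a meaning.

```python
import sqlite3

# Hypothetical schema; names are illustrative, not the original design.
SCHEMA = """
CREATE TABLE expression (
    id INTEGER PRIMARY KEY, text TEXT NOT NULL, language TEXT NOT NULL
);
CREATE TABLE meaning (id INTEGER PRIMARY KEY);
CREATE TABLE concept (id INTEGER PRIMARY KEY, label TEXT);
-- Junction tables: an expression can have many meanings and a meaning
-- many expressions; likewise for meanings and concepts.
CREATE TABLE expression_meaning (
    expression_id INTEGER NOT NULL REFERENCES expression(id),
    meaning_id INTEGER NOT NULL REFERENCES meaning(id),
    PRIMARY KEY (expression_id, meaning_id)
);
CREATE TABLE meaning_concept (
    meaning_id INTEGER NOT NULL REFERENCES meaning(id),
    concept_id INTEGER NOT NULL REFERENCES concept(id),
    PRIMARY KEY (meaning_id, concept_id)
);
-- Pronunciation in IPA, optionally pointing at a recorded sound file.
CREATE TABLE pronunciation (
    id INTEGER PRIMARY KEY,
    expression_id INTEGER NOT NULL REFERENCES expression(id),
    ipa TEXT NOT NULL,
    sound_file TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", [
    (1, "mother", "en"), (2, "mère", "fr"), (3, "moeder", "nl"), (4, "maman", "fr"),
])
conn.executemany("INSERT INTO meaning VALUES (?)", [(10,), (11,)])
conn.execute("INSERT INTO concept VALUES (1, 'female parent')")
conn.executemany("INSERT INTO expression_meaning VALUES (?, ?)", [
    (1, 10), (2, 10), (3, 10), (4, 11),
])
conn.executemany("INSERT INTO meaning_concept VALUES (?, ?)", [(10, 1), (11, 1)])
conn.execute("INSERT INTO pronunciation VALUES (1, 2, 'mɛʁ', NULL)")

# Translations of "mother" into French: expressions sharing a meaning.
rows = conn.execute("""
    SELECT t.text
    FROM expression AS s
    JOIN expression_meaning AS sm ON sm.expression_id = s.id
    JOIN expression_meaning AS tm ON tm.meaning_id = sm.meaning_id
    JOIN expression AS t ON t.id = tm.expression_id
    WHERE s.text = ? AND s.language = ? AND t.language = ?
""", ("mother", "en", "fr")).fetchall()
print(rows)  # [('mère',)]
```

Because 'maman' is attached to a different meaning (11) under the same concept, it is not returned as a translation of 'mother', even though both meanings share a concept.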
I posted the design of all this before, but apparently my choice to
name the tables in a neutral language like Esperanto made it
inaccessible to developers.
Polyglot