Rodolfo Raya wrote:
On 7/14/05, Gerard Meijssen wrote:
I have started anew on an ERD for the UW.
Have you considered designing an XCS template to use with TBX exports?
You may want to take a look at the default template included in the TBX
specs and remove the fields that you consider irrelevant. Once you
define the template, it will be easier to identify the requirements
of the application and draw a UML or ER diagram.
The first thing I need to do is to make sure that we can host the data
in the database. To do that, there are several requirements that I have
to allow for:
* The user interface must be in the language indicated by the user.
* Ultimate Wiktionary is to eat its own dog food; if some
terminology required in the UI is not there, it must be possible to
add it within Ultimate Wiktionary.
* It must be possible to host all words of all languages in this
* We need cooperation of many people to make it all possible so I need
to acknowledge glossaries and thesauri that we are given to host
* Users must be able to select the languages that they want to see.
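To make two of the requirements above concrete (UI terminology stored as
ordinary data, and users choosing which languages they see), here is a
minimal Python sketch. It is not the ERD under discussion; the class
`Dictionary` and the methods `add_word`, `add_ui_term`, `ui_label` and
`visible_words` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dictionary:
    """Hypothetical in-memory stand-in for the UW data store."""
    # (language code, word) -> definition; all languages share one store
    entries: dict = field(default_factory=dict)
    # (UI key, language code) -> label; UI terminology is ordinary data
    ui_terms: dict = field(default_factory=dict)

    def add_word(self, lang, word, definition):
        self.entries[(lang, word)] = definition

    def add_ui_term(self, key, lang, label):
        # A missing UI term can be added through the same store
        self.ui_terms[(key, lang)] = label

    def ui_label(self, key, user_lang):
        # Returns None when the term is not yet there for this language
        return self.ui_terms.get((key, user_lang))

    def visible_words(self, selected_langs):
        # Only the languages the user selected are shown
        return sorted(
            (lang, word)
            for (lang, word) in self.entries
            if lang in selected_langs
        )
```

A caller that gets None back from `ui_label` knows the term still has to
be added, which is the "add it within Ultimate Wiktionary" case.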
The budget that we have to create UW is minimal. We will be happy if
we can get it to work and host the data within a limited amount of
time. We have considered TBX. There are, however, TWO crucial things to
deal with before we can consider exporting to TBX: the first is IMPORT,
the second is an analysis of what the export should be for. If the Dutch
content will be 222,000+ words to start off with and everyone starts
hitting our servers because they can, that calls for some optimisation.
If the export only needs to cover the changed content, it is
different again. When the need is associated with the projected
reference implementation, it makes sense to consider how we will ask
translators to aid in the content of the UW.
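The difference between a full export and an export of only the changed
content can be sketched as follows. This is an illustration under
assumptions, not a design decision: the function name `changed_since`
and the `modified` field are hypothetical, and it presumes each entry
records when it was last edited.

```python
from datetime import datetime


def changed_since(entries, last_export):
    """Return only the entries edited after the previous export.

    `entries` is a list of dicts with a hypothetical 'modified'
    timestamp; a full export would simply return all of them.
    """
    return [e for e in entries if e["modified"] > last_export]
```

With 222,000+ entries and only a few edits per day, such a delta would
stay small, whereas a full export costs the same every time it runs.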
Sabine, for instance, already adds content to the Italian wiktionary
based on her translations. When we ask for this as part of the reference
implementation for a translation glossary, we want to encourage
translators to work on the content like Sabine does; finding the right
way is crucial. When we start with UW, it may be important to have
some Quality Assurance measures built in. This is also very much
intrinsic to the database design, and that is what is currently being
worked on. Currently some 17 tables make up the ERD, and a few more
are needed.
Over the last year I have had many conversations about what should be
in an Ultimate Wiktionary. Now I am trying to integrate it all. I have
posted the design to uw-creations(a)googlegroups.com and I have
posted it at http://commons.wikimedia.org/wiki/Image:ERD.jpg
You should understand it is a working document, and if you have questions
or suggestions, do not hesitate to discuss them with me and with others.
They can say what they want about Dutch people, but they can't say you
aren't persistent! I'm looking at the ERD. The first remark I have is
that English speakers won't understand what wordtype means. The term
they use is 'part of speech' or POS.
Keep up the good work. I'll give you more remarks later, when I
find something to say.