I downloaded some of the "free dictionaries" on sourceforge.net, but
they are in their own special format: they are encoded and I cannot "see"
the data as clear text, which is why I did not concentrate too much on
this format myself.
Well, DICTs normally consist of a gzipped text file together with an
index file that lets you find entries quickly. The freedict project uses
something like the TEI format, which is basically XML.
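To make the "gzipped file plus index" idea concrete, here is a minimal Python sketch of a lookup. It assumes the dictd conventions as I understand them: each index line holds headword, offset and length separated by tabs, with the two numbers written in base 64 using the digits A-Z, a-z, 0-9, +, /. The helpers `build` and `lookup` are hypothetical, and plain Python string ordering stands in for dictd's own sort order.

```python
import bisect
import gzip
import io
from typing import Dict, Optional, Tuple

# Digits used (by assumption, following dictd) to write numbers in the .index file
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(n: int) -> str:
    """Write a non-negative integer in base 64, most significant digit first."""
    if n == 0:
        return B64[0]
    digits = []
    while n:
        n, r = divmod(n, 64)
        digits.append(B64[r])
    return "".join(reversed(digits))


def b64_decode(s: str) -> int:
    """Read a base-64 number back into an integer."""
    n = 0
    for ch in s:
        n = n * 64 + B64.index(ch)
    return n


def build(entries: Dict[str, str]) -> Tuple[bytes, str]:
    """Hypothetical helper: build a gzipped data blob and a sorted, tab-separated index."""
    data = io.BytesIO()
    index_lines = []
    for word in sorted(entries):
        body = entries[word].encode("utf-8")
        index_lines.append(f"{word}\t{b64_encode(data.tell())}\t{b64_encode(len(body))}")
        data.write(body)
    return gzip.compress(data.getvalue()), "\n".join(index_lines)


def lookup(word: str, index: str, dict_gz: bytes) -> Optional[str]:
    """Hypothetical helper: binary-search the index, then read only that slice of the data."""
    rows = [line.split("\t") for line in index.splitlines()]
    words = [r[0] for r in rows]
    i = bisect.bisect_left(words, word)
    if i == len(words) or words[i] != word:
        return None
    offset, length = b64_decode(rows[i][1]), b64_decode(rows[i][2])
    # GzipFile supports seek() when reading, so we jump straight to the entry
    with gzip.GzipFile(fileobj=io.BytesIO(dict_gz)) as f:
        f.seek(offset)
        return f.read(length).decode("utf-8")


if __name__ == "__main__":
    dict_gz, index = build({"cat": "cat\n  Katze\n", "dog": "dog\n  Hund\n"})
    print(index)
    print(lookup("dog", index, dict_gz))
```

The index is small and sorted, so a binary search finds the offset without touching the big data file; only the matching slice is then decompressed and read. A real dictzip file adds chunk information to the gzip header so that seeking does not have to decompress everything before the offset, which this sketch does not attempt.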
Nokia is doing something in this direction... I must see if I can find
the link with the notes; these could be interesting as well. I mean: we
have too many different formats... that's not good.
Well, Nokia seems to have opted for free software for this new device
that'll be sold later this year.
I believe THE format for dictionary information is TBX, since it is
nothing more than XML with fixed tags, and that makes data exchange
easy.
I don't think there needs to be one format only. :) On Wikimedia servers
it will probably be MySQL, and then people should be able to download
the UW in a bunch of formats. On my mobile phone, however, I wouldn't
want to waste too much space on XML tags or uncompressed text.
I did not know you were involved in that. Maybe it really is time to
talk about specifics, but maybe offline first. Making data more easily
available is part of what we are thinking about. The more of us there
are and the more constructive the contributions, the better we will be
and the less effort will be needed.
Hmmm... :) As a side note: wik2dict can now create Debian packages. It
would be cool if someone could offer server space and bandwidth to host
them.
The next step is data exchange with CAT tools like OmegaT, but that's a
story of its own... if we start talking about that here now... I can
just imagine what would happen ;-)
I had never heard of CAT (Computer Aided Translation) or OmegaT (GPL'd
Java software). But now I have. :) Well, just let the OmegaT folks know.
I guess they would be interested in getting this done.