Brian Suda wrote:
I have been on this list a while. When I originally joined, I was
interested in the possibility of exporting the Wiktionary data in
.dict format. Now that the newest version of OS X, 10.4, has a built-in
dictionary that uses dict:// to look up words, I was interested to
see if anyone on the technical side would like to explore the
possibility of either exporting the Wiktionary database in .dict
format, or running a dictionary daemon that would access the Wiktionary
database server and return dict entries. It would be read-only, but it
would be another interesting way to access Wiktionary besides the
web interface.
Does anyone on the tech list know if this is even possible? I'm not
asking you to do it (I can write the export); I was wondering if there
is some sort of database schema available to extract the data into
dict format, or are the entries too fragmented to even attempt an
export?
Hi Brian,
I went over and read the dict RFC document. It only explains the
protocol for talking to a dict server. It says nothing about what the
contents should look like. Pure HTML would be OK, and it would be
possible to indicate that it is HTML by specifying the MIME type.
So, this answers your question in a positive way. It would be almost
trivial to provide the contents of the various Wiktionaries through the
dict protocol. All of Wiktionary is stored in MySQL databases, one for
each project. It is possible to download dump files of these and load
them into your own local MySQL server, or it would be possible to set up a
dict server that relays requests immediately to the Wikimedia servers.
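
To give an idea of how little the relaying server has to do, here is a
rough Python sketch of the part that answers one client command. It is
only my reading of the dict RFC (RFC 2229), not tested against a real
client. The names handle_command and lookup and the database name
"wiktionary" are made up for the example; lookup stands for whatever
fetches the entry (local MySQL or the Wikimedia servers). shlex is only
an approximation of the RFC's quoting rules, and the mime_type argument
assumes the client has already sent OPTION MIME.

```python
import shlex


def handle_command(line, lookup, database="wiktionary", mime_type=None):
    """Return the DICT response text for one client command line.

    lookup is a hypothetical callable: word -> entry text, or None.
    mime_type is set by the caller once the client has sent OPTION MIME.
    """
    try:
        parts = shlex.split(line)
    except ValueError:
        return "501 syntax error, illegal parameters\r\n"
    if not parts:
        return "500 syntax error, command not recognized\r\n"
    command = parts[0].upper()
    if command == "QUIT":
        return "221 bye\r\n"
    if command != "DEFINE":
        # Read-only sketch: only DEFINE and QUIT are handled.
        return "502 command not implemented\r\n"
    if len(parts) != 3:
        return "501 syntax error, illegal parameters\r\n"
    requested_db, word = parts[1], parts[2]
    # "*" and "!" are the RFC's wildcards for "all" / "first match".
    if requested_db not in (database, "*", "!"):
        return "550 invalid database\r\n"
    entry = lookup(word)
    if entry is None:
        return "552 no match\r\n"
    header = ""
    if mime_type:
        # With OPTION MIME, text is prefaced by a MIME header and a blank line.
        header = f"Content-type: {mime_type}\r\n\r\n"
    # Dot-stuffing: double a leading "." so it is not read as the terminator.
    body = "".join(
        ("." + text if text.startswith(".") else text) + "\r\n"
        for text in entry.splitlines()
    )
    return (
        "150 1 definitions retrieved\r\n"
        f'151 "{word}" {database} "Wiktionary"\r\n'
        f"{header}{body}.\r\n"
        "250 ok\r\n"
    )
```

Wrapping this in a socket loop on port 2628 and sending the 220 banner
on connect is left out; the point is that the entry text itself passes
through untouched apart from the dot-stuffing.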
It is possible to retrieve the entries as XML rather than HTML
(this means there is less overhead for the Wikimedia servers). For a way
to do that, I suggest you have a look at PyWikipediaBot. It has a
module that does exactly that (and more, but that's not important for a
read-only dict server).
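
If you would rather not pull in PyWikipediaBot, the XML retrieval can
also be sketched with the standard library alone. This is not
PyWikipediaBot's code; it assumes MediaWiki's Special:Export page, and
the function names, the site URL and the User-Agent string are just
placeholders for the example. The export namespace differs between
MediaWiki versions, so the sketch matches the <text> element by its
local name only.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def export_url(word, site="https://en.wiktionary.org"):
    """Build the Special:Export URL for one entry (site is an assumption)."""
    title = urllib.parse.quote(word.replace(" ", "_"))
    return f"{site}/wiki/Special:Export/{title}"


def extract_wikitext(xml_data):
    """Return the wikitext of the first <text> element, or None if absent."""
    root = ET.fromstring(xml_data)
    for element in root.iter():
        # Strip the "{namespace}" prefix so any export schema version matches.
        if element.tag.rsplit("}", 1)[-1] == "text":
            return element.text or ""
    return None


def fetch_entry(word):
    """Fetch one entry over HTTP and return its wikitext (needs network)."""
    request = urllib.request.Request(
        export_url(word),
        headers={"User-Agent": "dict-relay-sketch/0.1 (example only)"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_wikitext(response.read())
```

A function like fetch_entry could serve as the lookup step of the dict
server. Note that what comes back is wiki markup, so some conversion is
still needed before it looks nice in a dictionary client.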
Good luck and I hope you feel like implementing this.
Polyglot