Jo wrote:
Gerard Meijssen wrote:
Ray Saintonge wrote:
The current format is adequate. Your proposal makes no mention of the possible sacrifices in terms of ease of editing, a key feature in all the wikis. How will flexibility of format be maintained?
XML is not to be edited by hand; you are absolutely right about that. However, the current format is not without its problems. At the moment an English word cannot easily be re-used in other languages. Entries are free-format at the moment. It would be a good thing if we started thinking about creating some database structures for use within Wiktionary. That would rid us of these dratted templates like {{en}} and {{-en-}}. They work, and they are the best thing around, but they are ugly.
What I propose at this time is that we start thinking about importing and exporting in an XML format, and that we consider changes that enhance the functionality within all Wiktionaries as well as the functionality offered to the outside world.
One of the aims of Wikimedia is to create open content. By keeping our data in our own proprietary format, we do not achieve what could be achieved.
Being able to extract the data from the English Wiktionary is something that has always been on my mind. It is one of the considerations. Ease of editing is another. That's why the entries in Wiktionary may not look very pretty, but they are built up in a very logical way. It is possible, although probably not trivial, to process them with a script.
If XML is not to be edited by hand, then how are we going to edit it? Is it compatible with the Wiki concept?
If you want database structures, I can give you those. I have been designing a relational database capable of storing everything that is relevant for a dictionary. The problem is in the user interface. That's where I'm stuck. Another problem is how to keep a history of what was changed. I also have no idea how well my database would perform: building a presentable report for an entry involves resolving relations between many, many tables.
Anyway, if you're interested in having a look at it, I will gladly send you the OpenOffice.org drawing with the table structures.
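To make the "many tables" concern concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the design from the drawing: the table names (word, sense, translation), their columns and the sample rows are all hypothetical, chosen only to show that a word stored once can be re-used as a translation target, and that even a tiny report already needs several joins.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory dictionary database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE word (
            id       INTEGER PRIMARY KEY,
            lang     TEXT NOT NULL,
            spelling TEXT NOT NULL
        );
        CREATE TABLE sense (
            id         INTEGER PRIMARY KEY,
            word_id    INTEGER NOT NULL REFERENCES word(id),
            definition TEXT NOT NULL
        );
        -- A translation links one sense to a word row in another language,
        -- so each word is stored once and re-used.
        CREATE TABLE translation (
            sense_id       INTEGER NOT NULL REFERENCES sense(id),
            target_word_id INTEGER NOT NULL REFERENCES word(id)
        );
    """)
    # Illustrative sample data only.
    conn.executemany(
        "INSERT INTO word VALUES (?, ?, ?)",
        [(1, "en", "monkey"), (2, "nl", "aap")],
    )
    conn.execute(
        "INSERT INTO sense VALUES (1, 1, 'a primate, usually with a tail')"
    )
    conn.execute("INSERT INTO translation VALUES (1, 2)")
    return conn


def entry_report(conn, lang, spelling):
    """Return (definition, translation language, translation) rows for one entry."""
    return conn.execute(
        """
        SELECT s.definition, t.lang, t.spelling
        FROM word w
        JOIN sense s ON s.word_id = w.id
        LEFT JOIN translation tr ON tr.sense_id = s.id
        LEFT JOIN word t ON t.id = tr.target_word_id
        WHERE w.lang = ? AND w.spelling = ?
        ORDER BY s.id, t.lang
        """,
        (lang, spelling),
    ).fetchall()
```

Calling entry_report(build_demo_db(), "en", "monkey") returns one row per sense and translation. A real design with pronunciation, pictures and history would add more tables and more joins per report, which is where the performance question comes from.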
Polyglot
Wiktionary-l mailing list Wiktionary-l@Wikipedia.org http://mail.wikipedia.org/mailman/listinfo/wiktionary-l
XML will be the result of an extraction process. In the same way, XML will/may result in entries into the Wiktionary database.
The problem with the current "structure" is that it is too fluid and is prone to produce errors. Particularly problematic are the translations, where numbers that may or may not be present indicate which meaning of a word a translation refers to. Because of this it will be almost impossible to export to XML.
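As a sketch of what an export could look like once every translation is tied explicitly to a meaning, the Python below builds a small XML fragment with the standard xml.etree.ElementTree module. The element and attribute names (entry, sense, definition, translation, id, lang) and the function entry_to_xml are assumptions made up for illustration, not an agreed Wiktionary format.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(word, lang, senses, translations):
    """Serialize one entry to XML (hypothetical format).

    senses:       list of (sense_id, definition)
    translations: list of (sense_id, target_lang, text)
    Every translation must name an existing sense, so nothing is left
    to optional numbering.
    """
    entry = ET.Element("entry", {"word": word, "lang": lang})
    sense_elements = {}
    for sense_id, definition in senses:
        sense = ET.SubElement(entry, "sense", {"id": str(sense_id)})
        ET.SubElement(sense, "definition").text = definition
        sense_elements[sense_id] = sense
    for sense_id, target_lang, text in translations:
        if sense_id not in sense_elements:
            raise ValueError(f"translation {text!r} refers to unknown sense {sense_id}")
        tr = ET.SubElement(sense_elements[sense_id], "translation", {"lang": target_lang})
        tr.text = text
    return ET.tostring(entry, encoding="unicode")
```

A translation without a valid sense is rejected here instead of being exported ambiguously, which is exactly the check the current free-format entries cannot offer.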
I am really happy that your database design is on Meta so that people can comment on it (http://meta.wikimedia.org/wiki/Tables_for_Wiktionary). As I discussed with Polyglot over Skype, I would prefer to have fewer tables. However, it is really useful to have thought-out designs, and as such it bodes well for the things that may come.
One thing that XML should be used for is to create a history of changes that people can subscribe to. This will export the Wiktionary content and make it more relevant to many translators, as it is then just a matter of reading it into whatever format they use. If changes are only exported 24 hours after the last change to an entry, things like vandalism have less chance of getting through.
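The 24-hour delay could be sketched as a simple filter over last-change timestamps. This is only an illustration in Python: the function name stable_entries and the shape of its input (a mapping from entry title to the time of its last change) are assumptions, not an existing Wiktionary or MediaWiki interface.

```python
from datetime import datetime, timedelta

# Assumed quiet period: an entry is exported only after this long without edits.
QUIET_PERIOD = timedelta(hours=24)


def stable_entries(last_changed, now, quiet_period=QUIET_PERIOD):
    """Return the titles whose last change is at least quiet_period old.

    last_changed: dict mapping entry title -> datetime of its last change
    now:          the moment the export runs
    """
    return sorted(
        title
        for title, changed_at in last_changed.items()
        if now - changed_at >= quiet_period
    )
```

An entry that is still being edited, or was vandalised an hour ago, stays out of the feed until it has been left alone for a full day.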
Some excellent touches in the database design are the inclusion of pictures and sounds. Having SAMPA is good, but nothing beats hearing a native speaker say a word or a phrase. A description of a monkey is fine, but a picture paints a thousand words. One thing that could be added is something on etymology.
Thanks, Gerard