On 2012-05-23 17:36, Christoph Lauer wrote:
I'm working on the Entry Layouts, which extract Wiktionary data into the DBpedia framework. The first thing I'm interested in is the link to the base form of an inflected verb/adjective.
Which language of Wiktionary, and what is your source format? In the English Wiktionary, the category tree under http://en.wiktionary.org/wiki/Category:Form-of_templates_by_language will guide you to wiki templates used to express that an entry is an inflected form of a base word.
For example, two levels down, you will find http://en.wiktionary.org/wiki/Category:Swedish_form-of_templates and the template http://en.wiktionary.org/wiki/Template:sv-adj-form-abs-indef-n, which is used in the entry http://en.wiktionary.org/wiki/oveders%C3%A4gligt to specify that this word is a form of a Swedish adjective. The base word is the template's first and only parameter.
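If your source format is the raw wikitext, a minimal sketch of pulling the base word out of such a one-parameter template could look like the following. The function name extract_base_forms and the sample wikitext are made up for illustration (the real entry's markup may differ), and the regular expression assumes the template call has exactly one positional parameter and no nested templates.

```python
import re

def extract_base_forms(wikitext, template_names):
    """Return (template_name, base_word) pairs for one-parameter form-of templates.

    Hypothetical helper: assumes calls look like {{name|base}} with a single
    positional parameter and no nested templates inside it.
    """
    names = "|".join(re.escape(n) for n in template_names)
    pattern = re.compile(r"\{\{\s*(" + names + r")\s*\|\s*([^|{}]+?)\s*\}\}")
    return [(m.group(1), m.group(2)) for m in pattern.finditer(wikitext)]

# Illustrative wikitext only, not copied from the actual entry.
sample = (
    "==Swedish==\n"
    "===Adjective===\n"
    "# {{sv-adj-form-abs-indef-n|ovedersäglig}}\n"
)

print(extract_base_forms(sample, ["sv-adj-form-abs-indef-n"]))
```

The list of template names would have to come from the category pages above; I would not try to guess them from a naming pattern, since each language subcommunity names its templates differently.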
In the statistics for the English Wiktionary, http://en.wiktionary.org/wiki/Wiktionary:Statistics you can see how many entries are "form-of definitions" for each language; for example, there are 79,966 Swedish form-of definitions in the English Wiktionary.
For the analysis, you may want to consult the person who updates that statistics page, Conrad Irwin, http://en.wiktionary.org/wiki/User:Conrad.Irwin
On average, each Swedish/Danish/Norwegian base word has 4-5 form variants, which is higher than Dutch and German, but lower than Finnish.
Within the English Wiktionary, each language has its own small subcommunity that might organize things a little differently. For example, there is no category for Danish form-of templates. I don't know why. The Dutch have only one template for adjective forms, with a parameter that says which form it is, while the Swedish use one template for each form, each taking only one parameter.