On 2012-05-23 17:36, Christoph Lauer wrote:
> I'm working on the Entry Layouts, which extract
> Wiktionary data into the
> DBpedia framework. The first thing I'm interested in is the link to the
> base form of an inflected verb/adjective.
Which language of Wiktionary, and what is your source format?
In the English Wiktionary, the category tree under
http://en.wiktionary.org/wiki/Category:Form-of_templates_by_language
will guide you to wiki templates used to express that
an entry is an inflected form of a base word.
For example, two levels down, you will find
http://en.wiktionary.org/wiki/Category:Swedish_form-of_templates
and
http://en.wiktionary.org/wiki/Template:sv-adj-form-abs-indef-n
which is used in the entry
http://en.wiktionary.org/wiki/oveders%C3%A4gligt
to specify that this word is a form of a Swedish adjective.
The base word is the first and only parameter.
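As a rough sketch of how an extractor might pick up that base word
from the wikitext, here is a small Python example. The function name,
the regular expression and the sample wikitext are assumptions made
for illustration; only the template name comes from the entry above,
and the pattern does not handle nested templates or links inside the
parameter.

```python
import re

# Matches a template call with exactly one positional parameter,
# e.g. {{sv-adj-form-abs-indef-n|ovedersäglig}}.
# Assumption: no nested templates, links or named parameters inside.
FORM_OF_CALL = re.compile(r"\{\{\s*([^{}|]+?)\s*\|\s*([^{}|=]+?)\s*\}\}")

def find_base_forms(wikitext, template_names):
    """Return (template, base_word) pairs for the given form-of templates."""
    pairs = []
    for match in FORM_OF_CALL.finditer(wikitext):
        name, base = match.group(1), match.group(2)
        if name in template_names:
            pairs.append((name, base))
    return pairs

# Hypothetical wikitext, shaped like a form-of entry.
sample = "==Swedish==\n===Adjective===\n# {{sv-adj-form-abs-indef-n|ovedersäglig}}\n"
print(find_base_forms(sample, {"sv-adj-form-abs-indef-n"}))
```

The set of template names to look for could be collected from the
form-of template category of each language.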
In the statistics for the English Wiktionary,
http://en.wiktionary.org/wiki/Wiktionary:Statistics
you can see how many entries are "form-of definitions"
for each language; for example, there are 79,966 Swedish
form-of definitions in the English Wiktionary.
For the analysis, you may want to consult the person who
updates that statistics page, Conrad Irwin,
http://en.wiktionary.org/wiki/User:Conrad.Irwin
On average, each Swedish/Danish/Norwegian base word
has 4-5 form variants, which is higher than Dutch and
German, but lower than Finnish.
Within the English Wiktionary, each language has its own
small subcommunity that might organize things a little
differently. For example, there is no category for
Danish form-of templates. I don't know why. And the Dutch
have only one template for adjective forms, using a
parameter to say which form it is, while the Swedish
use one template for each form, each taking only one parameter.
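
Because the conventions differ per language, an extractor probably
needs a small per-template rule saying where the base word sits. A
minimal Python sketch under stated assumptions: only the Swedish
template name is real; "example-adj-form" is a placeholder for a
single-template scheme like the Dutch one, and its parameter order
(form code first, base word second) is a guess, not the actual Dutch
template.

```python
# Which positional parameter (0-based) holds the base word.
# Only the Swedish name is taken from the text above; "example-adj-form"
# is a placeholder and its parameter order is assumed.
BASE_WORD_POSITION = {
    "sv-adj-form-abs-indef-n": 0,  # one template per form, one parameter
    "example-adj-form": 1,         # one template, form code + base word
}

def base_word(template_call):
    """Extract the base word from a call such as '{{name|a|b}}', or None."""
    inner = template_call.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        return None
    name, *params = [part.strip() for part in inner[2:-2].split("|")]
    # Skip named parameters such as lang=xx; keep positional ones only.
    positional = [p for p in params if "=" not in p]
    position = BASE_WORD_POSITION.get(name)
    if position is None or position >= len(positional):
        return None
    return positional[position]

print(base_word("{{sv-adj-form-abs-indef-n|ovedersäglig}}"))
print(base_word("{{example-adj-form|infl|groot}}"))
```

Templates that are not listed in the table return None, so unknown
conventions are skipped rather than guessed at.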
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik -
http://aronsson.se