Dear Lars,
On 05/24/2012 03:54 AM, Lars Aronsson wrote:
On 2012-05-23 19:10, Christoph Lauer wrote:
The template I wrote was for the English Wiktionary. I'm not sure what
you mean by source format; the entry layouts follow the XML standard as
described here:
http://wiktionary.dbpedia.org/ (just to make sure we're
not talking at cross purposes ;-) ).
It is indeed confusing that the DBpedia webpage you link to
points to this mailing list. It would be really helpful if Jonas Brekle
would edit that page to include an introduction on what Wiktionary
is (www.wiktionary.org and associated wiki sites in many languages,
a project of the Wikimedia Foundation), and explain that his
DBpedia project (wiktionary.dbpedia.org) is something else.
I added a line to the description:
http://wiki.dbpedia.org/Wiktionary
Some time ago, there was a discussion about the purpose of this list. I
know that a large share of the Wiktionary editors are only concerned
with the human-readable appearance. But there are quite a few other user
groups that are interested in extracting and using the data.
So basically we need a place to coordinate data extraction, templates
and data consumption.
I think this list is ideal for getting everybody together: Christian
from JWKTL, Andrew from Wikokit, and Jonas and I from DBpedia Wiktionary
are all on here.
Then we have people like Amgine who develop apps and would benefit from
more structure and also (now that we focused on this list) new people
like Christoph, who are interested in getting data out of Wiktionary.
Personally, I think that a lot more people would contribute
additional data and effort to Wiktionary if they were able to get it
out again.
I firmly believe that it would be a setback for a lot of people if we
started to divide the communities again. The interest in Wiktionary data
is immense, and vast resources are burnt just because hundreds of people
and companies are building parsers on their own (I know a company
employing 2 students 20h/week, just for the parsing).
Jonas and I are trying to make Wiktionary-DBpedia the center of
http://linguistics.okfn.org/resources/llod/ just as DBpedia is the
center of this:
http://lod-cloud.net/
The data can also be used to fix things in Wiktionary.
Do you have a particular problem that causes a lot of distress and work
amongst the editors, e.g. translation link consistency or
update/maintenance procedures?
We could try to create apps that help editors, but we would need a
problem description.
All the best,
Sebastian
The formats delivered by Wiktionary are the live wiki sites and the
XML database dumps that you get from
http://dumps.wikimedia.org/backup-index.html
Somebody (Jonas?) at DBpedia probably uses the XML dump (?)
and transforms that into something that is your source format.
I'm not familiar with that transform. I only know Wiktionary.
Wiktionary, like any wiki, is created by many individuals for the
instant reward of seeing the result. The sometimes inconsistent
use of different wiki templates does not matter, as long as we
only care for the human-readable HTML that the wiki shows.
For example, instead of the line
# {{sv-adj-form-abs-indef-n|ovedersäglig}}
I could have written in plain wiki text
# ''absolute indefinite neuter form of'' '''[[ovedersäglig]]'''
[[Category:Swedish adjective forms]]
which produces exactly the same HTML output, even though
it would be near impossible to parse for DBpedia.
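To illustrate the difference, here is a minimal Python sketch (not the DBpedia extractor; the function name and regular expression are made up for this example, and it only handles simple, non-nested template calls). The templated line yields a name and an argument directly, while the plain wiki text yields nothing a generic parser could rely on:

```python
import re

# Matches a simple, non-nested template call such as {{name|arg1|arg2}}.
# Assumption: nested templates and links inside arguments are out of scope.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")


def parse_templates(wikitext):
    """Return a list of (template_name, [args]) for each simple template call."""
    results = []
    for match in TEMPLATE_RE.finditer(wikitext):
        name = match.group(1).strip()
        raw_args = match.group(2)
        # raw_args starts with "|", so the first split element is empty.
        args = raw_args.split("|")[1:] if raw_args else []
        results.append((name, [a.strip() for a in args]))
    return results


templated = "# {{sv-adj-form-abs-indef-n|ovedersäglig}}"
plain = (
    "# ''absolute indefinite neuter form of'' '''[[ovedersäglig]]'''\n"
    "[[Category:Swedish adjective forms]]"
)

print(parse_templates(templated))
print(parse_templates(plain))
```

The plain version could still be mined with hand-written patterns for the italic phrase and the category, but every such phrasing would need its own rule.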
If you (Jonas) want to extract useful structured data, you need to
show that result to the people who edit the wiki, so they can
understand where they used the wrong wiki templates or formats.
If you parse the XML dumps and find ==Swedish== without any of
the proper Swedish form-of templates or declension/conjugation
templates, something is probably wrong, and needs fixing.
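A check like that could be sketched roughly as follows in Python. This is only an illustration under assumptions: the two sample pages are invented, the "sv-" prefix test is a guess based on the template name above (real Swedish templates may not all share it), and the XML is a stripped-down stand-in for a dump, keeping only the page/title/revision/text nesting:

```python
import re
import xml.etree.ElementTree as ET

# Level-2 headings only, e.g. "==Swedish==" (not "===Adjective===").
SECTION_RE = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)


def language_section(wikitext, language):
    """Return the body of the level-2 section for `language`, or None."""
    matches = list(SECTION_RE.finditer(wikitext))
    for i, m in enumerate(matches):
        if m.group(1).strip() == language:
            end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
            return wikitext[m.end():end]
    return None


def local_name(tag):
    """Drop an XML namespace prefix of the form '{uri}name'."""
    return tag.rsplit("}", 1)[-1]


def pages_missing_templates(xml_text, language="Swedish", prefix="sv-"):
    """List titles that have a `language` section but no '{{<prefix>' template."""
    missing = []
    for page in ET.fromstring(xml_text).iter():
        if local_name(page.tag) != "page":
            continue
        title, text = None, ""
        for el in page.iter():
            if local_name(el.tag) == "title":
                title = el.text
            elif local_name(el.tag) == "text":
                text = el.text or ""
        section = language_section(text, language)
        if section is not None and ("{{" + prefix) not in section:
            missing.append(title)
    return missing


# Invented sample data, not taken from a real dump.
SAMPLE = """<mediawiki>
  <page><title>ovedersägligt</title><revision><text>==Swedish==
===Adjective===
# {{sv-adj-form-abs-indef-n|ovedersäglig}}
</text></revision></page>
  <page><title>exempel</title><revision><text>==Swedish==
===Noun===
# ''plain text definition without templates''
</text></revision></page>
</mediawiki>"""

print(pages_missing_templates(SAMPLE))
```

For a full dump one would probably stream the file (for instance with `xml.etree.ElementTree.iterparse`) rather than load it in one piece, and the resulting list of titles is what could be shown to the editors.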
Interesting that the English subcategory is
practically empty whereas the
http://wiktionary.dbpedia.org/page/took. There's no reference to the
base form, so I would like to add it. That's what it's all about ;-)
The English Wiktionary's entry "took" contains the line
# {{simple past of|take}}
where lang=en is the default parameter.
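As a rough sketch of how the base form could be read from that line (the function name is hypothetical, and the handling of a `lang=` argument simply follows the default described above rather than any documented template behaviour):

```python
import re

# Assumption: only the exact template name "simple past of" is handled,
# and its arguments contain no nested templates.
FORM_OF_RE = re.compile(r"\{\{simple past of\|([^{}]*)\}\}")


def simple_past_base_forms(wikitext):
    """Return (base_form, lang) pairs; lang falls back to 'en' when absent."""
    results = []
    for m in FORM_OF_RE.finditer(wikitext):
        positional = []
        named = {"lang": "en"}  # default parameter, as stated above
        for part in m.group(1).split("|"):
            if "=" in part:
                key, value = part.split("=", 1)
                named[key.strip()] = value.strip()
            else:
                positional.append(part.strip())
        if positional:
            results.append((positional[0], named["lang"]))
    return results


print(simple_past_base_forms("# {{simple past of|take}}"))
```

With the pair in hand, the entry for "took" could be linked to "take" in the extracted data.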
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org