[Toolserver-l] Experimental: Live template value search

Christopher Sahnwaldt jcsahnwaldt at gmail.com
Tue Apr 21 15:51:12 UTC 2009


> IMHO we should try to harvest the data that is already in
> Wikipedia first.

That's what DBpedia does. Currently, we concentrate on
infoboxes. We extract data from infobox templates and
store the data as RDF. Queries against the data are
usually done with SPARQL, e.g. at http://dbpedia.org/sparql
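
To make the extraction step concrete, here is a minimal sketch of the
generic idea: each "| key = value" line of an infobox becomes one
triple. This is not the DBpedia extraction code. The wikitext sample,
the subject URI and the function name infobox_to_triples are made up
for illustration, and real infoboxes (nested templates, references,
multi-line values) need far more care.

```python
import re

# Hypothetical wikitext sample, simplified for illustration.
WIKITEXT = """{{Infobox film
| name     = Jaws
| director = Steven Spielberg
| released = 1975-06-20
}}"""

# Assumption: raw infobox parameters are placed under this namespace.
PROPERTY_NS = "http://dbpedia.org/property/"


def infobox_to_triples(subject, wikitext):
    """Turn each '| key = value' line into a (subject, property, value) triple."""
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$", line)
        if match and match.group(2):  # skip parameters with empty values
            key, value = match.groups()
            # Generic approach: the property name is taken from the parameter name.
            triples.append((subject, PROPERTY_NS + key, value))
    return triples


# The subject URI below is only an example.
for triple in infobox_to_triples("http://dbpedia.org/resource/Jaws_(film)", WIKITEXT):
    print(triple)
```

Values stay plain strings here; turning them into typed literals or
links to other resources is a separate step that this sketch leaves out.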

>> It also does away with problems caused by the
>> various names a parameter with the same meaning may
>> have in different templates (and different wikis).

DBpedia also struggled with these inconsistencies. The
best solution seems to be a hand-made mapping between
template properties and RDF properties. See

http://blog.georgikobilarov.com/2008/10/dbpedia-rethinking-wikipedia-infobox-extraction/
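
As a rough illustration of what such a hand-made mapping could look
like, the sketch below uses a lookup table keyed by template name and
parameter name. The template names, parameter names and the helper
map_property are hypothetical and are not taken from the actual
DBpedia mapping.

```python
ONTOLOGY = "http://dbpedia.org/ontology/"

# Hypothetical hand-made mapping: (template name, parameter name) -> RDF property.
# Several differently named parameters can point at the same property.
MAPPING = {
    ("Infobox film", "director"): ONTOLOGY + "director",
    ("Infobox film", "directed_by"): ONTOLOGY + "director",
    ("Infobox Film DE", "regie"): ONTOLOGY + "director",  # made-up template name
    ("Infobox film", "released"): ONTOLOGY + "releaseDate",
}


def map_property(template, parameter):
    """Return the mapped RDF property, or None if nobody has mapped it yet."""
    key = (template.strip(), parameter.strip().lower())
    return MAPPING.get(key)


print(map_property("Infobox film", "Directed_By"))
print(map_property("Infobox film", "budget"))
```

Unmapped parameters return None, so a caller could fall back to a
generic property name for them instead of dropping the data.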

In the future, we would like to publish the mapping and let
it grow wiki-style.

A few numbers: from the infoboxes, we extracted 7 million
RDF triples with the 'hand-made' approach and 32 million
triples with a generic approach.

Bye,
Christopher

PS: Here's a toy example:
kid actors (younger than 18) in Spielberg movies:

# bif:datediff is a Virtuoso built-in, so this query is not portable SPARQL
SELECT ?film ?release ?actor ?birth WHERE {
  ?film <http://dbpedia.org/ontology/director>
        <http://dbpedia.org/resource/Steven_Spielberg> .
  ?film <http://dbpedia.org/ontology/starring> ?actor .
  ?actor <http://dbpedia.org/ontology/birthdate> ?birth .
  ?film <http://dbpedia.org/ontology/releaseDate> ?release .
  # keep actors whose birth date is less than 18 years before the release
  FILTER (bif:datediff('year', ?birth, ?release) < 18)
}
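
For running a query from a script rather than the web form, a minimal
sketch of building the request URL follows. The "query" parameter
comes from the SPARQL protocol; the "format" parameter is an
assumption about what the endpoint accepts, and build_query_url is a
hypothetical helper name. The sketch only builds the URL and does not
send the request.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"


def build_query_url(query, endpoint=ENDPOINT):
    """Build a GET URL that carries the SPARQL query as a URL-encoded parameter."""
    params = {
        "query": query,
        # Assumption: the endpoint honours this parameter for JSON results.
        "format": "application/sparql-results+json",
    }
    return endpoint + "?" + urlencode(params)


# A shortened query, used only to keep the example small.
QUERY = (
    "SELECT ?film WHERE { "
    "?film <http://dbpedia.org/ontology/director> "
    "<http://dbpedia.org/resource/Steven_Spielberg> } LIMIT 5"
)

url = build_query_url(QUERY)
print(url)
```

The resulting URL can be fetched with any HTTP client, for example
urllib.request.urlopen(url).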


