Dear all,
TL;DR: We are working on an *experimental* Wikidata RDF export based on DBpedia and would like some feedback on our future directions.
Disclaimer: this work is not related to or affiliated with the official Wikidata RDF dumps.
Our current approach is to treat Wikidata like any other Wikipedia edition and apply our extractors to each Wikidata page (item). This approach generates triples in the DBpedia domain (http://wikidata.dbpedia.org/resource/). Since Wikidata already provides RDF, this results in some duplication; however, we made different design choices and map Wikidata data directly into the DBpedia ontology.
sample data: http://nl.dbpedia.org/downloads/wikidatawiki/sample/
experimental dump: http://nl.dbpedia.org/downloads/wikidatawiki/20150207/ (contains known errors, see the notes below)
*Wikidata mapping details*
In the same way we use mappings.dbpedia.org to define mappings from Wikipedia templates to the DBpedia ontology, we define transformation mappings from Wikidata properties to RDF triples in the DBpedia ontology.
At the moment we provide two types of Wikidata property mappings:
a) through the mappings wiki, in the form of equivalent classes or properties, e.g.:
Property: http://mappings.dbpedia.org/index.php/OntologyProperty:BirthDate
Class: http://mappings.dbpedia.org/index.php/OntologyClass:Person
which will result in the following triples:
wd:Qx a dbo:Person .
wd:Qx dbo:birthDate "...." .
b) transformation mappings that are (for now) defined in a JSON file [1]. At the moment we provide the following mapping options:
- Predefined values:
  "P625": {"rdf:type": "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing"} will result in: wd:Qx a geo:SpatialThing .
- Value formatting with a string containing $1:
  "P214": {"owl:sameAs": "http://viaf.org/viaf/$1"} will result in: wd:Qx owl:sameAs <http://viaf.org/viaf/{wikidataValue}> .
- Value formatting with predefined functions. The following are supported for now:
  - $getDBpediaClass: returns the equivalent DBpedia class for a Wikidata item (using the mappings wiki)
  - $getLatitude, $getLongitude & $getGeoRss: geo-related functions
Also note that we can define multiple mappings per property to bring the Wikidata data closer to the DBpedia RDF exports, e.g.:
"P625": [
  {"rdf:type": "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing"},
  {"geo:lat": "$getLatitude"},
  {"geo:long": "$getLongitude"},
  {"georss:point": "$getGeoRss"}],
"P18": [
  {"thumbnail": "http://commons.wikimedia.org/wiki/Special:FilePath/$1?width=300"},
  {"foaf:depiction": "http://commons.wikimedia.org/wiki/Special:FilePath/$1"}]
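To make the two mapping types a bit more concrete, here is a minimal Python sketch of how a single statement could be turned into triples. It is not the extraction framework code (that is Scala, see [1]); the equivalence tables, the assumption that a coordinate arrives as a (latitude, longitude) tuple, and the function bodies for $getLatitude/$getLongitude/$getGeoRss are hypothetical stand-ins. $getDBpediaClass is left out. The JSON entries are the ones from the examples above.

```python
import json

# a) Hypothetical equivalences, standing in for what the mappings wiki records
#    (Q5 = "human", P569 = "date of birth", P31 = "instance of").
EQUIVALENT_CLASSES = {"Q5": "dbo:Person"}
EQUIVALENT_PROPERTIES = {"P569": "dbo:birthDate"}

# b) Transformation mappings, copied from the examples above.
MAPPINGS = json.loads("""
{
  "P214": {"owl:sameAs": "http://viaf.org/viaf/$1"},
  "P625": [
    {"rdf:type": "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing"},
    {"geo:lat": "$getLatitude"},
    {"geo:long": "$getLongitude"},
    {"georss:point": "$getGeoRss"}],
  "P18": [
    {"thumbnail": "http://commons.wikimedia.org/wiki/Special:FilePath/$1?width=300"},
    {"foaf:depiction": "http://commons.wikimedia.org/wiki/Special:FilePath/$1"}]
}
""")

# Hypothetical stand-ins for the predefined functions; a coordinate value is
# assumed to be a (latitude, longitude) tuple.
FUNCTIONS = {
    "$getLatitude": lambda v: str(v[0]),
    "$getLongitude": lambda v: str(v[1]),
    "$getGeoRss": lambda v: f"{v[0]} {v[1]}",  # GeoRSS point: "lat long"
}


def map_statement(item_id, prop, value):
    """Return (subject, predicate, object) triples for one Wikidata statement."""
    subject = f"wd:{item_id}"
    # a) equivalent classes / properties
    if prop == "P31":
        cls = EQUIVALENT_CLASSES.get(value)
        return [(subject, "a", cls)] if cls else []
    if prop in EQUIVALENT_PROPERTIES:
        return [(subject, EQUIVALENT_PROPERTIES[prop], f'"{value}"')]
    # b) transformation mappings: a single entry or a list of entries
    entries = MAPPINGS.get(prop, [])
    if isinstance(entries, dict):
        entries = [entries]
    triples = []
    for entry in entries:
        for predicate, template in entry.items():
            if template in FUNCTIONS:
                obj = FUNCTIONS[template](value)  # predefined function
            else:
                # "$1" is replaced by the value (no URL-encoding here);
                # a template without "$1" is a predefined value, kept as-is.
                obj = template.replace("$1", str(value))
            triples.append((subject, predicate, obj))
    return triples


if __name__ == "__main__":
    for triple in map_statement("Qx", "P625", (52.37, 4.89)):
        print(*triple)
```

Keeping both mapping types behind one function mirrors the idea that a property is first checked against the mappings wiki and otherwise handled by the JSON configuration; the real precedence rules may differ.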
*Qualifiers & reification*
Like Wikidata, we provide a simplified dump without qualifiers and a reified dump with qualifiers. However, for the reification we chose standard RDF reification in order to reuse the DBpedia ontology as much as possible. The reified dumps are mapped using the same configuration.
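For readers less familiar with RDF reification, a rough Python sketch of what a reified statement with a qualifier could look like. The rdf:Statement / rdf:subject / rdf:predicate / rdf:object vocabulary is standard RDF; the blank-node naming, the qualifier mapping (P580 "start time" to dbo:startDate) and the example values are hypothetical and only illustrate the shape of the output.

```python
# Hypothetical qualifier mapping (P580 = "start time"); in the actual dumps the
# qualifiers go through the same mapping configuration as the statements.
QUALIFIER_MAPPINGS = {"P580": "dbo:startDate"}


def reify(statement_id, subject, predicate, obj, qualifiers):
    """Describe one already-mapped statement with the standard RDF reification
    vocabulary and attach its mapped qualifiers to the statement node.

    statement_id: hypothetical local name for the blank node.
    qualifiers:   list of (wikidata_property, rdf_value) pairs.
    """
    node = f"_:{statement_id}"
    triples = [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", subject),
        (node, "rdf:predicate", predicate),
        (node, "rdf:object", obj),
    ]
    for prop, value in qualifiers:
        mapped = QUALIFIER_MAPPINGS.get(prop)
        if mapped is not None:  # unmapped qualifiers are dropped
            triples.append((node, mapped, value))
    return triples


if __name__ == "__main__":
    for triple in reify("s1", "wd:Qx", "dbo:spouse", "wd:Qy",
                        [("P580", '"1991"^^xsd:gYear')]):
        print(*triple, ".")
```

Because the predicate of the reified statement is a DBpedia ontology property, consumers can query the reified dump with the same vocabulary as the simplified one.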
*Labels, descriptions, aliases and interwiki links*
We additionally defined extractors for data other than statements: aliases, labels, descriptions and site links. For textual data we split the dumps into the languages that are enabled in the mappings wiki and all the rest. For interwiki links we provide links between Wikidata and DBpedia as well as links between the different DBpedia language editions.
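A small Python sketch of the language split and the interwiki links, under a few assumptions: the two language sets are hypothetical, English resources are assumed to live at dbpedia.org and other editions at <lang>.dbpedia.org, only site keys of the form <lang>wiki are considered, and titles only get spaces replaced by underscores (no further URI encoding).

```python
from itertools import combinations

ENABLED_LANGUAGES = {"en", "nl"}             # hypothetical: enabled in the mappings wiki
DBPEDIA_EDITIONS = {"en", "nl", "de", "fr"}  # hypothetical: languages with a DBpedia edition


def split_labels(item_id, labels):
    """labels: dict language -> text. Returns (enabled, rest) lists of triples."""
    enabled, rest = [], []
    for lang, text in sorted(labels.items()):
        triple = (f"wd:{item_id}", "rdfs:label", f'"{text}"@{lang}')
        (enabled if lang in ENABLED_LANGUAGES else rest).append(triple)
    return enabled, rest


def dbpedia_resource(lang, title):
    host = "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"
    return f"http://{host}/resource/{title.replace(' ', '_')}"


def interwiki_links(item_id, sitelinks):
    """sitelinks: dict like {"enwiki": "Amsterdam"} (Wikidata site keys)."""
    resources = []
    for site, title in sorted(sitelinks.items()):
        lang = site[:-len("wiki")] if site.endswith("wiki") else None
        if lang in DBPEDIA_EDITIONS:  # skips e.g. commonswiki, enwikiquote
            resources.append(dbpedia_resource(lang, title))
    # Wikidata <-> DBpedia links, then links between the DBpedia editions
    triples = [(f"wd:{item_id}", "owl:sameAs", r) for r in resources]
    triples += [(a, "owl:sameAs", b) for a, b in combinations(resources, 2)]
    return triples
```

Whether the edition-to-edition links should be emitted pairwise, as here, or only relative to one edition is a design choice; pairwise links grow quadratically with the number of editions.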
*Properties*
We also fully extract Wikidata property pages. However, for now we don't apply any mappings to Wikidata properties.
*DBpedia extractors*
Some existing DBpedia extractors that provide versioning and provenance information (e.g. pageID, revisionID) also apply to Wikidata.
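As an illustration only, the provenance output for a Wikidata page might look like the triples below, assuming the same dbo:wikiPageID and dbo:wikiPageRevisionID properties that DBpedia uses for Wikipedia pages; the helper names and the xsd:integer typing are assumptions of this sketch.

```python
def provenance_triples(item_id, page_id, revision_id):
    """Hypothetical sketch: page and revision IDs of a Wikidata page expressed
    with the DBpedia ontology properties used for Wikipedia pages."""
    subject = f"wd:{item_id}"
    return [
        (subject, "dbo:wikiPageID", f'"{page_id}"^^xsd:integer'),
        (subject, "dbo:wikiPageRevisionID", f'"{revision_id}"^^xsd:integer'),
    ]


def to_turtle(triples):
    """Serialize prefixed-name triples as simple Turtle statements, one per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_turtle(provenance_triples("Qx", 1, 2)))
```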
*Help & Feedback*
Although this is a work in progress, we wanted to announce it early and get your feedback on the following:
- Are we going in the right direction?
- Did we overlook something, or is something missing?
- Are there any other mapping options we should include?
- Where should we host the advanced JSON mappings? One option is the mappings wiki; others are Wikidata itself or a separate GitHub project.
It would be great if you could help us map more data. The easiest way is through the mappings wiki, where you can define equivalent classes & properties. See what is missing here: http://mappings.dbpedia.org/server/ontology/wikidata/missing/
You can also contribute JSON configuration, but until the code is merged this will not be easy to do through pull requests.
Until the code is merged into the main DBpedia repo, you can check it out here:
https://github.com/alismayilov/extraction-framework/tree/wikidataAllCommits
Notes:
- We use the Wikidata-Toolkit for reading the JSON structure, which is a great project btw.
- The full dump we provide is not complete due to a Wikidata dump export bug; because of this bug, the compressed files are also not closed correctly.
Best,
Ali Ismayilov, Dimitris Kontokostas, Sören Auer
[1] https://github.com/alismayilov/extraction-framework/blob/wikidataAllCommits/...