New subject: [Wikidata-l] [Dbpedia-discussion] DBpedia-based RDF dumps for Wikidata

11 Mar 2015

Dear all,

TL;DR; We are working on an *experimental* Wikidata RDF export based on
DBpedia and would like some feedback on our future directions.

Disclaimer: this work is not related or affiliated with the official
Wikidata RDF dumps.

Our current approach is to use Wikidata like all other Wikipedia editions
and apply our extractors to each Wikidata page (item). This approach
generates triples in the DBpedia domain (
http://wikidata.dbpedia.org/resource/). Although this results in
duplication, since Wikidata already provides RDF, we made some different
design choices and map wikidata data directly into the DBpedia ontology.

sample data: http://nl.dbpedia.org/downloads/wikidatawiki/sample/

experimental dump: http://nl.dbpedia.org/downloads/wikidatawiki/20150207/
(errors see below)

*Wikidata mapping details*

In the same way we use mappings.dbpedia.org to define mappings from
Wikipedia templates to the DBpedia ontology, we define transformation
mappings from Wikidata properties to RDF triples in the DBpedia ontology.

At the moment we provide two types of Wikidata property mappings:

a)  through the mappings wiki in the form of equivalent classes or
properties e.g.

property: http://mappings.dbpedia.org/index.php/OntologyProperty:BirthDate

Class: http://mappings.dbpedia.org/index.php/OntologyClass:Person

which will result in the following triples:

wd:Qx a dbo:Person

wd:Qx dbo:birthDate “....”

b) transformation mappings that are (for now) defined in a json file [1].
At the moment we provide the following mappings options:

   -

   Predefined values
   -

      "P625": {"rdf:type":"
      http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing"}
      will result in: wd:Qx a geo:SpatialThing
      -

   Value formatting with a string containing $1
   -

      "P214": {"owl:sameAs": "http://viaf.org/viaf/$1"}
      will result in:  wd:Qx owl:sameAs http://viaf.org/viaf/{wikidataValue}
      <http://viaf.org/viaf/viafID> .
      -

   Value formatting with predefined functions. The following are supported
   for now
   -

      $getDBpediaClass: returns the equivalent DBpedia class for a Wikidata
      item (using the mappings wiki)
      -

      $getLatitude, $getLongitude & $getGeoRss: geo-related functions

Also note that we can define multiple mappings per property to get the
Wikidata data closer to the DBpedia RDF exports e.g.:

"P625": [

{"rdf:type":"http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing"},

{"geo:lat":"$getLatitude"},

{"geo:long": "$getLongitude"},

{"georss:point":"$getGeoRss"}],

"P18": [

{"thumbnail":"
http://commons.wikimedia.org/wiki/Special:FilePath/$1?width=300"}uot;},

{"foaf:depiction":"http://commons.wikimedia.org/wiki/Special:FilePath/$1"}],

*Qualifiers & reification*

Like Wikidata we provide a simplified dump without qualifiers and a reified
dump with qualifiers. However, for the reification we chose simple RDF
reification in order to reuse the DBpedia ontology as much as possible. The
reified dumps are also mapped using the same configuration.

*Labels, descriptions, aliases and interwiki links*

We additionally defined extractors to get data other than statements. For
textual data we split the dumps to the languages that are enabled in the
mappings wiki and all the rest. We extract aliases, labels, descriptions,
site links. For interwiki links we provide links between Wikidata and
DBpedia as well as links between different DBpedia language editions.

*Properties*

We also fully extract wikidata property pages. However, for now we don’t
apply any mappings to wikidata properties.

*DBpedia extractors*

Some existing DBpedia extractors also apply in Wikidata that provide
versioning and provenance (e.g. pageID, revisionID, etc)

*Help & Feedback*

Although this is a work in progress we wanted to announce it early and get
you feedback on the following:

   -

   Are we going in the right direction?
   -

   Did we overlook something or is something missing?
   -

   Are there any other mapping options we should include?
   -

   Where should we host the advanced json mappings?
   -

      One option is in the mappings wiki, another one is in Wikidata
      directly or a separate github project

It would be great if you could help us map more data. The easiest way is
through the mappings wiki where you can define equivalent classes &
properties. See what is missing here:
http://mappings.dbpedia.org/server/ontology/wikidata/missing/

You can also provide json configuration but until the code is merged it
will not be easy with PRs.

Until the code is merged in the main DBpedia repo you can check it out from
here:

https://github.com/alismayilov/extraction-framework/tree/wikidataAllCommits

Notes:

   -

   we use the Wikidata-Toolkit for reading the json structure which is a
   great project btw
   -

   The full dump we provide is not complete due to a Wikidata dump export
   bug. The compressed files are not closed correctly due to this.

Best,

Ali Ismayilov, Dimitris Kontokostas, Sören Auer

[1]
https://github.com/alismayilov/extraction-framework/blob/wikidataAllCommits…

-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas

[Wikidata-l] DBpedia-based RDF dumps for Wikidata