Markus,

Thank you very much for this.  Translating Wikidata into the language of the Semantic Web is important.  Being able to explore the Wikidata taxonomy [1] by doing SPARQL queries in Protege [2] (even primitive queries) is really neat, e.g.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?subject
WHERE
{
   ?subject rdfs:subClassOf <http://www.wikidata.org/entity/Q82586> .
}

This may just reflect my ignorance of Protege, but I notice that the above query returns only the direct subclasses of Q82586.  The full set of subclasses for Q82586 ("lepton") is visible at http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q82586&rp=279&lang=en -- a few of the 2nd-level subclasses (muon neutrino, tau neutrino, electron neutrino) are shown there but not returned by that SPARQL query.  It seems rdfs:subClassOf isn't being treated as a transitive property in Protege.  Any ideas?
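
To illustrate what I mean by the full set, here is a minimal Python sketch that computes the transitive closure of rdfs:subClassOf directly from N-Triples lines, outside Protege.  It assumes the taxonomy file states each subclass relation as a plain triple between two IRIs; the function name all_subclasses is made up for this example.  Where a query engine supports SPARQL 1.1 property paths, rdfs:subClassOf+ expresses the same traversal in the query itself; whether that applies to Protege is an open question here.

```python
import re
from collections import defaultdict, deque

SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Matches N-Triples lines whose subject, predicate and object are all IRIs.
# Lines with literal objects (e.g. labels) do not match and are skipped.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def all_subclasses(lines, root):
    """Return the IRIs of every direct and indirect subclass of `root`."""
    # Index the data as parent -> set of direct children.
    children = defaultdict(set)
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match and match.group(2) == SUBCLASS_OF:
            child, _, parent = match.groups()
            children[parent].add(child)

    # Breadth-first walk downwards from the root class.
    found = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            # The checks guard against cycles in the subclass graph.
            if child != root and child not in found:
                found.add(child)
                queue.append(child)
    return found
```

Passing a file opened with gzip.open("wikidata-taxonomy.nt.gz", "rt", encoding="utf-8") as lines, and "http://www.wikidata.org/entity/Q82586" as root, should then yield the 2nd-level subclasses as well as the direct ones.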

Do you know when the taxonomy data in OWL will have labels available? 

Also, regarding the complete dumps, would it be possible to export a smaller subset of the faithful data?  The files under "Complete Data Dumps" in http://tools.wmflabs.org/wikidata-exports/rdf/exports/20140526/ look too big to load into Protege on most personal computers, and would likely require adjusting JVM settings on higher-end computers to load.  If it's feasible to somehow prune those files -- and maybe even combine them into one file that could be easily loaded into Protege -- that would be especially nice.
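
To make the pruning idea concrete, here is a rough Python sketch that streams several gzipped N-Triples dumps and writes only the triples about a chosen set of entities into one smaller file.  The function names, file paths and the keep_subjects set are all hypothetical, and the sketch filters on the subject IRI only.  That is probably too naive for the faithful dumps, where qualifiers and references presumably hang off separate statement nodes, so treat it as an illustration of the request rather than a working exporter.

```python
import gzip


def prune_ntriples(lines, keep_subjects):
    """Yield only the N-Triples lines whose subject IRI is in `keep_subjects`."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        # In N-Triples the subject is the first whitespace-delimited token.
        subject = stripped.split(None, 1)[0]
        if subject.startswith("<") and subject.endswith(">"):
            if subject[1:-1] in keep_subjects:
                yield stripped


def prune_files(in_paths, out_path, keep_subjects):
    """Stream several gzipped dumps into one smaller gzipped file."""
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for path in in_paths:
            # Read line by line so the full dump never sits in memory.
            with gzip.open(path, "rt", encoding="utf-8") as src:
                for triple in prune_ntriples(src, keep_subjects):
                    out.write(triple + "\n")
```

Because everything is processed one line at a time, memory use should stay small regardless of the size of the input files; only the pruned output would then need to fit into Protege.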

Thanks,
Eric
https://www.wikidata.org/wiki/User:Emw

1. http://tools.wmflabs.org/wikidata-exports/rdf/exports/20140526/wikidata-taxonomy.nt.gz
2. http://protege.stanford.edu/

On Tue, Jun 10, 2014 at 4:43 AM, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
Hi all,

We are now offering regular RDF dumps for the content of Wikidata:

http://tools.wmflabs.org/wikidata-exports/rdf/

RDF is the Resource Description Framework of the W3C that can be used to exchange data on the Web. The Wikidata RDF exports consist of several files that contain different parts and views of the data, and which can be used independently. Details on the available exports and the RDF encoding used in each can be found in the paper "Introducing Wikidata to the Linked Data Web" [1].

The available RDF exports can be found in the directory http://tools.wmflabs.org/wikidata-exports/rdf/exports/. New exports are generated regularly from current data dumps of Wikidata and will appear in this directory shortly afterwards.

All dump files have been generated using Wikidata Toolkit [2]. There are some important differences in comparison to earlier dumps:

* Data is split into several dump files for convenience. Pick whatever you are most interested in.
* All dumps are generated using the OpenRDF library for Java (better quality than ad hoc serialization; much slower too ;-)
* All dumps are in N-Triples format, the simplest RDF serialization format there is
* In addition to the faithful dumps, some simplified dumps are also available (one statement = one triple; no qualifiers and references).
* Links to external data sets are added to the data for Wikidata properties that point to datasets with RDF exports. That's the "Linked" in "Linked Open Data".

Suggestions for improvements and contributions on github are welcome.

Cheers,

Markus

[1] http://korrekt.org/page/Introducing_Wikidata_to_the_Linked_Data_Web
[2] https://www.mediawiki.org/wiki/Wikidata_Toolkit

--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/