I'm bringing this up as my proof-by-construction answer to a knock-down-drag-out thread earlier where people complained about the difficulty of running queries against DBpedia and Wikidata.
I think some people will find the product described below to be a faster road to where they are heading in the short term. In the longer term, I am thinking that a v4 or v5 Infovore may be able to evaluate the contexts of facts in Wikidata and thus create a world view that can be quality controlled for particular outcomes.
-----
Well, Infovore 3.1 happened quickly after the previous Infovore release because I made a quick attempt to get my Jena up to date and found it was easy to update, so I did. This matters because there is a lot of cool stuff going on with Jena, such as the RDFThrift serialization format and some Hadoop I/O tools written by Rob Vesse, and tracking the latest version helps us connect with that. Release page here:
https://github.com/paulhoule/infovore/releases/tag/v3.1
Infovore 3.1 was used to process the Freebase RDF Dump to create a quality-controlled RDF data set called :BaseKB; generally queries look the same on Freebase and :BaseKB, but :BaseKB gives the right answers, faster, and with less memory consumption. This week's release is in the AWS cloud:
s3://basekb-now/2014-11-09-00-00/
Something very close to this is going to become :BaseKB Gold 2. This is a simpler and better product than the last Gold release from Spring 2014. Here are a few reasons:
* Unicode escape sequences in Freebase are now converted to Unicode characters in RDF
* The rejection rate of triples has dramatically dropped, because of both changes to Infovore and improvements in Freebase content
* The product is now packaged as a set of files partitioned and sorted on subject; this means you can download one file and get a sample of facts about a given topic; there is no longer the "horizontal division"
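Because the files are sorted on subject, all facts about one topic sit next to each other, so a single streaming pass is enough to collect them. Here is a minimal Python sketch of that idea. It assumes one triple per line with the subject as the first whitespace-delimited token (an N-Triples-like layout); the actual file format may differ, and the names `subject_of` and `facts_by_subject` and the example.org URIs are made up for illustration.

```python
from itertools import groupby


def subject_of(line):
    # Assumption: the subject is the first whitespace-delimited token on the line.
    return line.split(None, 1)[0]


def facts_by_subject(lines):
    """Yield (subject, [triple lines]) from input that is already sorted on subject."""
    # Strip trailing newlines and skip blank lines.
    triples = (ln.rstrip("\n") for ln in lines if ln.strip())
    # groupby only merges adjacent items, which is exactly what sorted input provides.
    for subject, group in groupby(triples, key=subject_of):
        yield subject, list(group)


# Hypothetical sample data, not taken from :BaseKB.
sample = [
    '<http://example.org/a> <http://example.org/name> "Alpha" .\n',
    '<http://example.org/a> <http://example.org/type> <http://example.org/Thing> .\n',
    '<http://example.org/b> <http://example.org/name> "Beta" .\n',
]

for subject, facts in facts_by_subject(sample):
    print(subject, len(facts))
```

The same function works on an open file handle in place of `sample`, since it only iterates over lines and never loads the whole file into memory.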
Between duplicate fact filtering and compression, :BaseKB Now is nearly half the size of the Freebase RDF Dump.
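To show why sorted data makes duplicate filtering cheap, here is a small Python sketch that drops repeated facts by comparing each line only with the one before it. This is an illustration of the general technique, not a description of how Infovore itself filters duplicates; the function name `drop_adjacent_duplicates` is hypothetical, and it assumes that identical facts appear as byte-identical lines.

```python
def drop_adjacent_duplicates(lines):
    """Yield each line unless it is identical to the line just before it."""
    previous = None
    for line in lines:
        if line != previous:
            yield line
        previous = line


# Hypothetical facts; sorting first brings identical lines next to each other.
facts = ["a p 1 .", "b p 2 .", "a p 1 .", "a p 1 ."]
unique = list(drop_adjacent_duplicates(sorted(facts)))
print(unique)
```

On unsorted input this removes only consecutive repeats, so the sort step is what makes the filter complete.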
If you're interested please join the mailing list at