I'm bringing this up as my proof-by-construction answer to a
knock-down-drag-out thread earlier where people complained about the
difficulty of running queries against DBpedia and Wikidata.
I think some people will find the product described below to be a faster
road to where they are heading in the short term. In the longer term I am
thinking a v4 or v5 Infovore may be able to evaluate the contexts of facts
in Wikidata and thus create a world view which can be quality controlled
for particular outcomes.
-----
Well, Infovore 3.1 happened quickly after the previous Infovore release
because I made a quick attempt to bring my Jena dependency up to date,
found the update was easy, and went ahead with it. This matters because
there is a lot of cool stuff going on with Jena, such as the RDFThrift
serialization format and some Hadoop I/O tools written by Rob Vesse, and
tracking the latest version helps us connect with that. Release page here:
https://github.com/paulhoule/infovore/releases/tag/v3.1
Infovore 3.1 was used to process the Freebase RDF Dump to create a
quality-controlled RDF data set called :BaseKB; generally queries look
the same on Freebase and :BaseKB, but :BaseKB gives the right answers,
faster, and with less memory consumption. This week's release is in the
AWS cloud:
s3://basekb-now/2014-11-09-00-00/
Something very close to this is going to become :BaseKB Gold 2. This is a
simpler and better product than the last Gold release from Spring 2014.
Here are a few reasons:
* Unicode escape sequences in Freebase are now converted to Unicode
characters in RDF
* The rejection rate of triples has dramatically dropped, because of both
changes to Infovore and improvements in Freebase content
* The product is now packaged as a set of files partitioned and sorted on
subject; this means you can download one file and get a sample of facts
about a given topic; there is no longer the "horizontal division"
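
To make the first point concrete, here is a minimal Python sketch of
turning escape sequences into real characters. It assumes the escapes are
the N-Triples style \uXXXX and \UXXXXXXXX forms; that is my assumption for
illustration, and the function name is hypothetical. It is not Infovore's
actual code.

```python
import re

# Matches an escaped backslash (left alone), a 4-digit \uXXXX escape,
# or an 8-digit \UXXXXXXXX escape. The N-Triples style is an assumption.
_ESCAPE = re.compile(r"\\\\|\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_unicode_escapes(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name."""

    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is None:
            # An escaped backslash: keep it exactly as written.
            return match.group(0)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(replace, text)


if __name__ == "__main__":
    print(decode_unicode_escapes('"Caf\\u00e9"@en'))
```

Matching the escaped backslash first keeps a literal backslash followed by
"u00e9" from being decoded by mistake.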
Between duplicate fact filtering and compression, :BaseKB Now is nearly
half the size of the Freebase RDF Dump.
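
One reason subject-sorted files are convenient: duplicate facts can be
dropped in a single streaming pass that only remembers the facts for the
current subject. Below is a rough Python sketch of that idea. It assumes
one fact per non-blank line with the subject as the first
whitespace-separated token; the function names are hypothetical, and this
is not necessarily how Infovore does its filtering.

```python
from itertools import groupby
from typing import Iterable, Iterator


def subject_of(line: str) -> str:
    """Return the first whitespace-separated token, assumed to be the subject."""
    return line.split(None, 1)[0]


def drop_duplicate_facts(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct fact once, assuming input is sorted on subject.

    Because all lines for a subject are adjacent, the set of facts already
    seen can be thrown away whenever the subject changes.
    """
    for _, group in groupby(lines, key=subject_of):
        seen = set()
        for line in group:
            if line not in seen:
                seen.add(line)
                yield line


if __name__ == "__main__":
    # Made-up sample facts, for illustration only.
    sample = [
        "<s1> <p1> <o1> .",
        "<s1> <p2> <o2> .",
        "<s1> <p1> <o1> .",
        "<s2> <p1> <o1> .",
    ]
    for fact in drop_duplicate_facts(sample):
        print(fact)
```

Memory use is bounded by the largest single subject rather than by the
whole file.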
If you're interested please join the mailing list at
https://groups.google.com/forum/#!forum/infovore-basekb