Well, Infovore 3.1 followed quickly after Infovore because I made a quick attempt to bring my Jena dependency up to date, found the update was easy, and went ahead with it. This matters because there is a lot of cool stuff going on with Jena, such as the RDFThrift serialization format and some Hadoop I/O tools written by Rob Vesse, and tracking the latest version helps us connect with that. Release page here:
Infovore 3.1 was used to process the Freebase RDF Dump into a quality-controlled RDF data set called :BaseKB. Queries generally look the same on Freebase and :BaseKB, but :BaseKB gives the right answers, faster, and with less memory consumption. This week's release is in the AWS cloud:
s3://basekb-now/2014-11-09-00-00/
Something very close to this is going to become :BaseKB Gold 2. This is a simpler and better product than the last Gold release from Spring 2014. Here are a few reasons:
* Unicode escape sequences in Freebase are now converted to Unicode characters in RDF
* The rejection rate of triples has dropped dramatically, thanks both to changes in Infovore and to improvements in Freebase content
* The product is now packaged as a set of files partitioned and sorted on subject; this means you can download one file and get a sample of facts about a given topic; there is no longer the "horizontal division"
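To make the first point concrete, here is a minimal Python sketch of the kind of conversion involved. It is not Infovore's actual code. It assumes the escapes are N-Triples-style `\uXXXX` and `\UXXXXXXXX` sequences, and the function name `decode_unicode_escapes` is made up for illustration. An escaped backslash (`\\`) is left alone so that a literal `\\u0041` is not decoded by mistake; surrogate pairs written as two `\u` escapes are not recombined.

```python
import re

# Assumption: escapes look like \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
# The first alternative matches an escaped backslash so it can be skipped untouched.
_ESCAPE = re.compile(r"\\(?:\\|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))")


def decode_unicode_escapes(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name."""

    def replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is None:
            # This was an escaped backslash; keep it exactly as written.
            return match.group(0)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(replace, text)
```

For example, `decode_unicode_escapes(r'"Caf\u00e9"')` turns the six-character escape into a single `é`.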
Between duplicate fact filtering and compression, :BaseKB Now is nearly half the size of the Freebase RDF Dump.
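As a rough illustration of why subject-sorted data makes duplicate filtering cheap, here is a small Python sketch. It is not how Infovore does it; it assumes one triple per line with the subject as the first whitespace-delimited token, and the names `subject_of` and `drop_duplicate_facts` are hypothetical. Because all facts about a subject are adjacent, only one subject's facts need to be held in memory at a time.

```python
from itertools import groupby
from typing import Iterable, Iterator


def subject_of(line: str) -> str:
    """Return the first whitespace-delimited token, assumed to be the subject."""
    return line.split(None, 1)[0]


def drop_duplicate_facts(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct triple once, assuming input is grouped by subject."""
    # Strip line endings and skip blank lines so subject_of never sees an empty string.
    cleaned = (line.strip() for line in lines)
    non_blank = (line for line in cleaned if line)
    for _, facts in groupby(non_blank, key=subject_of):
        seen = set()  # reset per subject, so memory stays bounded
        for fact in facts:
            if fact not in seen:
                seen.add(fact)
                yield fact
```

The same generator can be fed from `gzip.open(path, "rt", encoding="utf-8")` to read a compressed file line by line.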