Hi!
Good idea: I’m trying to load the JSON into an Apache Jena TDB triplestore for later processing and querying. I have a couple of JSON dumps locally. I chose the JSON format because of the size difference (the BZ2 JSON is 8 GB and the BZ2 TTL beta is 12 GB). What would be the best way to do this? I’d rather use the Java Wikidata Toolkit than the PHP stack if possible.
I do not think the JSON export is particularly well suited for import into a triple store. It implements the original Wikibase data model, which needs significant re-mapping to fit the triple store model (e.g. see https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Data_model).
You can use one of the following approaches:
1. Use the existing RDF dump (yes, it is bigger, because the triple store representation is more verbose by nature).
2. Try to convert manually, e.g. with the Java Wikidata Toolkit, using something like https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/exampl...
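To give a feel for the re-mapping involved in a manual conversion, here is a minimal sketch in plain Java (no Wikidata Toolkit or Jena calls) of how one item-valued statement with qualifiers could expand into several N-Triples lines. The namespace URIs follow the RDF dump format; the class name, the method names, the statement ID and the entity IDs in `main` are illustrative assumptions, and the sketch ignores ranks, references and non-item values, so treat it as an illustration rather than the actual exporter logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StatementToTriples {
    // Namespaces used by the Wikidata RDF dump format.
    static final String WD  = "http://www.wikidata.org/entity/";
    static final String WDS = "http://www.wikidata.org/entity/statement/";
    static final String WDT = "http://www.wikidata.org/prop/direct/";
    static final String P   = "http://www.wikidata.org/prop/";
    static final String PS  = "http://www.wikidata.org/prop/statement/";
    static final String PQ  = "http://www.wikidata.org/prop/qualifier/";

    // Formats one N-Triples line whose object is an IRI.
    static String triple(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .";
    }

    // Expands a single item-valued statement into triples.
    // Simplification (assumption): every statement gets a direct "truthy"
    // triple; the real dump only does that for best-rank statements.
    static List<String> toTriples(String subject, String property, String value,
                                  String statementId, Map<String, String> qualifiers) {
        List<String> out = new ArrayList<>();
        String node = WDS + statementId;
        out.add(triple(WD + subject, WDT + property, WD + value)); // direct claim
        out.add(triple(WD + subject, P + property, node));         // item -> statement node
        out.add(triple(node, PS + property, WD + value));          // statement node -> value
        for (Map.Entry<String, String> q : qualifiers.entrySet()) {
            out.add(triple(node, PQ + q.getKey(), WD + q.getValue())); // one per qualifier
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative IDs only; the statement ID is a made-up placeholder.
        Map<String, String> qualifiers = new LinkedHashMap<>();
        qualifiers.put("P512", "Q1765120");
        for (String line : toTriples("Q42", "P69", "Q691283", "Q42-example-id", qualifiers)) {
            System.out.println(line);
        }
    }
}
```

One JSON statement with a single qualifier already becomes four triples here, which is part of why the RDF dump is larger than the JSON one.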
Note that the RDF dump contains a little more data than the JSON one. IIRC, page properties (https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Page_proper...) are not in the JSON dump, and there might be other small differences.