Hi!
> Good idea: I’m trying to load the JSON into an Apache Jena TDB triplestore
> for later processing and querying. I have a couple of JSON dumps locally.
> I chose the JSON format because of the size difference (the BZ2 JSON is 8GB
> and the BZ2 TTL beta is 12GB). What would be the best way to do this? I’d
> rather use the Java Wikidata Toolkit than the PHP stack if possible.
I do not think the JSON export is particularly well suited for import into
a triple store. It implements the original data model, which needs
significant re-mapping to fit the triple store model (see e.g.
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Data_model
).
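To give an idea of the re-mapping involved, here is a minimal sketch (Java standard library only) that turns a single item-valued statement from the JSON model into N-Triples. The class and method names are hypothetical. It assumes the prefixes and statement-ID convention described on the RDF Dump Format page (wd:, wds:, p:, ps:, wdt:, with the `$` in a JSON statement ID replaced by `-`). Qualifiers, references, ranks and non-item values are deliberately ignored; in the real dump the wdt: "truthy" triple is only emitted for best-rank statements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: simplified mapping of ONE item-valued statement
// from the JSON data model to RDF. Ignores qualifiers, references, ranks
// and non-item values; assumes the RDF Dump Format prefixes.
final class StatementToRdf {
    static final String WD = "http://www.wikidata.org/entity/";
    static final String WDS = "http://www.wikidata.org/entity/statement/";
    static final String P = "http://www.wikidata.org/prop/";
    static final String PS = "http://www.wikidata.org/prop/statement/";
    static final String WDT = "http://www.wikidata.org/prop/direct/";

    private static String iri(String s) {
        return "<" + s + ">";
    }

    private static String triple(String s, String p, String o) {
        return iri(s) + " " + iri(p) + " " + iri(o) + " .";
    }

    // subjectId e.g. "Q42", propertyId e.g. "P31", valueId e.g. "Q5",
    // statementId as found in the JSON, e.g. "Q42$ABC-123" (hypothetical).
    static List<String> toNTriples(String subjectId, String propertyId,
                                   String valueId, String statementId) {
        // Assumed convention: '$' in the JSON statement ID becomes '-'
        String stmt = WDS + statementId.replace('$', '-');
        List<String> out = new ArrayList<>();
        // Entity -> statement node (where qualifiers/references would hang)
        out.add(triple(WD + subjectId, P + propertyId, stmt));
        // Statement node -> main value
        out.add(triple(stmt, PS + propertyId, WD + valueId));
        // "Truthy" shortcut (real dumps: best-rank statements only)
        out.add(triple(WD + subjectId, WDT + propertyId, WD + valueId));
        return out;
    }

    public static void main(String[] args) {
        toNTriples("Q42", "P31", "Q5", "Q42$ABC-123")
            .forEach(System.out::println);
    }
}
```

One JSON statement already becomes three triples here, before any qualifiers or references are added, which is also part of why the RDF dump is larger.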
You can use one of the following approaches:
1. Use the existing RDF dump (yes, it is bigger, because the triple store
representation is more verbose by nature).
2. Convert the JSON manually, e.g. with the Java Wikidata Toolkit, using
something like
https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/examp…
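If you take the manual route, the first practical step is reading a multi-gigabyte dump without holding it in memory. A minimal sketch, assuming the JSON dump layout of one entity per line inside a top-level array (a `[` line, entity lines ending in `,`, then `]`): it hands each entity's JSON text to a callback, where it could be parsed and mapped to triples. The class name is hypothetical. For the BZ2 file you would wrap the file stream in a bzip2 decompressor (for example Apache Commons Compress's `BZip2CompressorInputStream`) and an `InputStreamReader` before passing it in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper: stream entities out of a JSON dump line by line.
// Assumes one entity per line inside a top-level JSON array.
final class JsonDumpLines {
    // Calls handler once per entity line; returns the number of entities.
    // The caller is responsible for closing the Reader.
    static long forEachEntity(Reader in, Consumer<String> handler) {
        BufferedReader reader = new BufferedReader(in);
        long count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String s = line.trim();
                // Skip the array brackets and any blank lines
                if (s.isEmpty() || s.equals("[") || s.equals("]")) {
                    continue;
                }
                // Drop the comma that separates array elements
                if (s.endsWith(",")) {
                    s = s.substring(0, s.length() - 1);
                }
                handler.accept(s);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "[\n{\"id\":\"Q1\"},\n{\"id\":\"Q2\"}\n]\n";
        long n = forEachEntity(new StringReader(sample), System.out::println);
        System.out.println(n + " entities");
    }
}
```

Processing line by line keeps memory use flat regardless of dump size; the per-entity callback is where a JSON parser, or the Wikidata Toolkit's own data model, would take over.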
Note that the RDF dump contains a little more data than the JSON one. IIRC,
page properties
(
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Page_prope…)
are not in the JSON dump, and there might be other small differences.
--
Stas Malyshev
smalyshev(a)wikimedia.org