Hello,
Can someone point me to the code that is used to generate the "beta" .ttl files from the .json dumps, if any?
Also, is there a roadmap or wishlist for them to be considered stable and no longer "beta"?
Thanks,
Vincent
What are you referring to when you say "beta"?
The code you are most probably looking for is in https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/includes...
Best Thiemo
I’m referring to the BETA in all caps in the .ttl dump files:
https://dumps.wikimedia.org/wikidatawiki/entities/20170626/
Can these .ttl files be generated from the JSON files in the same directory using the PHP scripts you mentioned?
Regards,
Vincent
On 3 Jul 2017, at 10:39, Thiemo Mättig thiemo.maettig@wikimedia.de wrote:
What are you referring to when you say "beta"?
The code you are most probably looking for is in https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/includes...
Best Thiemo
The RDF dumps are marked as BETA because we are still in a phase where we need to apply breaking changes in quick iterations. We would no longer be able to do that once they stop being beta.
Even if it is technically possible to import a .json dump and turn it into an RDF triples dump, this is not what we do. We use the dumpRdf.php maintenance script to create the RDF dump, which in turn uses the code I pointed you to.
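For reference, a typical invocation looks roughly like this (the exact options depend on the Wikibase version, and the output redirect is just for illustration; check the script's --help before relying on it):

    php repo/maintenance/dumpRdf.php --format ttl > wikidata.ttl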
Maybe instead of asking whether a specific solution exists, you could start by explaining your problem: what you have and what you would like to achieve. I believe this gives people more possibilities to answer and help you.
Best Thiemo
On 3 Jul 2017, at 11:17, Thiemo Mättig thiemo.maettig@wikimedia.de wrote:
Maybe instead of asking whether a specific solution exists, you could start by explaining your problem: what you have and what you would like to achieve. I believe this gives people more possibilities to answer and help you.
Good idea: I’m trying to load the JSON into an Apache Jena TDB triplestore for later processing and querying. I have a couple of JSON dumps locally. I chose the JSON format because of the size difference (the BZ2 JSON is 8 GB, while the BZ2 TTL beta is 12 GB). What would be the best way to do this? I’d rather use the Java Wikidata Toolkit than the PHP stack, if possible.
Regards,
Vincent
Hi!
Good idea: I’m trying to load the JSON into an Apache Jena TDB triplestore for later processing and querying. I have a couple of JSON dumps locally. I chose the JSON format because of the size difference (the BZ2 JSON is 8 GB, while the BZ2 TTL beta is 12 GB). What would be the best way to do this? I’d rather use the Java Wikidata Toolkit than the PHP stack, if possible.
I do not think the JSON export is particularly well suited for import into a triple store. It implements the original data model, which needs significant re-mapping to fit the triple store model (e.g. see https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Data_model).
You can use one of the following approaches:
1. Use the existing RDF dump (yes, it is bigger, because the triple store representation is more verbose by nature); see the loading sketch at the end of this mail.
2. Try to convert manually - e.g. with the Java Wikidata Toolkit, using something like https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/exampl... - see the sketch right after this list.
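To give you an idea, that example boils down to roughly the following. This is an untested sketch: I am writing the wdtk-rdf class names and method signatures from memory, so treat them as assumptions and double-check against the linked example and your Wikidata Toolkit version.

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.zip.GZIPOutputStream;

    import org.openrdf.rio.RDFFormat;
    import org.wikidata.wdtk.datamodel.interfaces.Sites;
    import org.wikidata.wdtk.dumpfiles.DumpProcessingController;
    import org.wikidata.wdtk.dumpfiles.MwLocalDumpFile;
    import org.wikidata.wdtk.rdf.PropertyRegister;
    import org.wikidata.wdtk.rdf.RdfSerializer;

    public class JsonDumpToRdf {
        public static void main(String[] args) throws IOException {
            // Controller that drives the processing of Wikidata dump files
            DumpProcessingController controller = new DumpProcessingController("wikidatawiki");
            // Site links information, needed to emit sitelink triples
            Sites sites = controller.getSitesInformation();

            // Write gzipped N-Triples; N-Triples is the easiest format to
            // post-process and to bulk-load into Jena TDB
            OutputStream out = new GZIPOutputStream(new BufferedOutputStream(
                    new FileOutputStream("wikidata.nt.gz")));

            RdfSerializer serializer = new RdfSerializer(RDFFormat.NTRIPLES, out,
                    sites, PropertyRegister.getWikidataPropertyRegister());
            // Select what to serialize; there are further TASK_* flags for
            // terms, sitelinks, properties, etc.
            serializer.setTasks(RdfSerializer.TASK_ITEMS | RdfSerializer.TASK_STATEMENTS);

            serializer.open();
            // Feed one of your local JSON dumps through the serializer
            // (the file name here is just an example)
            controller.registerEntityDocumentProcessor(serializer, null, true);
            controller.processDump(new MwLocalDumpFile("wikidata-20170626-all.json.bz2"));
            serializer.close();
        }
    }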
Note that the RDF dump contains a little more data than the JSON one - IIRC, page properties (https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Page_proper...) are not in the JSON, and there might be other small differences.
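If you go with the existing RDF dump instead (approach 1), loading the decompressed .ttl into TDB via the plain Jena API looks roughly like the sketch below. Again untested and only illustrative; for a file of this size the tdbloader command line tool will be much faster than loading through the API.

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.tdb.TDBFactory;

    public class LoadTurtleIntoTdb {
        public static void main(String[] args) {
            // Create or connect to a TDB database in the given directory
            Dataset dataset = TDBFactory.createDataset("/data/wikidata-tdb");
            dataset.begin(ReadWrite.WRITE);
            try {
                // Stream the decompressed Turtle dump into the default graph
                // (the file name is an example; point it at your local copy)
                dataset.getDefaultModel().read("wikidata-20170626-all-BETA.ttl", "TTL");
                dataset.commit();
            } finally {
                dataset.end();
            }
        }
    }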