Hi all,
I wonder if the Wikidata RDF dump (in https://dumps.wikimedia.org/wikidatawiki/entities/) is generated directly from the JSON dump. If this is the case, what tool is used?
Best Regards, Daniel
On Sat, Feb 17, 2024 at 4:53 PM daniel@degu.cl wrote:
> Hi all,
> I wonder if the Wikidata RDF dump (in https://dumps.wikimedia.org/wikidatawiki/entities/) is generated directly from the JSON dump. If this is the case, what tool is used?
> Best Regards, Daniel
Hi Daniel,
No, they are created independently. I'd be interested in hearing more about what you are trying to do.
Cheers Lydia
Hi Lydia,
> No, they are created independently.
Yes, somebody else also told me that these files are created by Wikibase from the database. I also posted this question here (sorry for the cross-posting):
https://mstdn.degu.cl/@daniel/111957447287243556
> I'd be interested in hearing more about what you are trying to do.
I am trying to understand the Wikibase data model, the serialization models, and the mappings between them. I have worked with the Wikidata dumps for research, using both the JSON and the RDF dumps, and I always end up with questions.
There is a data model described in [1], whose description has some issues. For example, the Statement class in the UML model has five attributes: three specified in the box (subject, mainSnak, and rank) and two specified with associations (referenceRecords and auxiliarySnaks). The description says that the individual components of a Statement are subject, mainSnak, rank, referenceRecords, and qualifierSnaks. It may be a typo in the UML diagram; it should say qualifierSnaks instead of auxiliarySnaks. Then, the term Rank appears in the grammar definition of class Statement. I guess it should say StatementRank, because there is no Rank in the UML model above. I don't know if there is a file with this data model specification that can be processed automatically. In my opinion, there should be one.
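To make the point about a machine-processable specification concrete, here is a minimal sketch in Python of the Statement class with the five components named in the description. It assumes qualifierSnaks (not auxiliarySnaks) and StatementRank (not Rank) are the intended names. The Snak and ReferenceRecord classes are simplified placeholders, and the snake_case attribute names and the component_names helper are hypothetical; none of this is taken from [1] beyond the component list.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import List


class StatementRank(Enum):
    # Rank values as they appear in the Wikidata JSON serialization.
    PREFERRED = "preferred"
    NORMAL = "normal"
    DEPRECATED = "deprecated"


@dataclass(frozen=True)
class Snak:
    # Placeholder: a property ID plus an opaque value (None for "no value").
    property_id: str
    value: object = None


@dataclass(frozen=True)
class ReferenceRecord:
    # Placeholder: a reference is modelled here as a tuple of snaks.
    snaks: tuple = ()


@dataclass
class Statement:
    # The five components listed in the textual description of Statement.
    subject: str
    main_snak: Snak
    rank: StatementRank = StatementRank.NORMAL
    reference_records: List[ReferenceRecord] = field(default_factory=list)
    qualifier_snaks: List[Snak] = field(default_factory=list)


def component_names(cls):
    # Introspect the class, so that the attribute list can be compared
    # automatically against a diagram or a grammar.
    return [f.name for f in fields(cls)]
```

With a definition like this, a mismatch such as auxiliarySnaks versus qualifierSnaks could be caught by comparing component_names(Statement) against the names used in the diagram.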
The description of the Wikibase data model also says that the Wikibase model, let me call it M-abstract, is implemented with a more efficient database model, let me call it M-db. To my knowledge, the translation from the Wikibase model M-abstract to the serialization models, let me call them M-json and M-rdf, is not formalized with declarative mappings from M-abstract to M-json or M-rdf, but implemented with PHP scripts that separately define maps from M-db to M-json and M-rdf. To me, this is an issue.
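As a rough illustration of what a more declarative mapping could look like, the sketch below keeps the datatype-specific rules in a table and applies them with one small generic function that turns the main snak of a JSON statement into a "direct" (wdt:) triple in N-Triples syntax. It assumes the current JSON layout (mainsnak with snaktype, property, datatype, and datavalue, and an "id" key inside entity values). VALUE_RULES and statement_to_triple are hypothetical names, only three datatypes are covered, and ranks are ignored, so this is not how Wikibase itself produces the RDF dump.

```python
import json

WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"

# Hypothetical rule table: snak datatype -> function rendering the value
# as an N-Triples term. Only three datatypes are covered in this sketch.
VALUE_RULES = {
    "wikibase-item": lambda v: "<" + WD + v["id"] + ">",
    # json.dumps is used as an approximation of N-Triples string escaping.
    "string": lambda v: json.dumps(v, ensure_ascii=False),
    "external-id": lambda v: json.dumps(v, ensure_ascii=False),
}


def statement_to_triple(subject_id, statement):
    """Return one wdt: triple for the main snak, or None if not covered."""
    snak = statement["mainsnak"]
    if snak["snaktype"] != "value":
        return None  # "somevalue" and "novalue" snaks are skipped here
    rule = VALUE_RULES.get(snak.get("datatype"))
    if rule is None:
        return None  # datatype not handled by this sketch
    obj = rule(snak["datavalue"]["value"])
    # Note: the real dump emits wdt: triples only for best-rank statements;
    # this sketch does not look at the rank at all.
    return "<{}{}> <{}{}> {} .".format(WD, subject_id, WDT, snak["property"], obj)
```

Keeping the rules in a table means the mapping can be listed, compared, and documented separately from the code that applies it, which is the property the paragraph above is asking for.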
[1] https://www.mediawiki.org/wiki/Wikibase/DataModel
Best, Daniel
Hi all,
The Wikidata-Toolkit (WDTK) Java library has been able to convert the JSON dump to RDF for a while, but I think there have been some changes in the RDF format that are not reflected in Wikidata-Toolkit yet.
https://github.com/Wikidata/Wikidata-Toolkit/
There is an example Java application taking a JSON dump and translating it to RDF here: https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/exampl...
I don't have a good overview of what has changed in the RDF serialization since then, but it might not be much work to update WDTK accordingly. At least on the JSON side of things, it parses the dump well.
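For a quick look at the JSON side without WDTK, here is a minimal Python sketch for streaming entities out of the dump. It assumes the dump is one large JSON array with each entity on its own line, followed by a comma; iter_entities is a hypothetical helper name, and the file name in the usage note is an assumption.

```python
import json


def iter_entities(lines):
    """Yield one entity dict per line of a line-oriented JSON array dump."""
    for line in lines:
        # Drop surrounding whitespace and the comma that separates entities.
        line = line.strip().rstrip(",")
        # Skip blank lines and the brackets that open and close the array.
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)
```

It could be used as `with bz2.open("latest-all.json.bz2", "rt", encoding="utf-8") as f: for entity in iter_entities(f): ...`, which avoids loading the whole array into memory.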
Best, Antonin
wikidata-tech@lists.wikimedia.org