Dear Wikidata Community,


We are trying to understand which database technologies and strategies Wikidata uses for storing, updating, and querying the data (knowledge) it manages.


By looking at the documentation, we understood that RDF is used only for the Wikidata Query Service, but we could not find out exactly how Wikidata stores the information that is translated to RDF during the data dumps.
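
For context, the closest we got was retrieving the RDF serialization of a single entity on demand. Below is a minimal sketch of what we tried (in Python, using Q42 purely as an example entity, and assuming the Special:EntityData endpoint behaves as its documentation describes); the output appears to match the serialization we see in the RDF dumps:

    import requests

    # Special:EntityData serves a per-entity export in several formats;
    # requesting .ttl returns a Turtle serialization of the entity.
    url = "https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl"
    response = requests.get(url, headers={"User-Agent": "architecture-inquiry/0.1"})  # placeholder User-Agent
    response.raise_for_status()
    print(response.text[:500])  # print the first few triples for inspection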


More specifically, we understood that a MySQL (or is it MariaDB?) relational database serves as the primary persistence component for most Wikidata services, and that the information maintained in this database is periodically exported to multiple formats, including RDF.
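
One observation that shaped this reading (again in Python, with Q42 purely as an example): the full Wikibase Data Model serialization of an entity is retrievable as a single JSON document, which made us suspect that entities are persisted as serialized blobs rather than as normalized relational rows:

    import requests

    # wbgetentities returns the complete serialization of an entity
    # (labels, descriptions, claims/statements, sitelinks) in one JSON document.
    url = "https://www.wikidata.org/w/api.php"
    params = {"action": "wbgetentities", "ids": "Q42", "format": "json"}
    reply = requests.get(url, params=params,
                         headers={"User-Agent": "architecture-inquiry/0.1"}).json()
    claims = reply["entities"]["Q42"]["claims"]
    print(len(claims), "properties with statements on Q42")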


In addition, when examining the relational database schema published in the documentation, we could not locate tables that map straightforwardly to the Wikibase Data Model.

Thus, we hypothesize that there is some software component (Wikibase Common Data Access?) that dynamically translates the data contained in those tables into Statements, Entities, etc. Is that hypothesis correct?
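
To make the hypothesis concrete, here is a purely illustrative sketch (our own guess, not actual Wikibase code; the class and function names are ours) of the kind of translation we imagine such a component performing, from the stored JSON serialization to Data Model objects:

    from dataclasses import dataclass, field

    @dataclass
    class Statement:
        property_id: str       # e.g. "P31"
        value: object          # simplified; real snaks carry typed datavalues
        rank: str = "normal"

    @dataclass
    class Entity:
        entity_id: str
        labels: dict = field(default_factory=dict)
        statements: list = field(default_factory=list)

    def entity_from_json(doc: dict) -> Entity:
        # Hypothetical mapping from the stored JSON document to Data Model objects.
        entity = Entity(
            entity_id=doc["id"],
            labels={lang: v["value"] for lang, v in doc.get("labels", {}).items()},
        )
        for prop, statements in doc.get("claims", {}).items():
            for s in statements:
                snak = s["mainsnak"]
                value = snak.get("datavalue", {}).get("value")  # None for novalue/somevalue snaks
                entity.statements.append(Statement(prop, value, s.get("rank", "normal")))
        return entity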

If so, does this software component use any intermediate storage mechanism for caching those Statements, Entities, etc.? Or are those translations always performed on the fly at runtime (be it for querying, adding, or updating Statements, Entities, etc.)?


Finally, we would like to understand more about how the Wikidata REST API is implemented.
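
For example, we have been exercising it roughly like this (Q42 once more as an example item; we are assuming the v1 base path shown in the Wikibase REST API documentation), and we would like to understand what happens behind such a request:

    import requests

    # Wikibase REST API: fetch a single item as Data Model JSON.
    url = "https://www.wikidata.org/w/rest.php/wikibase/v1/entities/items/Q42"
    item = requests.get(url, headers={"User-Agent": "architecture-inquiry/0.1"}).json()
    print(item["labels"]["en"], "-", len(item["statements"]), "properties with statements")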


If there is up-to-date documentation that could help us answer these questions, could you kindly point us to it? Otherwise, would you be able to share this information with us?


Best Regards,

Elton F. de S. Soares

Advisory Software Engineer

Rio de Janeiro, RJ, Brazil

IBM Research

E-mail: eltons@ibm.com