I am very happy to read that the Wikidata team is working on the long-anticipated problems with WDQS [5], toward making Wikidata scalable.
On my side, the biggest technical problem that I know about is the on-disk footprint.
Last year, I managed to achieve a 1:1 ratio between the .nt textual format and the stored data, while keeping the added advantage of time-traveling queries.
Since then, I have figured out further optimizations that should bring the on-disk footprint down to something similar to the current Blazegraph production setup.
That is around 1/3 to 1/2 of the .nt dump size, i.e. something around 2TB or 3TB of SSD storage for the current Wikidata, while keeping the added advantage of scaling both queries and storage horizontally, possibly without limit!
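To make the arithmetic behind those figures explicit, here is a minimal back-of-the-envelope sketch. The 6TB dump size is my own assumption, inferred only from the ratios and the 2-3TB estimates above; it is not a figure from this mail.

```python
# Back-of-the-envelope on-disk footprint estimate.
# Assumption (not stated in the mail): the uncompressed .nt dump
# is roughly 6 TB, which is consistent with the 1/3..1/2 ratios
# and the 2-3 TB figures quoted above.
DUMP_TB = 6

low = DUMP_TB * (1 / 3)   # optimistic ratio
high = DUMP_TB * (1 / 2)  # conservative ratio

print(f"estimated on-disk footprint: {low:.1f} to {high:.1f} TB")
```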
Best regards,
[0] https://www.youtube.com/watch?v=oV4qelj9fxM via https://etherpad.wikimedia.org/p/WikidataCon2021-ScalingWDQS
[1] https://www.wikidata.org/wiki/Wikidata:Query_Service_scaling_update_Aug_2021
[2] https://phabricator.wikimedia.org/T291207
[3] https://phabricator.wikimedia.org/T206560
[4] https://phabricator.wikimedia.org/T291340
[5] https://meta.wikimedia.org/wiki/Grants:Project/Future-proof_WDQS
Amirouche Amazigh BOUBEKKI ~ https://hyper.dev