The Wikidata Query Service currently holds some 3.8 billion triples – you can see the numbers on Grafana [1]. But WDQS “munges” the dump before importing it – for instance, it merges wdata:… into wd:… and drops `a wikibase:Item` and `a wikibase:Statement` types; see [2] for details – so the triple count in the un-munged dump will be somewhat larger than the triple count in WDQS.
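If you want to get a feel for that difference yourself, here is a rough Python sketch that streams an N-Triples dump from stdin and counts how many triples would be affected by just the two munge steps mentioned above. The real munger does quite a bit more (see [2]), and the prefix IRIs below are my reading of the standard dump format, so take it as an approximation rather than as the actual WDQS loader:

#!/usr/bin/env python3
# Rough, partial approximation of the WDQS munge described above.
# Assumptions: the dump is uncompressed N-Triples on stdin, and the
# wikibase:/wdata: IRIs match the standard Wikibase RDF dump format [2].
import sys

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
DROPPED_TYPES = {
    "<http://wikiba.se/ontology#Item>",       # `a wikibase:Item`
    "<http://wikiba.se/ontology#Statement>",  # `a wikibase:Statement`
}
# wdata: (Special:EntityData) subjects get merged into the wd: entity on
# import; here they are only counted, not rewritten.
WDATA_PREFIX = "<http://www.wikidata.org/wiki/Special:EntityData/"

total = dropped = wdata_subjects = 0
for line in sys.stdin:
    parts = line.split(None, 2)
    if len(parts) < 3 or parts[0].startswith("#"):
        continue                      # skip blank lines and comments
    total += 1
    subj, pred, rest = parts
    obj = rest.rstrip().rstrip(".").rstrip()
    if pred == RDF_TYPE and obj in DROPPED_TYPES:
        dropped += 1                  # type triples WDQS drops on import
    elif subj.startswith(WDATA_PREFIX):
        wdata_subjects += 1           # would be merged into wd:…
print(f"{total} triples in the dump")
print(f"{dropped} dropped `a wikibase:Item`/`a wikibase:Statement` triples")
print(f"{wdata_subjects} triples with a wdata:… subject")

Running something like `bzcat latest-all.nt.bz2 | python3 munge_count.py` (or just a sample of the dump) should give a ballpark for how far apart the dump and WDQS counts are.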
Cheers, Lucas
[1]: https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?panelId=7&...
[2]: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#WDQS_data_d...
On 07.11.2017 17:09, Laura Morales wrote:
How many triples does Wikidata have? The old dump from rdfhdt seems to have about 2 billion, which would mean Wikidata doubled its number of triples in less than a year?
Sent: Tuesday, November 07, 2017 at 3:24 PM
From: "Jérémie Roquet" <jroquet@arkanosis.net>
To: "Discussion list for the Wikidata project." <wikidata@lists.wikimedia.org>
Subject: Re: [Wikidata] Wikidata HDT dump

Hi everyone,
I'm afraid the current implementation of HDT is not ready to handle more than 4 billion triples, as it is limited to 32-bit indexes. I've opened an issue upstream: https://github.com/rdfhdt/hdt-cpp/issues/135
Until this is addressed, don't waste your time trying to convert all of Wikidata to HDT: it can't work.
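To put a number on that limit: a 32-bit index can distinguish at most 2^32 ≈ 4.29 billion entries, so the ~3.8 billion WDQS triples mentioned at the top of this thread still fit, but the un-munged dump is larger and Wikidata keeps growing. Purely as arithmetic:

# Back-of-the-envelope check of the 32-bit ceiling mentioned above.
max_32bit_entries = 2 ** 32              # 4_294_967_296 distinct index values
wdqs_triples = 3_800_000_000             # the WDQS count quoted earlier in the thread
print(max_32bit_entries)                 # 4294967296
print(max_32bit_entries - wdqs_triples)  # ~495 million triples of headroom left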
-- Jérémie
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata