If you're willing to settle for all Wikidata items with at least two sitelinks (roughly 11.5 million items), you can get them with five simple WDQS queries (these return only the QIDs though -- no labels):

SELECT?i{VALUES?s{2}?i wikibase:sitelinks?s}

SELECT?i{VALUES?s{3}?i wikibase:sitelinks?s}

(The sitelink counts are implicit for the above two queries and are omitted from the results to help avoid a timeout or error message.) 

SELECT*{VALUES?s{4 7}?i wikibase:sitelinks?s}

SELECT*{VALUES?s{5 6}?i wikibase:sitelinks?s}

SELECT*{VALUES?s{8 9 10 [...] 398 399 400}?i wikibase:sitelinks?s}

(There are a few dozen Wikimedia page-type items that have more than 400 sitelinks; these can be found here: https://www.wikidata.org/wiki/Wikidata:Database_reports/Most_sitelinked_items.)

Each of these queries ran successfully for me in about 20-30 seconds, and I was able to download the full results as both TSV and JSON files without any problems. I had no luck with my attempts to query for the 18.4 million items with only one sitelink, even when using LIMIT and OFFSET.
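
In case it is useful, here is a rough Python sketch of how the five queries above could be scripted rather than pasted into the WDQS web form one at a time. It is not what I actually ran. The function names, the User-Agent string, and the output file names are made up for illustration, and the last batch assumes the elided VALUES list above is every integer from 8 to 400. It sends the query by POST to the public endpoint at https://query.wikidata.org/sparql and asks for TSV through the Accept header.

```python
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# One entry per query above: (sitelink counts, include the count column?).
# Assumption: the elided list in the fifth query is every integer 8..400.
BATCHES = [
    ([2], False),
    ([3], False),
    ([4, 7], True),
    ([5, 6], True),
    (list(range(8, 401)), True),
]


def build_sitelink_query(counts, include_count=True):
    """Build a query for items whose sitelink count is one of `counts`."""
    values = " ".join(str(int(c)) for c in counts)
    projection = "*" if include_count else "?i"
    return (
        f"SELECT {projection} "
        f"{{ VALUES ?s {{ {values} }} ?i wikibase:sitelinks ?s }}"
    )


def fetch_tsv(query, user_agent="sitelink-count-sketch/0.1 (hypothetical)"):
    """POST a query to WDQS and return the response body as TSV text."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        WDQS_ENDPOINT,
        data=data,
        headers={
            "Accept": "text/tab-separated-values",
            "User-Agent": user_agent,
        },
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read().decode("utf-8")


def download_all(prefix="sitelinks_batch"):
    """Run every batch in turn and write each result to its own TSV file."""
    for number, (counts, include_count) in enumerate(BATCHES, start=1):
        tsv = fetch_tsv(build_sitelink_query(counts, include_count))
        with open(f"{prefix}_{number}.tsv", "w", encoding="utf-8") as handle:
            handle.write(tsv)
```

Calling download_all() would write five TSV files. For the first two batches the count is not in the output, so it has to be taken from which batch the file came from (2 or 3). There is no retry or timeout handling here, so a query that fails would need to be rerun by hand.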

Hope that helps,


On Tue, Mar 22, 2022 at 5:25 PM <finin@umbc.edu> wrote:
Is there a simple way to get the sitelinks count data for all Wikidata items?  I want to use the data to help rank possible text entity links to Wikidata items.

I'm really only interested in counts for items that have at least one (i.e., a wikibase:sitelinks value that's >0).  According to statistics I've seen, only about 1/3 of Wikidata items have at least one sitelink.

I'm not sure if wikibase:sitelinks is included in the standard Wikidata dump.  I could try a SPARQL query with an OFFSET and LIMIT, but I doubt that the approach would work to completion.