If you're willing to settle for all Wikidata items with at least *two*
sitelinks (roughly 11.5 million items), it can be done with five simple
WDQS queries (these only return the QIDs though -- no labels):
SELECT?i{VALUES?s{2}?i wikibase:sitelinks?s}
SELECT?i{VALUES?s{3}?i wikibase:sitelinks?s}
(The sitelink counts are implicit for the above two queries and are omitted
from the results to help avoid a timeout or error message.)
SELECT*{VALUES?s{4 7}?i wikibase:sitelinks?s}
SELECT*{VALUES?s{5 6}?i wikibase:sitelinks?s}
SELECT*{VALUES?s{8 9 10 [...] 398 399 400}?i wikibase:sitelinks?s}
(There are a few dozen Wikimedia page-type items that have more than 400
sitelinks; these can be found here:
https://www.wikidata.org/wiki/Wikidata:Database_reports/Most_sitelinked_ite…
.)
Each of these queries ran successfully for me in about 20-30 seconds and I
was able to download the full results as both a TSV and JSON file without
any problems. I had no luck with my attempts to query for the 18.4 million
items with only one sitelink, even when using LIMIT and OFFSET.
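If you would rather script this than paste each query into the WDQS web
interface, here is a rough Python sketch that builds the same five queries
and streams each result to a TSV file. The helper names (count_query,
build_queries, fetch_tsv), the output file names and the User-Agent string
are all made up for illustration. It assumes the public endpoint at
https://query.wikidata.org/sparql returns TSV when asked for
text/tab-separated-values. I have not timed this against the service, so
treat it as a starting point rather than a tested recipe.

```python
import shutil
import urllib.parse
import urllib.request

# Assumed public WDQS SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def count_query(counts, include_count=True):
    """Build a query for items whose sitelink count is one of `counts`.

    With include_count=False only the item (?i) is returned, which keeps
    the result smaller for the very large single-count buckets.
    """
    values = " ".join(str(c) for c in counts)
    select = "*" if include_count else "?i"
    return f"SELECT {select} {{ VALUES ?s {{ {values} }} ?i wikibase:sitelinks ?s }}"


def build_queries():
    """Return the five queries from this message, in the same order."""
    return [
        count_query([2], include_count=False),
        count_query([3], include_count=False),
        count_query([4, 7]),
        count_query([5, 6]),
        count_query(range(8, 401)),  # 8 through 400 inclusive
    ]


def fetch_tsv(query, out_path,
              user_agent="sitelink-count-sketch/0.1 (hypothetical contact)"):
    """Run one query and stream the TSV response to out_path."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "text/tab-separated-values",
            "User-Agent": user_agent,
        },
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        with open(out_path, "wb") as out:
            shutil.copyfileobj(response, out)


# Example use (does network I/O, so it is left commented out):
# for n, q in enumerate(build_queries(), start=1):
#     fetch_tsv(q, f"sitelinks_part{n}.tsv")
```

The first two files contain only QIDs, so the count (2 or 3) has to be
filled in from whichever query produced the file.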
Hope that helps,
Tyler
On Tue, Mar 22, 2022 at 5:25 PM <finin(a)umbc.edu> wrote:
Is there a simple way to get the sitelinks count data for all Wikidata
items? I want to use the data to help rank possible text entity links to
Wikidata items.
I'm really only interested in counts for items that have at least one
sitelink (i.e., a wikibase:sitelinks value that's >0). According to
statistics I've seen, only about 1/3 of Wikidata items have at least one
sitelink.
I'm not sure if wikibase:sitelinks is included in the standard Wikidata
dump. I could try a SPARQL query with an OFFSET and LIMIT, but I doubt
that the approach would work to completion.