Hi Marco,
I guess this depends on what you mean by "exhaustive". Exhaustive in that every Wikidata item has ID X, or exhaustive in that we have every instance of ID X in Wikidata?
The first is probably not going to happen, as the vast majority of external identifiers have a defined scope for what they identify. Some are pretty broad - VIAF is essentially "everyone who exists in a library catalogue as an author or subject" - but still have a limit. We're never really going to reach a situation where there is a single identifier type that covers everyone, unless we're linking across to another Wikidata-type comprehensive knowledgebase, and even then we'd need to ensure we're in a position where they already cover everything in Wikidata.
The second can be (and has been) done - the largest one I know of offhand for people is the Oxford DNB (60k items), but for non-people we have complete coverage of e.g. Swedish district codes, P1841 (160k items). It's a bit of a slog to get these completed and then maintained, since the last 5-10% tend to be the more complicated cases, but one or two determined people can make it happen. And of course it's not appropriate for many identifiers, as they may issue IDs for things that we don't intend to have in Wikidata, so we will never completely cover them.
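As a rough sketch of how coverage of a given identifier among people items could be measured: the Python below only builds the SPARQL text and does the percentage arithmetic. The function names are made up for illustration, P214 is assumed to be the VIAF property, and the queries are meant to be pasted into the Wikidata Query Service by hand (a count over all humans may well time out there).

```python
def people_count_query(property_id=None):
    """Build a SPARQL query counting humans (Q5).

    If property_id is given (e.g. "P214"), count only humans that
    have at least one statement for that property.
    """
    lines = [
        "SELECT (COUNT(DISTINCT ?item) AS ?n) WHERE {",
        "  ?item wdt:P31 wd:Q5 .",
    ]
    if property_id is not None:
        lines.append(f"  ?item wdt:{property_id} ?id .")
    lines.append("}")
    return "\n".join(lines)


def coverage_percent(with_id, total):
    """Share of items carrying the identifier, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * with_id / total


# Hypothetical usage: run both queries by hand, then plug in the counts.
print(people_count_query())        # all people
print(people_count_query("P214"))  # people with a VIAF link (assumed property)
```

Note that this measures the first sense of coverage (how many Wikidata people have the ID), not the second (how many of the source's IDs are in Wikidata); for the latter the total has to come from the external source itself.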
I should quickly plug the "expected completeness" property which is really useful for identifiers - P2429 - as this can quickly show whether something is a) completely on Wikidata; b) not complete yet but eventually might be; or c) probably never will be. Not very widely rolled out yet, though...
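For anyone who wants to see where P2429 is already set, something like the following could list external-identifier properties together with their expected-completeness value. This is only a sketch: the function name is hypothetical, and the query assumes the standard Wikidata Query Service prefixes and label service.

```python
def expected_completeness_query(limit=None):
    """Build a SPARQL query listing external-ID properties that
    carry an "expected completeness" (P2429) statement."""
    lines = [
        "SELECT ?prop ?propLabel ?completenessLabel WHERE {",
        "  ?prop wikibase:propertyType wikibase:ExternalId ;",
        "        wdt:P2429 ?completeness .",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }',
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


# Paste the printed text into the query service to run it.
print(expected_completeness_query(limit=50))
```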
Andrew.
On 7 September 2017 at 19:51, Marco Fossati fossati@spaziodati.eu wrote:
Hi everyone,
As a data quality addict, I've been investigating the coverage of external identifiers linked to Wikidata items about people.
Given the numbers on SQID [1] and some SPARQL queries [2, 3], it seems that even the second most used ID (VIAF) covers only about *25%* of people items. Then, there is a long tail of IDs that are barely used at all.
So here is my question: *which external identifiers deserve an effort to achieve exhaustive coverage?*
Looking forward to your valuable feedback. Cheers,
Marco
[1] https://tools.wmflabs.org/sqid/#/browse?type=properties "Select datatype" set to "ExternalId", "Used for class" set to "human Q5"
[2] total people: http://tinyurl.com/ybvcm5uw
[3] people with a VIAF link: http://tinyurl.com/ya6dnpr7
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata