Somewhat related to this discussion is the coli-conc project, which
collects statistics about KOS-type (thesaurus, authority file etc.)
identifier links in Wikidata:
You can also find statistics about indirect mappings, from one KOS via
Wikidata to another KOS.
-Osma
Magnus Manske wrote on 08.09.2017 at 13:01:
Is anyone working on an "auto-resolve" bot? If you have VIAF (but
nothing else), you can resolve other identifiers via the VIAF site;
similarly, if you have only GND, you could try to reverse-lookup VIAF.
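A minimal sketch of the decision step such an "auto-resolve" bot might
use, assuming the links of a VIAF cluster have already been fetched
into a dict keyed by source code. The input shape, the source codes
and the function name propose_statements are assumptions made for
illustration, not the VIAF API; the property IDs are the Wikidata
properties for each identifier.

```python
# Assumed mapping from VIAF source codes to Wikidata property IDs.
SOURCE_TO_PROPERTY = {
    "DNB": "P227",   # GND ID
    "LC": "P244",    # Library of Congress authority ID
    "ISNI": "P213",  # ISNI
    "BNF": "P268",   # Bibliotheque nationale de France ID
}


def propose_statements(viaf_links, existing):
    """Return {property_id: value} for identifiers the item still lacks.

    viaf_links: dict of source code -> list of identifier strings
                (hypothetical shape of an already-fetched VIAF cluster).
    existing:   dict of property_id -> set of values already on the item.
    """
    proposals = {}
    for source, values in viaf_links.items():
        prop = SOURCE_TO_PROPERTY.get(source)
        # Ignore unknown sources and properties the item already has.
        if prop is None or existing.get(prop):
            continue
        # Only propose unambiguous matches; leave the rest for humans.
        if len(values) == 1:
            proposals[prop] = values[0]
    return proposals
```

Skipping clusters with more than one candidate per source is a
deliberately cautious choice: ambiguous cases are probably better
handled by a person than by a bot.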
I think a list of items that have zero external identifiers, ordered by
"importance" (incoming wikidata links, number of statements etc) would
also be helpful.
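The ordering suggested above could be sketched as follows. This
assumes per-item counts have already been collected somehow; the
ItemStats record, the equal default weights and the simple weighted
sum are all hypothetical, since "importance" is not defined more
precisely in this thread.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    qid: str             # Wikidata item ID
    external_ids: int    # number of external-identifier statements
    incoming_links: int  # number of other items linking to this one
    statements: int      # total number of statements on the item


def rank_items_without_ids(items, link_weight=1.0, statement_weight=1.0):
    """Return items with zero external identifiers, most 'important' first."""
    candidates = [item for item in items if item.external_ids == 0]
    return sorted(
        candidates,
        key=lambda item: (link_weight * item.incoming_links
                          + statement_weight * item.statements),
        reverse=True,
    )
```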
On Fri, Sep 8, 2017 at 10:52 AM Jane Darnell <jane023(a)gmail.com> wrote:
As a basic rule for "which external identifiers are worth covering",
I would begin with any national identifiers we have for people
(politicians, artists, writers, theologians, scientists, etc), then
national identifiers for organizations (government-related,
GNP-related businesses, nonprofits, educational institutions, etc),
then national identifiers for places (census-defined population
centers, battle-scenes, etc).
In my opinion, the question should not be "which identifier has the
most coverage" but "which items have the most identifiers".
On Thu, Sep 7, 2017 at 9:26 PM, Andrew Gray
<andrew(a)generalist.org.uk> wrote:
Hi Marco,
I guess this depends what you mean by "exhaustive". Exhaustive in that
every Wikidata item has ID X, or exhaustive in that we have every
instance of ID X in Wikidata?
The first is probably not going to happen, as the vast majority of
external identifiers have a defined scope for what they identify. Some
are pretty broad - VIAF is essentially "everyone who exists in a
library catalogue as an author or subject" - but still have a limit.
We're never really going to reach a situation where there is a single
identifier type that covers everyone, unless we're linking across to
another Wikidata-type comprehensive knowledgebase, and even then we'd
need to ensure we're in a position where they already cover everything
in Wikidata.
The second can be (and has been) done - the largest one I know of
offhand for people is the Oxford DNB (60k items), but for non-people we
have complete coverage of e.g. Swedish district codes, P1841 (160k
items). It's a bit of a slog to get these completed and then
maintained, since the last 5-10% tend to be the more challenging,
complicated cases, but one or two determined people can make it happen.
And of course it's not appropriate for many identifiers, as they may
issue IDs for things that we don't intend to have in Wikidata, so we
will never completely cover them.
I should quickly plug the "expected completeness" property which is
really useful for identifiers - P2429 - as this can quickly show
whether something is a) completely on Wikidata; b) not complete yet
but eventually might be; or c) probably never will be. Not very widely
rolled out yet, though...
Andrew.
On 7 September 2017 at 19:51, Marco Fossati
<fossati(a)spaziodati.eu> wrote:
Hi everyone,
As a data quality addict, I've been investigating the coverage of
external identifiers linked to Wikidata items about people.
Given the numbers on SQID [1] and some SPARQL queries [2, 3], it seems
that even the second most used ID (VIAF) only covers circa *25%* of
people items.
Then, there is a long tail of IDs that are barely used at all.
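For readers who want to reproduce this kind of coverage figure, here
is a rough sketch. The query text is an assumption about what the
shortened links [2, 3] contain, not a copy of them, and the helper
names are hypothetical; running the queries against the Wikidata
Query Service is left out.

```python
# Count all items that are instances of human (Q5).
TOTAL_PEOPLE_QUERY = """
SELECT (COUNT(?person) AS ?count) WHERE {
  ?person wdt:P31 wd:Q5 .
}
"""


def people_with_identifier_query(property_id):
    """Build a SPARQL count of humans having at least one value for property_id."""
    return f"""
SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:{property_id} ?id .
}}
"""


def coverage(with_identifier, total):
    """Fraction of people items that carry the identifier (0.0 if total is 0)."""
    if total == 0:
        return 0.0
    return with_identifier / total
```

With P214 (VIAF ID) as property_id, dividing the second count by the
first would give the coverage ratio discussed here.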
So here is my question:
*which external identifiers deserve an effort to achieve exhaustive
coverage?*
Looking forward to your valuable feedback.
Cheers,
Marco
[1] https://tools.wmflabs.org/sqid/#/browse?type=properties with
"Select datatype" set to "ExternalId" and "Used for class" set to
"human Q5"
[2] total people: http://tinyurl.com/ybvcm5uw
[3] people with a VIAF link: http://tinyurl.com/ya6dnpr7
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
--
- Andrew Gray
andrew(a)generalist.org.uk
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen(a)helsinki.fi