[Please pardon me if you have already read this on the Wikidata chat]
Hello folks,
------------------------------------------------------------
TL;DR: what do you think of the 3 validation criteria below?
------------------------------------------------------------
I'm excited to let you know that the soweego 2 project has just started [1]!
To cut a long story short, soweego links Wikidata to large third-party
catalogs.
The next step is all about synchronizing Wikidata with a given target
catalog through a set of validation criteria. Here are the key parts of
the project proposal:
1) existence: whether a target identifier found in a given Wikidata item
is still available in the target catalog;
2) links: to what extent all URLs available in a Wikidata item overlap
with those in the corresponding target catalog entry;
3) metadata: to what extent relevant statements available in a Wikidata
item overlap with those in the corresponding target catalog entry.
Each criterion would trigger its own set of actions. As a toy
example:
1) Elvis Presley (Q303) has a MusicBrainz identifier 01809552, which
does not exist in MusicBrainz anymore.
Action = mark the identifier statement with a deprecated rank;
2) Elvis Presley (Q303) has 7 URLs, MusicBrainz 01809552 has 8 URLs, and
3 overlap.
Action = add 5 URLs from MusicBrainz to Elvis Presley (Q303) and
submit 4 URLs from Wikidata to the MusicBrainz community;
3) Wikidata states that Elvis Presley (Q303) was born on January 8, 1935
in Tupelo, while MusicBrainz states that 01809552 was born in 1934 in
Memphis.
Action = add 2 referenced statements with MusicBrainz values to
Elvis Presley (Q303) and report the 2 Wikidata values to the MusicBrainz
community.
If criteria 2 and 3 show full overlap, the Wikidata identifier statement
should be marked with a preferred rank; if they show no overlap, it
should be marked with a deprecated rank.
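
To make criterion 2 and the rank rule a bit more concrete, here is a
rough Python sketch based on plain set operations. It is only an
illustration, not the actual soweego code: the names check_links and
LinkCheck and the example.org URLs are made up, "full overlap" is
assumed to mean that both URL sets are identical, and the rank is
assumed to stay untouched when either side has no URLs at all.

```python
from typing import NamedTuple, Optional, Set


class LinkCheck(NamedTuple):
    """Outcome of comparing the URLs of an item and a catalog entry."""
    shared: Set[str]      # URLs found on both sides
    to_add: Set[str]      # catalog only: candidates to add to Wikidata
    to_submit: Set[str]   # Wikidata only: candidates to submit to the catalog
    rank: Optional[str]   # 'preferred', 'deprecated', or None (leave as is)


def check_links(wikidata_urls: Set[str], catalog_urls: Set[str]) -> LinkCheck:
    """Hypothetical helper: compare two URL sets and suggest a rank."""
    shared = wikidata_urls & catalog_urls
    to_add = catalog_urls - wikidata_urls
    to_submit = wikidata_urls - catalog_urls

    if not wikidata_urls or not catalog_urls:
        rank = None          # assumption: nothing to compare, no rank change
    elif not shared:
        rank = 'deprecated'  # no overlap
    elif not to_add and not to_submit:
        rank = 'preferred'   # full overlap (assumed: identical sets)
    else:
        rank = None          # partial overlap: exchange URLs, keep the rank

    return LinkCheck(shared, to_add, to_submit, rank)


# Made-up data with the same counts as the toy example:
# 7 URLs in Wikidata, 8 in the catalog, 3 in common.
common = {f'https://example.org/common/{i}' for i in range(3)}
wikidata = common | {f'https://example.org/wd/{i}' for i in range(4)}
catalog = common | {f'https://example.org/mb/{i}' for i in range(5)}

result = check_links(wikidata, catalog)
print(len(result.to_add), len(result.to_submit), result.rank)  # 5 4 None
```

The same comparison could in principle be reused for criterion 3 by
feeding it (property, value) pairs instead of URLs, although real
metadata would likely need value normalization first (e.g., date
precision).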
Please note that the soweego bot already has an approved task for
criterion 2 [2], together with a set of test edits [3]. In addition, we
performed (then reverted) a set of test edits for criterion 1 [4].
I'd be glad to hear any thoughts about the validation criteria, keeping
in mind that the more generic they are, the better.
Stay tuned for more rock'n'roll!
With love,
Marco
[1]
https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2
[2]
https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Soweego…
[3]
https://www.wikidata.org/w/index.php?title=Special:Contributions&target…
[4]
https://www.wikidata.org/w/index.php?title=Special:Contributions&target…