[Please pardon me if you have already read this on the Wikidata chat]
Hello folks,
------------------------------------------------------------
TL;DR: what do you think of the 3 validation criteria below?
------------------------------------------------------------
I'm excited to let you know that the soweego 2 project has just started [1]!
To cut a long story short, soweego links Wikidata to large third-party catalogs.
The next step is all about synchronizing Wikidata with a given target catalog through a set of validation criteria. Here are some key parts of the project proposal.
1) existence: whether a target identifier found in a given Wikidata item is still available in the target catalog;
2) links: to what extent all URLs available in a Wikidata item overlap with those in the corresponding target catalog entry;
3) metadata: to what extent relevant statements available in a Wikidata item overlap with those in the corresponding target catalog entry.
These criteria would respectively trigger a set of actions. As a toy example:
1) Elvis Presley (Q303) has a MusicBrainz identifier 01809552, which does not exist in MusicBrainz anymore. Action = mark the identifier statement with a deprecated rank;
2) Elvis Presley (Q303) has 7 URLs, MusicBrainz 01809552 has 8 URLs, and 3 overlap. Action = add the 5 missing URLs from MusicBrainz to Elvis Presley (Q303) and submit the 4 missing URLs from Wikidata to the MusicBrainz community;
3) Wikidata states that Elvis Presley (Q303) was born on January 8, 1935 in Tupelo, while MusicBrainz states that 01809552 was born in 1934 in Memphis. Action = add 2 referenced statements with the MusicBrainz values to Elvis Presley (Q303) and report the 2 Wikidata values to the MusicBrainz community.
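To make the arithmetic in toy example 2 concrete, here is a minimal Python sketch that uses plain set operations. Please take it as an illustration only: the `compare_links` function and the example.org URLs are hypothetical placeholders, not soweego code, and real URLs would need some normalization before comparison, which is not shown here.

```python
def compare_links(wikidata_urls, catalog_urls):
    """Split two URL collections into shared, catalog-only and Wikidata-only sets."""
    wikidata_urls, catalog_urls = set(wikidata_urls), set(catalog_urls)
    return {
        "shared": wikidata_urls & catalog_urls,
        "to_add_to_wikidata": catalog_urls - wikidata_urls,
        "to_submit_to_catalog": wikidata_urls - catalog_urls,
    }


# Placeholder URLs that reproduce the counts of the toy example:
# 7 URLs in Wikidata, 8 in MusicBrainz, 3 in common.
shared = {f"https://example.org/shared/{i}" for i in range(3)}
wikidata = shared | {f"https://example.org/wd/{i}" for i in range(4)}
musicbrainz = shared | {f"https://example.org/mb/{i}" for i in range(5)}

result = compare_links(wikidata, musicbrainz)
print({key: len(value) for key, value in result.items()})
# prints {'shared': 3, 'to_add_to_wikidata': 5, 'to_submit_to_catalog': 4}
```

The two set differences map directly to the two actions: catalog-only URLs go to Wikidata, Wikidata-only URLs go to the target catalog community.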
In case of either full or no overlap in criteria 2 and 3, the Wikidata identifier statement should be marked with a preferred or a deprecated rank respectively.
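The rank rule above could be sketched as follows. This is only one possible reading: I assume that "full overlap" means the two value sets are identical, that partial overlap leaves the default `normal` rank untouched, and that an empty side means there is nothing to compare. The `rank_for_overlap` name is hypothetical.

```python
def rank_for_overlap(wikidata_values, catalog_values):
    """Suggest a rank for the identifier statement from the overlap of two value sets."""
    wikidata_values, catalog_values = set(wikidata_values), set(catalog_values)
    # Assumption: with nothing to compare on either side, keep the default rank
    if not wikidata_values or not catalog_values:
        return "normal"
    # No overlap at all: the identifier looks wrong
    if not wikidata_values & catalog_values:
        return "deprecated"
    # Assumption: full overlap means both sides hold exactly the same values
    if wikidata_values == catalog_values:
        return "preferred"
    # Partial overlap: leave the rank as it is
    return "normal"


# Toy example 3: birth date and place disagree on both counts
print(rank_for_overlap({"1935-01-08", "Tupelo"}, {"1934", "Memphis"}))
# prints deprecated
```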
Please note that the soweego bot already has an approved task for criterion 2 [2], together with a set of test edits [3]. In addition, we performed (then reverted) a set of test edits for criterion 1 [4].
I'd be glad to hear any thoughts on the validation criteria, keeping in mind that the more generic they are, the better.
Stay tuned for more rock'n'roll!
With love,
Marco
[1] https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2
[2] https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Soweego_...
[3] https://www.wikidata.org/w/index.php?title=Special:Contributions&target=...
[4] https://www.wikidata.org/w/index.php?title=Special:Contributions&target=...