On 01.10.2015 00:58, Ricordisamoa wrote:
I think Tom is referring to external identifiers such as MusicBrainz artist ID (https://www.wikidata.org/wiki/Property:P434), and to whether Wikidata items should list all of them or only the 'preferred' ones, as we did for VIAF redirects (https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38).
There are also other cases where external sites have duplicates that are not reconciled (yet). For example, Q46843 has multiple GeoNames ids:
http://sws.geonames.org/7602447 http://sws.geonames.org/2954602
The second was suggested by Freebase; the first is what Wikipedia had. I think the first is better (a polygon rather than a bounding box), so I marked it as preferred. This is a situation where we should keep multiple identifiers, since the external database really has two ids that are not integrated yet.
Now if the external site reconciles the ids, we have these options:
(1) Keep everything as is (one main id marked as "preferred")
(2) Mark the redirect ids as deprecated on Wikidata (show people that we are aware of the ids but they should not be used)
(3) Delete the redirect ids
I think (2) would be cleanest, since it prevents unaware users from re-adding the old ids. (3) would also be ok once the old id is no longer in circulation.
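To make options (1) and (2) concrete, here is a minimal Python sketch of how a data consumer might pick which ids to use from an item's claims. It assumes the Wikibase JSON layout, where each statement has a `mainsnak` and a `rank` of "preferred", "normal" or "deprecated"; the helper names `make_statement` and `best_values` are hypothetical. The selection rule (preferred values if any exist, otherwise normal ones, never deprecated ones) is the same best-rank rule Wikidata uses for its "truthy" statements.

```python
from typing import Any

# Ranks in order of precedence; "deprecated" is deliberately absent,
# so deprecated statements are never returned.
RANK_ORDER = ("preferred", "normal")


def make_statement(value: str, rank: str) -> dict[str, Any]:
    """Build a minimal, hypothetical statement in the Wikibase JSON shape."""
    return {"mainsnak": {"datavalue": {"value": value}}, "rank": rank}


def best_values(claims: dict[str, list[dict[str, Any]]], prop: str) -> list[str]:
    """Return the values of `prop` that carry the best non-deprecated rank."""
    statements = claims.get(prop, [])
    for rank in RANK_ORDER:
        values = [
            s["mainsnak"]["datavalue"]["value"]
            for s in statements
            # Skip "some value"/"no value" snaks, which have no datavalue.
            if s.get("rank") == rank and "datavalue" in s["mainsnak"]
        ]
        if values:
            return values
    return []


# Q46843 with two GeoNames ids (P1566) under option (1) and option (2).
option_1 = {"P1566": [make_statement("7602447", "preferred"),
                      make_statement("2954602", "normal")]}
option_2 = {"P1566": [make_statement("7602447", "normal"),
                      make_statement("2954602", "deprecated")]}
print(best_values(option_1, "P1566"))
print(best_values(option_2, "P1566"))
```

Both calls return `['7602447']`: for a consumer that follows ranks, (1) and (2) look the same, but (2) also signals to editors that the redirect id is known and should not be re-added.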
Is there any benefit in removing old ids completely? I guess constraint reports would work better (but maybe constraint reports should not count deprecated statements in single-value constraints ...). Other than this, I don't see a strong reason to spend time removing ids. It's not wrong to claim that these are ids, just slightly redundant, and the old ids might still be useful for integrating with web sources that were not updated when the redirect happened.
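A rough Python sketch of how keeping deprecated ids could satisfy both concerns at once. It assumes the same Wikibase JSON statement shape (`mainsnak`, `rank`); `violates_single_value` is a hypothetical single-value check that skips deprecated statements (not a description of how Wikidata's constraint reports actually behave), and `build_lookup` is a hypothetical reverse index that still maps old ids to their item for inbound resolution.

```python
from typing import Any, Optional

Statement = dict[str, Any]


def make_statement(value: str, rank: str) -> Statement:
    """Build a minimal, hypothetical statement in the Wikibase JSON shape."""
    return {"mainsnak": {"datavalue": {"value": value}}, "rank": rank}


def _value(statement: Statement) -> Optional[str]:
    """Return the statement's value, or None for "some value"/"no value" snaks."""
    snak = statement["mainsnak"]
    return snak["datavalue"]["value"] if "datavalue" in snak else None


def violates_single_value(statements: list[Statement]) -> bool:
    """Hypothetical check: more than one distinct non-deprecated value."""
    live = {
        _value(s)
        for s in statements
        if s.get("rank") != "deprecated" and _value(s) is not None
    }
    return len(live) > 1


def build_lookup(items: dict[str, dict[str, list[Statement]]], prop: str) -> dict[str, str]:
    """Map every id of `prop`, deprecated ones included, to its item id."""
    index: dict[str, str] = {}
    for qid, claims in items.items():
        for s in claims.get(prop, []):
            value = _value(s)
            if value is not None:
                index[value] = qid
    return index


# GeoNames ids (P1566) after the redirect id has been deprecated.
reconciled = [make_statement("7602447", "preferred"),
              make_statement("2954602", "deprecated")]
print(violates_single_value(reconciled))
print(build_lookup({"Q46843": {"P1566": reconciled}}, "P1566"))
```

Here the deprecated id no longer triggers the single-value check, yet a source that still uses `2954602` resolves to Q46843 without first asking GeoNames for the canonical id.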
Markus
On 01/10/2015 00:48, Addshore wrote:
On 30 September 2015 at 20:58, Tom Morris <tfmorris@gmail.com> wrote:
I think I've seen something somewhere saying that the prevailing sentiment is that obsolete identifiers which are just redirects to a new identifier should be removed.
I hope not. See my post at http://addshore.com/2015/04/redirects-on-wikidata/ Redirects should remain!
Also see http://addshore.com/2015/09/un-deleting-500000-wikidata-items/
There's also the case of sites like MusicBrainz, which keep the non-canonical IDs without redirecting to the canonical ID but will tell you which ID is preferred, e.g. Fritz Kreisler <https://www.wikidata.org/wiki/Q78517#revision=254297158>: https://musicbrainz.org/artist/590fcad4-2ba4-43bc-a22f-a4bb9b496fe8 https://musicbrainz.org/artist/627ac6c2-ee5c-4120-8af3-ab00345447f5 https://musicbrainz.org/artist/bf6d6ce1-ce88-40e6-9424-11d11d2e54ea where all the tabs on the last two pages actually point to the first, canonical entry. Is there an established policy for either the redirect or the non-redirect case?
See https://www.wikidata.org/wiki/Wikidata:Deletion_policy#Deletion_of_items_.28... which says "Items should not be deleted when - The item redirects to another item"
Also see https://www.wikidata.org/wiki/Help:Merge#Create_redirect which says redirects should be created when items are merged
I'd argue that even the obsolete identifiers are useful for inbound resolution and reconciliation. Aggressively pruning them just makes more work for people, because they must resolve the identifier they have in hand to its canonical form (probably by hitting the issuing authority) before using it for Wikidata lookups. What do others think?
Tom
_______________________________________________
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
-- Addshore