On 01.10.2015 00:58, Ricordisamoa wrote:
I think Tom is referring to external identifiers such as MusicBrainz artist ID
<https://www.wikidata.org/wiki/Property:P434> etc. and whether Wikidata items
should show all of them or 'preferred' ones only, as we did for VIAF redirects
<https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.
There are also other cases where external sites have duplicates that are
not reconciled (yet). For example, Q46843 has multiple GeoNames Ids:
http://sws.geonames.org/7602447
http://sws.geonames.org/2954602
The second was suggested by Freebase; the first is what Wikipedia had. I
think the first is better (polygon rather than bounding box), so I marked
it as preferred. This is a situation where we should keep multiple
identifiers, since the external database really has two ids that are not
integrated yet.
Now if the external site reconciles the ids, we have these options:
(1) Keep everything as is (one main id marked as "preferred")
(2) Make the redirect ids deprecated on Wikidata (show people that we
are aware of the ids but they should not be used)
(3) Delete the redirect ids
I think (2) would be cleanest, since it keeps unaware users from re-adding
the old ids. (3) would also be ok once the old id is no longer in
circulation.
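
To make the three options concrete, here is a minimal Python sketch of how a
data consumer might treat ranked identifier statements. The Statement class
and both helper functions are hypothetical names made up for illustration, not
part of any Wikidata API. The only thing taken from Wikidata itself is the rank
model (preferred, normal, deprecated), where the "best" statements are the
preferred ones if any exist and the normal ones otherwise. The second list
assumes, purely for the sake of the example, that GeoNames has redirected
2954602 to 7602447.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical stand-in for one external-id statement on an item."""
    value: str
    rank: str = "normal"  # one of "preferred", "normal", "deprecated"


def best_values(statements):
    """Ids a consumer should use: preferred if any exist, else normal."""
    preferred = [s.value for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s.value for s in statements if s.rank == "normal"]


def known_values(statements):
    """Every id recorded for the item, including deprecated ones."""
    return [s.value for s in statements]


# Today: two unreconciled GeoNames ids on Q46843, one marked preferred.
geonames_now = [
    Statement("7602447", "preferred"),
    Statement("2954602", "normal"),
]

# Option (2), assuming the second id had become a redirect (hypothetical).
geonames_option_2 = [
    Statement("7602447", "preferred"),
    Statement("2954602", "deprecated"),
]

print(best_values(geonames_now))
print(best_values(geonames_option_2))
print(known_values(geonames_option_2))
```

Under options (1) and (2) the old id stays visible to known_values, so anyone
about to re-add it can see that it is already recorded; under option (3) that
signal is gone.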
Is there any benefit in removing old ids completely? I guess constraint
reports will work better (but maybe constraint reports should not count
deprecated statements in single-value constraints ...). Other than this,
I don't see a big reason to spend time on removing some ids. It's not
wrong to claim that these are ids, just slightly redundant, and the old
ids might still be useful for integrating with web sources that were not
updated when the redirect happened.
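
As a rough illustration of the constraint-report idea, the following Python
sketch checks a single-value constraint while optionally skipping deprecated
statements. The function name, its ignore_deprecated parameter, and the
(value, rank) tuple layout are all assumptions made for this example; this is
not how the actual constraint reports are implemented.

```python
def violates_single_value(statements, ignore_deprecated=True):
    """Return True if more than one distinct value is counted.

    statements: list of (value, rank) tuples, where rank is
    "preferred", "normal" or "deprecated" (hypothetical layout).
    """
    counted = [
        value
        for value, rank in statements
        if not (ignore_deprecated and rank == "deprecated")
    ]
    return len(set(counted)) > 1


# One current id plus one deprecated redirect id (hypothetical ranks).
ids = [("7602447", "preferred"), ("2954602", "deprecated")]

print(violates_single_value(ids))                           # deprecated skipped
print(violates_single_value(ids, ignore_deprecated=False))  # deprecated counted
```

With deprecated statements skipped, keeping an old redirect id around would no
longer show up as a violation, which removes the main argument for deleting it.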
Markus
On 01/10/2015 00:48, Addshore wrote:
On 30 September 2015 at 20:58, Tom Morris <tfmorris(a)gmail.com> wrote:
I think I've seen something somewhere saying that the prevailing
sentiment is that obsolete identifiers which are just redirects to
a new identifier should be removed.
I hope not. See my post at
http://addshore.com/2015/04/redirects-on-wikidata/ Redirects should
remain!
Also see
http://addshore.com/2015/09/un-deleting-500000-wikidata-items/
There's also the case of sites like MusicBrainz which keep the
non-canonical IDs without redirecting to the canonical ID, but
will tell you which ID is preferred, e.g. Fritz Kreisler
<https://www.wikidata.org/wiki/Q78517#revision=254297158>
https://musicbrainz.org/artist/590fcad4-2ba4-43bc-a22f-a4bb9b496fe8
https://musicbrainz.org/artist/627ac6c2-ee5c-4120-8af3-ab00345447f5
https://musicbrainz.org/artist/bf6d6ce1-ce88-40e6-9424-11d11d2e54ea
where all the tabs for the second two pages actually point to the
first, canonical entry.
Is there an established policy for either the redirect or
non-redirect case?
See
https://www.wikidata.org/wiki/Wikidata:Deletion_policy#Deletion_of_items_.2…
which says "Items should not be deleted when - The item redirects to
another item"
Also see
https://www.wikidata.org/wiki/Help:Merge#Create_redirect
which says redirects should be created when items are merged.
I'd argue that even the obsolete identifiers are useful for
inbound resolution and reconciliation. Aggressively pruning them
just makes more work for people, because they must resolve the
identifier that they have in hand to its canonical form (probably
by hitting the issuing authority) before using it for Wikidata
lookups.
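
The inbound-resolution argument can be sketched in a few lines of Python. The
build_lookup function and the dictionary layout are hypothetical and only
stand in for whatever index a reconciliation tool would keep; the item and the
three MusicBrainz ids are the Fritz Kreisler example from above.

```python
def build_lookup(items):
    """Map every recorded external id, canonical or not, to its item.

    items: dict of item id -> list of external ids (hypothetical layout).
    """
    index = {}
    for qid, external_ids in items.items():
        for external_id in external_ids:
            index[external_id] = qid
    return index


# All three MusicBrainz artist ids kept on the item.
kept = {
    "Q78517": [
        "590fcad4-2ba4-43bc-a22f-a4bb9b496fe8",  # canonical entry
        "627ac6c2-ee5c-4120-8af3-ab00345447f5",
        "bf6d6ce1-ce88-40e6-9424-11d11d2e54ea",
    ]
}

# Same item after pruning the non-canonical ids.
pruned = {"Q78517": ["590fcad4-2ba4-43bc-a22f-a4bb9b496fe8"]}

old_id = "627ac6c2-ee5c-4120-8af3-ab00345447f5"
print(build_lookup(kept).get(old_id))    # found directly
print(build_lookup(pruned).get(old_id))  # not found; needs a MusicBrainz round trip first
```

With the obsolete ids kept, someone holding an old id gets the item in one
lookup; with them pruned, they first have to ask the issuing authority for the
canonical id.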
What do others think?
Tom
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Addshore