I think I've seen something somewhere saying that the prevailing sentiment is that obsolete identifiers which are just redirects to a new identifier should be removed.
There's also the case of sites like MusicBrainz which keep the non-canonical IDs without redirecting to the canonical ID, but will tell you which ID is preferred, e.g. Fritz Kreisler https://www.wikidata.org/wiki/Q78517#revision=254297158
https://musicbrainz.org/artist/590fcad4-2ba4-43bc-a22f-a4bb9b496fe8 https://musicbrainz.org/artist/627ac6c2-ee5c-4120-8af3-ab00345447f5 https://musicbrainz.org/artist/bf6d6ce1-ce88-40e6-9424-11d11d2e54ea
where all the tabs for the second two pages actually point to the first, canonical entry.
Is there an established policy for either the redirect or non-redirect case?
I'd argue that even the obsolete identifiers are useful for inbound resolution and reconciliation. Aggressively pruning them just makes more work for people, because they must resolve the identifier that they have in hand to its canonical form (probably by hitting the issuing authority) before using it for Wikidata lookups.
What do others think?
Tom
On 30 September 2015 at 20:58, Tom Morris tfmorris@gmail.com wrote:
I think I've seen something somewhere saying that the prevailing sentiment is that obsolete identifiers which are just redirects to a new identifier should be removed.
I hope not. See my post at http://addshore.com/2015/04/redirects-on-wikidata/ Redirects should remain!
Also see http://addshore.com/2015/09/un-deleting-500000-wikidata-items/
There's also the case of sites like MusicBrainz which keep the non-canonical IDs without redirecting to the canonical ID, but will tell you which ID is preferred, e.g. Fritz Kreisler https://www.wikidata.org/wiki/Q78517#revision=254297158
https://musicbrainz.org/artist/590fcad4-2ba4-43bc-a22f-a4bb9b496fe8 https://musicbrainz.org/artist/627ac6c2-ee5c-4120-8af3-ab00345447f5 https://musicbrainz.org/artist/bf6d6ce1-ce88-40e6-9424-11d11d2e54ea
where all the tabs for the second two pages actually point to the first, canonical entry.
Is there an established policy for either the redirect or non-redirect case?
See https://www.wikidata.org/wiki/Wikidata:Deletion_policy#Deletion_of_items_.28... which says "Items should not be deleted when - The item redirects to another item"
Also see https://www.wikidata.org/wiki/Help:Merge#Create_redirect which says redirects should be created when items are merged
I'd argue that even the obsolete identifiers are useful for inbound resolution and reconciliation. Aggressively pruning them just makes more work for people, because they must resolve the identifier that they have in hand to its canonical form (probably by hitting the issuing authority) before using it for Wikidata lookups.
What do others think?
Tom
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Sorry, I should have been clearer that I was talking about *external* identifiers, not Wikidata identifiers/items.
For example, the three MusicBrainz IDs listed here:
https://www.wikidata.org/wiki/Q78517#revision=254297158
Is that allowed? Encouraged? I'm hoping that the answer is yes to both.
Tom
I think Tom is referring to external identifiers such as MusicBrainz artist ID https://www.wikidata.org/wiki/Property:P434 etc. and whether Wikidata items should show all of them or 'preferred' ones only as we did for VIAF redirects https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38.
On 01.10.2015 00:58, Ricordisamoa wrote:
I think Tom is referring to external identifiers such as MusicBrainz artist ID https://www.wikidata.org/wiki/Property:P434 etc. and whether Wikidata items should show all of them or 'preferred' ones only as we did for VIAF redirects https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38.
There are also other cases where external sites have duplicates that are not reconciled (yet). For example, Q46843 has multiple GeoNames IDs:
http://sws.geonames.org/7602447 http://sws.geonames.org/2954602
The second was suggested by Freebase, the first is what Wikipedia had. I think the first is better (polygon rather than bounding box), so I made this preferred. This is a situation where we should keep multiple identifiers, since the external database really has two ids that are not integrated yet.
Now if the external site reconciles the ids, we have these options:
(1) Keep everything as is (one main id marked as "preferred")
(2) Make the redirect ids deprecated on Wikidata (show people that we are aware of the ids but they should not be used)
(3) Delete the redirect ids
I think (2) would be cleanest, since it keeps unaware users from re-adding the old ids. (3) would also be ok once the old id is no longer in circulation.
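To make option (2) concrete, here is a minimal sketch in Python. It assumes a simplified statement shape (a dict with "value" and "rank") and a hypothetical redirect map reported by the external site. Neither is Wikidata's actual data format or API, and the id strings are placeholders.

```python
from typing import Dict, List

# Simplified stand-in for a statement: {"value": <external id>, "rank": <rank>}.
Statement = Dict[str, str]


def deprecate_redirected(statements: List[Statement],
                         redirects: Dict[str, str]) -> List[Statement]:
    """Return a copy of the statements in which every id that the external
    site now redirects elsewhere is given the "deprecated" rank.
    Nothing is deleted, so the old ids stay available for lookups."""
    updated = []
    for statement in statements:
        copy = dict(statement)
        if statement["value"] in redirects:
            copy["rank"] = "deprecated"
        updated.append(copy)
    return updated


# Placeholder ids, purely for illustration.
statements = [
    {"value": "old-id-1", "rank": "normal"},
    {"value": "canonical-id", "rank": "normal"},
]
redirects = {"old-id-1": "canonical-id"}  # hypothetical redirect map

result = deprecate_redirected(statements, redirects)
```

The old id is kept but flagged, which is the difference between option (2) and option (3).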
Is there any benefit in removing old ids completely? I guess constraint reports will work better (but maybe constraint reports should not count deprecated statements in single value constraints ...). Other than this, I don't see a big reason to spend time on removing some ids. It's not wrong to claim that these are ids, just slightly redundant, and the old ids might still be useful for integrating with web sources that were not updated when the redirect happened.
Markus
I see no benefit to removing the old IDs. In fact, it would make things more difficult for me and others who work with older databases. I would like to keep the old IDs in Wikidata for posterity and provenance. Some of us still have really old databases with cruft and old IDs from years and years ago, some from the start of the Internet :) If you remove the old IDs, it will be that much harder for me to reconcile some of them.
+1 Being able to query the Wikidata API with an older ID and it showing me that it is an old ID and letting me know there is now a preferred ID, would be fantastic.
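A rough sketch of what such a lookup could return, in Python and without any network call. The entity layout is loosely modeled on the claims / mainsnak / rank shape of Wikibase JSON in simplified form. The helper names are made up for illustration, and the ranks assigned to the three MusicBrainz ids of Q78517 are an assumption, not what the item actually holds.

```python
from typing import Dict, Optional, Tuple


def make_statement(value: str, rank: str) -> dict:
    """Build a simplified statement (hypothetical helper for this sketch)."""
    return {
        "mainsnak": {"snaktype": "value",
                     "datavalue": {"value": value, "type": "string"}},
        "rank": rank,
    }


def statement_value(statement: dict) -> str:
    return statement["mainsnak"]["datavalue"]["value"]


def build_index(entities: Dict[str, dict]) -> Dict[Tuple[str, str], str]:
    """Index every external id, whatever its rank, by (property, id)."""
    index = {}
    for qid, entity in entities.items():
        for prop, statements in entity["claims"].items():
            for statement in statements:
                index[(prop, statement_value(statement))] = qid
    return index


def resolve(entities: Dict[str, dict], index: Dict[Tuple[str, str], str],
            prop: str, external_id: str) -> Optional[dict]:
    """Look up any id, old or current, and report its rank together with
    the preferred id(s) recorded on the same item."""
    qid = index.get((prop, external_id))
    if qid is None:
        return None
    statements = entities[qid]["claims"][prop]
    rank = next(s["rank"] for s in statements
                if statement_value(s) == external_id)
    preferred = [statement_value(s) for s in statements
                 if s["rank"] == "preferred"]
    return {"item": qid, "rank": rank, "preferred_ids": preferred}


# Ranks below are assumed for illustration only.
entities = {
    "Q78517": {"claims": {"P434": [
        make_statement("590fcad4-2ba4-43bc-a22f-a4bb9b496fe8", "preferred"),
        make_statement("627ac6c2-ee5c-4120-8af3-ab00345447f5", "deprecated"),
        make_statement("bf6d6ce1-ce88-40e6-9424-11d11d2e54ea", "deprecated"),
    ]}},
}
index = build_index(entities)
hit = resolve(entities, index, "P434",
              "627ac6c2-ee5c-4120-8af3-ab00345447f5")
```

This only works if the deprecated ids are still stored on the item, which is the point of keeping them.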
Thad +ThadGuidry https://www.google.com/+ThadGuidry
On Thu, Oct 1, 2015 at 4:19 AM, Markus Krötzsch < markus@semantic-mediawiki.org> wrote:
On 01.10.2015 00:58, Ricordisamoa wrote:
I think Tom is referring to external identifiers such as MusicBrainz artist ID https://www.wikidata.org/wiki/Property:P434 etc. and whether Wikidata items should show all of them or 'preferred' ones only as we did for VIAF redirects < https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot...
.
Now if the external site reconciles the ids, we have these options: (1) Keep everything as is (one main id marked as "preferred") (2) Make the redirect ids deprecated on Wikidata (show people that we are aware of the ids but they should not be used) (3) Delete the redirect ids
I think (2) would be cleanest, since it avoids that unaware users re-add the old ids. (3) would also be ok once the old id is no longer in circulation.
I agree #2 is best, although #1 could work too. The problem with #3 is that an identifier, once minted, is never "no longer in circulation." This is precisely why Wikidata items are never deleted. There's always the possibility that someone will hold a reference to it somewhere. Thad's use case isn't uncommon.
Is there any benefit in removing old ids completely? I guess constraint reports will work better (but maybe constraint reports should not count deprecated statements in single value constraints ...).
The constraint reports definitely need to be fixed. I recently saw a reference to a VIAF bot run that deleted a whole bunch of VIAF identifiers to "fix" things being flagged by some constraint.
Other than this, I don't see a big reason to spend time on removing some ids. It's not wrong to claim that these are ids, just slightly redundant, and the old ids might still be useful for integrating with web sources that were not updated when the redirect happened.
Rather than merely declining to spend time on removal, I'd like to see an affirmative statement that keeping them is a good thing. If people find them annoying or cluttering, it's because of poor UI design, not because they lack usefulness.
Tom
It might be worth creating a qualifier "reason for deprecation" to indicate in more detail why a particular value is deprecated (eg "superseded", "redirected on target website", etc).
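As an illustration only, a deprecated value carrying such a qualifier might look like the Python sketch below. The statement layout is simplified, and "P_REASON" is a made-up placeholder rather than an existing property id. The reason strings are the examples given above.

```python
REASON_PROPERTY = "P_REASON"  # hypothetical placeholder, not a real property id


def deprecate_with_reason(statement: dict, reason: str) -> dict:
    """Return a copy of the statement with rank "deprecated" and a
    qualifier recording why the value was deprecated."""
    updated = dict(statement)
    updated["rank"] = "deprecated"
    qualifiers = dict(statement.get("qualifiers", {}))
    qualifiers[REASON_PROPERTY] = [reason]
    updated["qualifiers"] = qualifiers
    return updated


old = {"value": "old-id-1", "rank": "normal"}  # placeholder id
flagged = deprecate_with_reason(old, "redirected on target website")
```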
-- James.
I agree with Tom on this. I would prefer to keep all of the redirects and just deprecate them (especially for names of people, because hidden in the redirect is an alternate spelling that should be added as an alias to the label field).
A small mechanical note for those not familiar with Wikidata's internals (since it took me a while to figure this out):
"Preferred" and "Deprecated" are "Ranks" (the third is "Normal") and the rank can be set by clicking the "Edit" button and then clicking on the leftmost of the two tiny sets of three stacked buttons near the left of the input field.
It seems like the constraint checker could check for either only one "Preferred" or all but one "Deprecated" which would allow editors to evolve in whichever way they wanted.
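A minimal Python sketch of that check, assuming the checker sees only the list of ranks for one property on one item. The function name is invented here, and this is not how the existing constraint reports are implemented.

```python
from collections import Counter
from typing import List


def passes_single_value_check(ranks: List[str]) -> bool:
    """Accept a set of statements if exactly one is "preferred",
    or if all but one are "deprecated"."""
    counts = Counter(ranks)
    exactly_one_preferred = counts["preferred"] == 1
    all_but_one_deprecated = len(ranks) - counts["deprecated"] == 1
    return exactly_one_preferred or all_but_one_deprecated


one_preferred = passes_single_value_check(["preferred", "normal", "normal"])
rest_deprecated = passes_single_value_check(["normal", "deprecated", "deprecated"])
two_normal = passes_single_value_check(["normal", "normal"])
```

Either convention passes, so editors could mark the current id as preferred or mark the old ones as deprecated, whichever they favour.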
Tom
Hi!
It seems like the constraint checker could check for either only one "Preferred" or all but one "Deprecated" which would allow editors to evolve in whichever way they wanted.
It should probably consider "best rank" ones - i.e. if Preferred exists then Preferred ones, otherwise Normal ones but never Deprecated ones.
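In Python, that "best rank" selection could be sketched as follows, assuming each statement is a simplified dict with "value" and "rank" keys (not the real storage format, and with placeholder ids):

```python
from typing import Dict, List


def best_rank_statements(statements: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the preferred statements if any exist, otherwise the normal
    ones. Deprecated statements are never returned."""
    preferred = [s for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s["rank"] == "normal"]


mixed = [
    {"value": "id-a", "rank": "preferred"},
    {"value": "id-b", "rank": "normal"},
    {"value": "id-c", "rank": "deprecated"},
]
no_preferred = [
    {"value": "id-b", "rank": "normal"},
    {"value": "id-c", "rank": "deprecated"},
]

best_mixed = best_rank_statements(mixed)
best_no_preferred = best_rank_statements(no_preferred)
```

A single value check would then count only the statements this function returns.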
On 01.10.2015 at 18:40, Tom Morris wrote:
On Thu, Oct 1, 2015 at 4:19 AM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
On 01.10.2015 00:58, Ricordisamoa wrote:
I think Tom is referring to external identifiers such as MusicBrainz artist ID <https://www.wikidata.org/wiki/Property:P434> etc. and whether Wikidata items should show all of them or 'preferred' ones only as we did for VIAF redirects <https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.
Now if the external site reconciles the ids, we have these options:
(1) Keep everything as is (one main id marked as "preferred")
(2) Make the redirect ids deprecated on Wikidata (show people that we are aware of the ids but they should not be used)
(3) Delete the redirect ids
I think (2) would be cleanest, since it avoids that unaware users re-add the old ids. (3) would also be ok once the old id is no longer in circulation.
I agree #2 is best, although #1 could work too. The problem with #3 is that an identifier, once minted, is never "no longer in circulation." This is precisely why Wikidata items are never deleted. There's always the possibility that someone will hold a reference to it somewhere. Thad's use case isn't uncommon.
I think #2 is the only option we should pursue. This is exactly what the deprecated rank is for: marking some information as valid at some point in time while telling users that it should not be used any more.
Is there any benefit in removing old ids completely? I guess constraint reports will work better (but maybe constraint reports should not count deprecated statements in single value constraints ...).
The constraint reports definitely need to be fixed. I recently saw a reference to a VIAF bot run that deleted a whole bunch of VIAF identifiers to "fix" things being flagged by some constraint.
Not sure if marking the statements as deprecated would already fix them. If not, the code creating these lists needs to be adjusted to ignore deprecated statements (maybe optionally?).
Other than this, I don't see a big reason to spend time on removing some ids. It's not wrong to claim that these are ids, just slightly redundant, and the old ids might still be useful for integrating with web sources that were not updated when the redirect happened.
Rather than not wasting time removing, I'd like to see affirmative statements that keeping them is a good thing. If people find them annoying or cluttering, it's because of poor UI design, not because they lack usefulness.
Indeed, and as far as I know the new UI will hide deprecated statements by default and only show them on demand via a toggle.
Best regards Bene