I see no benefit in removing the old IDs. In fact, it would make things more difficult for me and for others who maintain older databases. I would like to keep the old IDs in Wikidata for posterity and provenance. Some of us still have really old databases with cruft and old IDs from years and years ago, some from the start of the Internet :)  If you remove the old IDs, it will be that much harder for me to reconcile some of them.

+1  Being able to query the Wikidata API with an older ID, have it tell me that the ID is old, and have it point me to the now-preferred ID would be fantastic.


On Thu, Oct 1, 2015 at 3:19 AM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
On 01.10.2015 00:58, Ricordisamoa wrote:
I think Tom is referring to external identifiers such as MusicBrainz
artist ID <https://www.wikidata.org/wiki/Property:P434> etc. and whether
Wikidata items should show all of them or 'preferred' ones only as we
did for VIAF redirects
<https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/SamoaBot_38>.

There are also other cases where external sites have duplicates that are not reconciled (yet). For example, Q46843 has multiple GeoNames Ids:

http://sws.geonames.org/7602447
http://sws.geonames.org/2954602

The second was suggested by Freebase; the first is what Wikipedia had. I think the first is better (polygon rather than bounding box), so I marked it as preferred. This is a situation where we should keep multiple identifiers, since the external database really has two ids that are not integrated yet.

Now if the external site reconciles the ids, we have these options:
(1) Keep everything as is (one main id marked as "preferred")
(2) Make the redirect ids deprecated on Wikidata (show people that we are aware of the ids but they should not be used)
(3) Delete the redirect ids

I think (2) would be cleanest, since it keeps unaware users from re-adding the old ids. (3) would also be ok once the old id is no longer in circulation.
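
To make option (2) concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `IdStatement` class and the `deprecate_redirects` and `lookup` helpers are hypothetical names, not part of any Wikidata API, and the scenario in which GeoNames redirects 2954602 to 7602447 is invented for the example. Only the rank names (preferred, normal, deprecated) are taken from Wikidata.

```python
from dataclasses import dataclass

# Rank names mirror Wikidata's statement ranks.
PREFERRED, NORMAL, DEPRECATED = "preferred", "normal", "deprecated"


@dataclass
class IdStatement:
    """Hypothetical stand-in for one external-identifier statement."""
    value: str
    rank: str = NORMAL


def deprecate_redirects(statements, canonical):
    """Option (2): keep every id, mark the canonical one preferred, the rest deprecated."""
    return [
        IdStatement(s.value, PREFERRED if s.value == canonical else DEPRECATED)
        for s in statements
    ]


def lookup(statements, queried):
    """Return (is_known, is_deprecated, preferred_value) for a queried id."""
    preferred = next((s.value for s in statements if s.rank == PREFERRED), None)
    for s in statements:
        if s.value == queried:
            return True, s.rank == DEPRECATED, preferred
    return False, False, preferred


# Assumed scenario: GeoNames has merged 2954602 into 7602447.
q46843 = [IdStatement("7602447", PREFERRED), IdStatement("2954602")]
after = deprecate_redirects(q46843, canonical="7602447")
```

With this shape, someone holding the old id can still find the item, see that the id is deprecated, and learn which id is preferred. Under option (3) the same lookup would simply come back as unknown.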

Is there any benefit in removing old ids completely? I guess constraint reports will work better (but maybe constraint reports should not count deprecated statements in single value constraints ...). Other than this, I don't see a big reason to spend time on removing some ids. It's not wrong to claim that these are ids, just slightly redundant, and the old ids might still be useful for integrating with web sources that were not updated when the redirect happened.
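
As a rough sketch of the constraint idea above: a single-value check that skips deprecated statements could look like the following. This is not how Wikidata's constraint reports are implemented; `violates_single_value` and its `ignore_deprecated` flag are hypothetical names, and the input is a plain dict mapping id value to rank.

```python
def violates_single_value(rank_by_value, ignore_deprecated=True):
    """Return True if more than one id counts toward the single-value constraint.

    rank_by_value maps each id value to its rank
    ("preferred", "normal", or "deprecated").
    """
    counted = [
        value
        for value, rank in rank_by_value.items()
        if not (ignore_deprecated and rank == "deprecated")
    ]
    return len(counted) > 1


# Assumed state after a redirect: one preferred id, one deprecated id.
geonames_ids = {"7602447": "preferred", "2954602": "deprecated"}
```

Here `violates_single_value(geonames_ids)` reports no violation, while passing `ignore_deprecated=False` does. That is the difference between options (2) and (1) from the point of view of a constraint report.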

Markus