The suggestion to use wbeditentity was great. It took me some time to get used to that call, but I finally managed, and the optimisation paid off. So much so that we also finished including the mouse genome yesterday. It took only 2 days to complete, in contrast to the weeks needed for the human genome. The suggestion to use wbeditentity really made my day.
Adding the mouse genome to Wikidata did, however, result in ~1000 duplicates. [1]
The issue is that items already existed with identical identifiers, so the new items resulted in unique value violations. [2]
In our current approach we can't prevent this, since the gene description is currently the key. We are looking into ways to use the identifier as the key instead of the label, as we do now. The simplest option would be to add the identifier as an alias, but ideally we would run the same check as the one that generates the constraint violations before adding a new item. Is this possible? Can a bot query for a claim P351 with a given value (e.g. 1017)?
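
To make the question concrete, here is a minimal sketch of the kind of lookup I have in mind. It assumes a SPARQL endpoint such as the Wikidata Query Service at https://query.wikidata.org/sparql is available to the bot. The function names, the User-Agent string, and the sample response below are made up for illustration and are not part of our bot.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: the Wikidata Query Service SPARQL endpoint is reachable.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(property_id, value):
    """Build a SPARQL query for items whose property has exactly this string value."""
    if not re.fullmatch(r"P[1-9][0-9]*", property_id):
        raise ValueError("not a property id: %r" % property_id)
    # Escape backslashes and double quotes so the value stays inside the literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return 'SELECT ?item WHERE { ?item wdt:%s "%s" . }' % (property_id, escaped)


def extract_item_ids(sparql_json):
    """Pull the Q-ids out of a SPARQL JSON result (entity URIs end in the id)."""
    return [
        binding["item"]["value"].rsplit("/", 1)[-1]
        for binding in sparql_json["results"]["bindings"]
    ]


def find_items(property_id, value):
    """Return the ids of existing items carrying the claim (performs a network call)."""
    params = urllib.parse.urlencode(
        {"query": build_query(property_id, value), "format": "json"}
    )
    request = urllib.request.Request(
        SPARQL_ENDPOINT + "?" + params,
        # Hypothetical User-Agent; a real bot should identify itself properly.
        headers={"User-Agent": "duplicate-check-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_item_ids(json.load(response))
```

The bot could then call find_items("P351", "1017") before creating an item, and update the existing item instead of adding a new one whenever the returned list is not empty. A query service may lag behind recent edits, so items created moments earlier might not show up yet.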
Any input would be appreciated.
Regards,
Andra