Hi Joachim,
Thanks a lot, this is extremely valuable for us!
I'm not sure I trust the Mix'n'match algorithm enough to determine that the results are good enough - I would feel more comfortable if there was some additional confirmation that the leftover places really do not exist in Wikidata, for example after using alternate and/or Swedish language labels to find additional match candidates.
Mix'n'match also apparently doesn't distinguish between entities that were not matched because no candidates were found in Wikidata to match against, versus entities that were not mapped because there was more than one candidate available. At the moment we have a mix of both types of failed matches in the Unmatched category. It would probably be fairly safe to bulk-add the places that didn't match against anything, but I don't know how to extract that kind of list from Mix'n'match.
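The distinction described above (no candidate at all versus more than one candidate) can be illustrated with a small sketch. It does not use Mix'n'match itself. It assumes a hypothetical local index from normalized labels to Wikidata item IDs, built in advance from labels in any language, and checks every known label of a place (Finnish, Swedish, alternate spellings) against it. The names `find_candidates`, `classify_places` and `label_index`, and the IDs `p1`, `p2`, `p3`, `Q_A` and `Q_B`, are placeholders invented for this example.

```python
def find_candidates(labels, label_index):
    """Collect every item ID whose indexed label equals one of the given labels.

    label_index is an assumed, pre-built dict: normalized label -> set of item IDs.
    """
    found = set()
    for label in labels:
        found |= label_index.get(label.strip().casefold(), set())
    return found


def classify_places(places, label_index):
    """Group place IDs by how many distinct candidates their labels produce."""
    result = {"none": [], "single": [], "ambiguous": []}
    for place_id, labels in sorted(places.items()):
        count = len(find_candidates(labels, label_index))
        if count == 0:
            result["none"].append(place_id)
        elif count == 1:
            result["single"].append(place_id)
        else:
            result["ambiguous"].append(place_id)
    return result


# Made-up sample data; only Q1754 (Stockholm) is taken from the thread.
label_index = {
    "tukholma": {"Q1754"},
    "stockholm": {"Q1754"},
    "springfield": {"Q_A", "Q_B"},
}
places = {
    "p1": ["Tukholma", "Stockholm"],  # two labels, same item
    "p2": ["Springfield"],            # more than one candidate
    "p3": ["Esimerkkilä"],            # no candidate at all
}
groups = classify_places(places, label_index)
```

Only the places in `groups["none"]` would be candidates for bulk creation. Those in `groups["ambiguous"]` still need manual review.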
My current plan is to take the remaining unmapped places and try to reconcile them using OpenRefine; if there are still no matches, I can go ahead and add them to Wikidata, most likely using the QuickStatements tool, which seems really convenient for this.
-Osma
Neubert, Joachim wrote on 28.08.2017 at 14:45:
Hi Osma,
The instrument we used to avoid duplicates was Mix-n-match. Even when something is not "automatically matched", often, on the details page (e.g., https://tools.wmflabs.org/mix-n-match/#/entry/22734337), possible matches come up.
That covers the case where a (partial) name is present somewhere in Wikidata or Wikipedia. Unfortunately, I've not yet figured out how I could feed my own synonyms into Mix-n-match. Providing them in the description field helps for intellectual identification, but they do not seem to be used by the matching algorithm. Possibly, a separate "catalog" with permuted name variants from not-yet-matched entries could help, but I'm not sure if Magnus would encourage that, because it clutters the catalog list. Swedish and Finnish names for the same locations, however, could perhaps be a valid use case.
Anyway, with the 2,200 missing RePEc authors I decided at that point that the result was good enough, and created the not-matched entries. Fewer than a handful showed up later on as duplicates (e.g., when automatically matched against GND). Of course, some will still linger hidden. But it is very easy to merge items in Wikidata, so I consider that a much smaller problem than it would be in library systems, where it is administratively and technically much more difficult to get rid of duplicates.
Cheers, Joachim (and sorry for the late response)
-----Original Message----- From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On Behalf Of Osma Suominen Sent: Monday, 21 August 2017 13:41 To: wikidata@lists.wikimedia.org Subject: Re: [Wikidata] Some Mix'n'match mappings not stored in Wikidata?
Hi Joachim,
Thanks for this, indeed this could be a potential strategy for us to add some or all of the missing entities. The challenge is that we would need to be reasonably sure that the places we want to create actually don't exist in Wikidata, for example using an alternate spelling. You said in your question that "Of course we make sure that neither of the ids exist in WD so far", but how did you do that?
-Osma
Neubert, Joachim wrote on 21.08.2017 at 12:36:
Hi Osma,
Regarding adding missing items, I've had good experiences with creating input files for QuickStatements2 (see https://github.com/zbw/repec-ras/blob/master/bin/create_missing_wikidata.pl). I've discussed how best to do this in the Wikidata Project Chat before, and received valuable advice. (https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2017/05#Source_statements_for_items_syntesized_from_authorities_-_recommendations.3F)
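As a rough illustration of what such an input file can look like, here is a minimal sketch in Python; it is not the linked Perl script. It emits tab-separated commands in the QuickStatements version 1 text format: a `CREATE` line, then `LAST` lines that attach a label and an external-ID statement to the newly created item. The function name `create_item_commands`, the property ID `P0000`, the label and the identifier value are placeholders for this example, not real data.

```python
def create_item_commands(label, lang, prop, external_id):
    """Return QuickStatements V1 lines that create one item with a label and one ID.

    Values containing tabs, newlines or double quotes are rejected rather than
    escaped, because escaping rules are not covered by this sketch.
    """
    for value in (label, external_id):
        if any(ch in value for ch in '\t\n"'):
            raise ValueError(f"unsupported character in {value!r}")
    return [
        "CREATE",
        f'LAST\tL{lang}\t"{label}"',         # label in the given language
        f'LAST\t{prop}\t"{external_id}"',    # external identifier as a string value
    ]


# Placeholder values; P0000 stands in for the real external-ID property.
lines = create_item_commands("Esimerkkilä", "fi", "P0000", "12345")
batch = "\n".join(lines)
```

A real batch would normally also add descriptions, an "instance of" statement and source references, as discussed in the Project Chat thread linked above.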
Feel free to ask for further information, and all the best, Joachim
-----Original Message----- From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On Behalf Of Osma Suominen Sent: Monday, 21 August 2017 11:07 To: Discussion list for the Wikidata project. Subject: [Wikidata] Some Mix'n'match mappings not stored in Wikidata?
Hi,
We're more than halfway through mapping YSO places to Wikidata. Most of the remaining are places that don't exist in Wikidata, and adding them is quite labor-intensive so we will have to consider our strategy.
Anyway, I did some checking of what remains unmapped and noticed a potential problem: some mappings for places that we have mapped using Mix'n'match have not actually been stored in Wikidata. For example Q36 Poland ("Puola" in YSO Places) is such a case. In Mix'n'match it is shown as manually matched (see attached screenshot), but in Wikidata the corresponding YSO ID property doesn't actually exist for the entity. I checked the change history of the Q36 entity and couldn't find anything relevant there, so it seems that the mapping was never stored in Wikidata. Maybe there was a transient error of some kind?
Another such case was Q1754 Stockholm ("Tukholma" in YSO places). But for that one we removed the existing mapping in Mix'n'match and set it again, and now it is properly stored in Wikidata.
Mix'n'match currently reports 4228 mappings for YSO places, while a SPARQL query for the Wikidata endpoint returns 4221 such mappings. So I suspect that this only affects a small number of entities.
Is it possible to compare the Mix'n'match mappings with what actually exists in Wikidata, and somehow re-sync them? Or just to get the mappings out from Mix'n'match and compare them with what exists in Wikidata, so that the few missing mappings may be added there manually?
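The comparison asked about above comes down to a set difference, sketched below under some assumptions. Both sides are assumed to have been exported already as (external ID, Wikidata item) pairs: one list from Mix'n'match, one from a SPARQL query. How those exports are obtained is outside this sketch. The function name `diff_mappings` and the identifiers `ext-1` and `ext-2` are placeholders; only Q36 and Q1754 come from the message.

```python
def diff_mappings(mixnmatch_pairs, wikidata_pairs):
    """Compare two collections of (external_id, item_id) pairs.

    Returns the pairs present on one side but absent from the other.
    """
    mnm = set(mixnmatch_pairs)
    wd = set(wikidata_pairs)
    return {
        "missing_in_wikidata": sorted(mnm - wd),
        "missing_in_mixnmatch": sorted(wd - mnm),
    }


# Placeholder external IDs; the second mapping is stored in Wikidata, the first is not.
mixnmatch_pairs = [("ext-1", "Q36"), ("ext-2", "Q1754")]
wikidata_pairs = [("ext-2", "Q1754")]
report = diff_mappings(mixnmatch_pairs, wikidata_pairs)
```

Here `report["missing_in_wikidata"]` contains the mappings that would need to be added to Wikidata by hand.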
Thanks, Osma
-- Osma Suominen D.Sc. (Tech), Information Systems Specialist National Library of Finland P.O. Box 26 (Kaikukatu 4) 00014 HELSINGIN YLIOPISTO Tel. +358 50 3199529 osma.suominen@helsinki.fi http://www.nationallibrary.fi
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata