Hi Joachim,
Thanks a lot, this is extremely valuable for us!
I'm not sure I trust the Mix'n'match algorithm enough to conclude that
the results are good enough. I would feel more comfortable if there were
some additional confirmation that the leftover places really do not
exist in Wikidata, for example after using alternate and/or Swedish
language labels to find additional match candidates.
Mix'n'match also apparently doesn't distinguish between entities that
were not matched because no candidates were found in Wikidata to match
against, versus entities that were not mapped because there was more
than one candidate available. At the moment we have a mix of both types
of failed matches in the Unmatched category. It would probably be fairly
safe to bulk-add the places that didn't match against anything, but I
don't know how to extract that kind of list from Mix'n'match.
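To make that distinction concrete, here is a minimal sketch of how the Unmatched entries could be split once candidate lists are available from some other source. It assumes a hypothetical `candidates` mapping from each unmatched entry to the Wikidata items found for it (for example by a label search that also tries Swedish and alternate labels). None of these names come from Mix'n'match itself, and the sample data is made up.

```python
from typing import Dict, List, Tuple


def split_unmatched(
    candidates: Dict[str, List[str]],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split unmatched entries by whether any Wikidata candidate was found.

    `candidates` is a hypothetical mapping: entry label -> list of candidate
    QIDs gathered elsewhere (it is not a Mix'n'match data structure).
    Returns (entries with no candidate, entries with one or more candidates).
    """
    no_candidates: List[str] = []
    needs_review: Dict[str, List[str]] = {}
    for entry, qids in candidates.items():
        if qids:
            # At least one possible match: a human should decide.
            needs_review[entry] = qids
        else:
            # Nothing found under any label: a candidate for bulk creation.
            no_candidates.append(entry)
    return sorted(no_candidates), needs_review


# Illustrative data only; labels and candidate IDs are placeholders.
example = {
    "Place A": [],
    "Place B": ["Q-candidate-1", "Q-candidate-2"],
    "Place C": [],
}
to_create, to_review = split_unmatched(example)
print(to_create)
print(to_review)
```

Only the first list would be considered for bulk-adding; everything in the second still needs a manual decision.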
My current plan is to take the remaining unmapped places and try
to reconcile them using OpenRefine; if there are still no matches, then
I can go ahead and add them to Wikidata, most likely using the
QuickStatements tool, which seems really convenient for this.
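As a rough illustration of the last step, the sketch below generates tab-separated QuickStatements (version 1 syntax) commands that create one new item per place, with Finnish and Swedish labels and the external identifier. The function name, the tuple layout and the sample place are hypothetical. The property ID is passed in as a parameter; I believe the YSO ID property is P2347, but that should be verified before running anything.

```python
from typing import Iterable, List, Tuple


def quickstatements_for_new_places(
    places: Iterable[Tuple[str, str, str]],
    id_property: str,
) -> str:
    """Build QuickStatements v1 (tab-separated) commands for new items.

    Each place is (external_id, finnish_label, swedish_label); an empty
    Swedish label is skipped. `id_property` is the Wikidata property that
    holds the external identifier. Labels containing double quotes are not
    handled by this sketch.
    """
    lines: List[str] = []
    for external_id, label_fi, label_sv in places:
        lines.append("CREATE")  # start a new item
        lines.append(f'LAST\tLfi\t"{label_fi}"')  # Finnish label
        if label_sv:
            lines.append(f'LAST\tLsv\t"{label_sv}"')  # Swedish label
        # External identifier as a quoted string value
        lines.append(f'LAST\t{id_property}\t"{external_id}"')
    return "\n".join(lines)


# Made-up place and identifier; "P2347" is an assumption to verify.
batch = quickstatements_for_new_places(
    [("example-id-1", "Esimerkkilä", "Exempelby")],
    id_property="P2347",
)
print(batch)
```

A real batch would probably also add an "instance of" statement and a source reference for each item, which this sketch leaves out.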
-Osma
Neubert, Joachim wrote on 28.08.2017 at 14:45:
Hi Osma,
The instrument we used to avoid duplicates was Mix'n'match. Even when something is not
"automatically matched", possible matches often come up on the details page (e.g.,
https://tools.wmflabs.org/mix-n-match/#/entry/22734337).
That covers the case where a (partial) name is present somewhere in Wikidata or
Wikipedia. Unfortunately, I've not yet figured out how I could feed my own synonyms
into Mix'n'match. Providing them in the description field helps with manual
identification, but they seem not to be used by the matching algorithm. Possibly, a separate
"catalog" with permuted name variants from not-yet-matched entries could help,
but I'm not sure if Magnus would encourage that, because it clutters the catalog
list. Swedish and Finnish names for the same locations, however, could perhaps be a valid
use case.
Anyway, with the 2,200 missing RePEc authors I decided at that point that the result was
good enough, and created the not-matched entries. Fewer than a handful showed up later
as duplicates (e.g., when automatically matched against GND). Of course, some
will still linger hidden. But it is very easy to merge items in Wikidata, so I consider
that a much smaller problem than it would be in library systems, where it is
administratively and technically much more difficult to get rid of duplicates.
Cheers, Joachim (and sorry for the late response)
-----Original Message-----
From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On Behalf Of
Osma Suominen
Sent: Monday, 21 August 2017 13:41
To: wikidata(a)lists.wikimedia.org
Subject: Re: [Wikidata] Some Mix'n'match mappings not stored in Wikidata?
Hi Joachim,
Thanks for this, indeed this could be a potential strategy for us to add some or
all of the missing entities. The challenge is that we would need to be
reasonably sure that the places we want to create actually don't exist in
Wikidata, for example using an alternate spelling. You said in your question
that "Of course we make sure that neither of the ids exist in WD so far", but
how did you do that?
-Osma
Neubert, Joachim wrote on 21.08.2017 at 12:36:
Hi Osma,
Regarding adding missing items: I've had good experiences with creating
input files for QuickStatements 2 (see
https://github.com/zbw/repec-ras/blob/master/bin/create_missing_wikidata.pl).
I discussed how best to do this in the Wikidata Project Chat earlier and
received valuable advice
(https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2017/05#Source_statements_for_items_syntesized_from_authorities_-_recommendations.3F).
Feel free to ask for further information, and all the best, Joachim
> -----Original Message-----
> From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On
> Behalf Of Osma Suominen
> Sent: Monday, 21 August 2017 11:07
> To: Discussion list for the Wikidata project.
> Subject: [Wikidata] Some Mix'n'match mappings not stored in Wikidata?
>
> Hi,
>
> We're more than halfway through mapping YSO places to Wikidata. Most
> of the remaining are places that don't exist in Wikidata, and adding
> them is quite labor-intensive so we will have to consider our strategy.
>
> Anyway, I did some checking of what remains unmapped and noticed a
> potential problem: some mappings for places that we have mapped using
> Mix'n'match have not actually been stored in Wikidata. For example
> Q36 Poland ("Puola" in YSO Places) is such a case. In Mix'n'match it
> is shown as manually matched (see attached screenshot), but in
> Wikidata the corresponding YSO ID property doesn't actually exist for
> the entity. I checked the change history of the Q36 entity and
> couldn't find anything relevant there, so it seems that the mapping
> was never stored in Wikidata. Maybe there was a transient error of
> some kind?
>
> Another such case was Q1754 Stockholm ("Tukholma" in YSO places). But
> for that one we removed the existing mapping in Mix'n'match and set
> it again, and now it is properly stored in Wikidata.
>
> Mix'n'match currently reports 4228 mappings for YSO places, while a
> SPARQL query for the Wikidata endpoint returns 4221 such mappings. So
> I suspect that this only affects a small number of entities.
>
> Is it possible to compare the Mix'n'match mappings with what actually
> exists in Wikidata, and somehow re-sync them? Or just to get the
> mappings out from Mix'n'match and compare them with what exists in
> Wikidata, so that the few missing mappings may be added there manually?
>
> Thanks,
> Osma
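
The comparison asked about in the original message (Mix'n'match mappings versus what is actually stored in Wikidata) amounts to a set difference once both sides are available as (external ID, QID) pairs. The sketch below assumes the Mix'n'match side has been exported by some means not shown here, and that the Wikidata side comes from a SPARQL query like the one in the comment. The property P2347 for the YSO ID is my assumption and should be checked; the external IDs in the example are placeholders, not real YSO identifiers.

```python
from typing import List, Set, Tuple

Mapping = Tuple[str, str]  # (external ID, Wikidata QID)

# Possible query for the stored side (assumes P2347 is the YSO ID property):
#   SELECT ?item ?yso WHERE { ?item wdt:P2347 ?yso }


def diff_mappings(
    mixnmatch: Set[Mapping], wikidata: Set[Mapping]
) -> Tuple[List[Mapping], List[Mapping]]:
    """Return (pairs only in Mix'n'match, pairs only in Wikidata)."""
    missing_in_wikidata = sorted(mixnmatch - wikidata)
    missing_in_mixnmatch = sorted(wikidata - mixnmatch)
    return missing_in_wikidata, missing_in_mixnmatch


# Placeholder external IDs; the QIDs are the two items named in the thread.
in_mixnmatch = {("id-of-Puola", "Q36"), ("id-of-Tukholma", "Q1754")}
in_wikidata = {("id-of-Tukholma", "Q1754")}

to_add, to_check = diff_mappings(in_mixnmatch, in_wikidata)
print(to_add)
print(to_check)
```

The first list holds the mappings that would need to be re-saved or added manually; the second is a sanity check for mappings present in Wikidata but unknown to Mix'n'match.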
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen(a)helsinki.fi
http://www.nationallibrary.fi
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata