I fiddled with it a bit, now 35% automatched.
Will try some more, but there are some sanity constraints on the matching. If it finds more than one match for a name, it does not set any match, because random matches on the same name were annoying in the past. There is also a type constraint, which might skip some Wikidata items that lack an appropriate "instance of"/"subclass of" statement.
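As a rough illustration of the two constraints above (not Mix'n'match's actual code), the rule could be sketched like this. The `Candidate` type, the `pick_match` function, the "place" type marker and the made-up item ids are all hypothetical, and applying the type filter before the uniqueness check is an assumption; the message does not say in which order they run.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """A Wikidata item that might match a catalog entry (hypothetical type)."""
    qid: str
    label: str
    types: FrozenSet[str]  # stand-in for "instance of"/"subclass of" values


def pick_match(name: str,
               candidates: Iterable[Candidate],
               allowed_types: FrozenSet[str]) -> Optional[str]:
    """Return a Q-id only when exactly one suitably typed item has this label."""
    wanted = name.casefold()
    hits = [
        c for c in candidates
        if c.label.casefold() == wanted and c.types & allowed_types
    ]
    # Zero hits or several hits: leave the entry unmatched for manual review.
    return hits[0].qid if len(hits) == 1 else None


PLACE_TYPES = frozenset({"place"})  # placeholder, not a real Wikidata class id

items = [
    Candidate("Q837", "Nepal", frozenset({"place"})),
    Candidate("Q-dup-1", "Springfield", frozenset({"place"})),  # made-up id
    Candidate("Q-dup-2", "Springfield", frozenset({"place"})),  # made-up id
    Candidate("Q-untyped", "Atlantis", frozenset()),            # made-up id
]

for name in ("Nepal", "Springfield", "Atlantis"):
    print(name, pick_match(name, items, PLACE_TYPES))
```

Under these assumptions a name shared by two items, or an item with no suitable type, stays unmatched, which would explain good precision combined with low recall.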
On Mon, Jun 19, 2017 at 8:09 AM Osma Suominen <osma.suominen@helsinki.fi> wrote:
Hi Magnus, all,
I've been looking a bit closer at the YSO places catalog [1] in Mix'n'match and I'm wondering why only 20% of the places were automatically matched.
For example, Nepal (http://www.yso.fi/onto/yso/p107682) was automatically matched to Nepal (Q837).
But:
Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra (Q3761).
Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh (Q1823).
Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to Akkunusjoki (Q12253027).
There are many more cases like this. So the precision of the automatic matching seems good (all but one were correct so far), but the recall is rather low, and even in cases where the label is identical a match has not been suggested. Is there anything that could be done about this?
Somewhat related to this, it seems that none of the places with parenthetical qualifiers in their names were matched. For example "Ahjo (Kerava)" could have been matched to Q11849902 (which has a Finnish label that is identical) and "Ala-Malmi (Helsinki)" could have been matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the place names include parenthetical qualifiers - to make them unique despite different places having identical names - this means that a lot of potential matches are missing. Could something be done to improve the situation?
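To make the idea concrete, here is a minimal sketch of how a label with a parenthetical qualifier could be split so that both the full label and the bare name are tried against Wikidata labels. The function names `split_qualifier` and `candidate_labels` are hypothetical, and nothing here reflects how Mix'n'match handles labels internally.

```python
import re
from typing import List, Optional, Tuple

# A trailing "(...)" group is treated as the qualifier; nested parentheses
# are deliberately not handled in this sketch.
_QUALIFIER = re.compile(r"^(?P<base>.+?)\s*\((?P<qualifier>[^()]+)\)$")


def split_qualifier(label: str) -> Tuple[str, Optional[str]]:
    """Split 'Ahjo (Kerava)' into ('Ahjo', 'Kerava'); no qualifier gives None."""
    label = label.strip()
    m = _QUALIFIER.match(label)
    if m is None:
        return label, None
    return m.group("base"), m.group("qualifier")


def candidate_labels(label: str) -> List[str]:
    """Labels to try, most specific first: the full label, then the bare name."""
    base, qualifier = split_qualifier(label)
    full = label.strip()
    return [full] if qualifier is None else [full, base]


for place in ("Ahjo (Kerava)", "Ala-Malmi (Helsinki)", "Nepal"):
    print(place, "->", candidate_labels(place))
```

The qualifier itself could additionally be used to disambiguate between several items sharing the bare name, for example by checking which municipality an item is located in, but that is beyond this sketch.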
If Mix'n'match is incapable of automatically matching cases like this, would it help if I did an automatic matching externally using some other tool, and then gave the potential matches as e.g. a CSV file that could then be imported into Mix'n'match so that they can be verified there?
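For instance, the externally produced matches could be written out along these lines. The column names and layout are my own assumption, since I don't know what import format (if any) Mix'n'match would accept; the rows are the examples mentioned above.

```python
import csv
import io
from typing import Iterable, Tuple


def matches_to_csv(matches: Iterable[Tuple[str, str, str]]) -> str:
    """Serialise (YSO URI, label, suggested Q-id) rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Hypothetical header; the real import format may differ.
    writer.writerow(["yso_uri", "label", "wikidata_qid"])
    for uri, label, qid in matches:
        writer.writerow([uri, label, qid])
    return buf.getvalue()


suggested = [
    ("http://www.yso.fi/onto/yso/p138653", "Accra", "Q3761"),
    ("http://www.yso.fi/onto/yso/p147889", "Aceh", "Q1823"),
    ("http://www.yso.fi/onto/yso/p109251", "Akkunusjoki", "Q12253027"),
]

print(matches_to_csv(suggested), end="")
```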
-Osma
[1] https://tools.wmflabs.org/mix-n-match/#/catalog/473
Osma Suominen wrote on 17.06.2017 at 13:13:
Hi Magnus,
Thanks a lot, that was fast! And the results look very good!
I confirmed a couple dozen automated mappings and fixed an incorrect one ("Amerikka" was matched to USA, but I changed it to "Americas"). Then I started hitting rate limit errors. I guess it would be possible to avoid those with some extra permissions?
About 20% of the places were automatically matched. Probably most of the remaining ones - around 5000 - do not exist in Wikidata because they are e.g. towns and villages in Finland. Would it be fair game to create all of them in Wikidata?
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata