I fiddled with it a bit; now 35% are automatched.

I will try some more, but there are some sanity constraints on the matching. If it finds more than one match for a name, it does not set any match, because random matches on the same name were annoying in the past. There is also a type constraint, which might skip some Wikidata items that lack an appropriate instance/subclass statement.
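
To make those two constraints concrete, here is a minimal sketch. It is not the actual Mix'n'match code: the `Candidate` record, the `automatch` function, the placeholder type name "place", the made-up item ids, and the order in which the two checks run are all assumptions for illustration.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    """A hypothetical search hit: item id, label, and its instance/subclass types."""
    qid: str
    label: str
    types: frozenset


def automatch(name: str, candidates: list, allowed_types: frozenset) -> Optional[str]:
    """Return a single item id, or None when the match is missing or ambiguous."""
    # Keep only candidates whose label equals the entry name (case-insensitive).
    same_name = [c for c in candidates if c.label.casefold() == name.casefold()]
    # Type constraint: drop items with no acceptable instance/subclass value.
    typed = [c for c in same_name if c.types & allowed_types]
    # Sanity constraint: anything other than exactly one hit means no automatic match.
    return typed[0].qid if len(typed) == 1 else None


places = frozenset({"place"})  # placeholder type name, not a real Wikidata id
hits = [
    Candidate("Q837", "Nepal", frozenset({"place"})),
    Candidate("Q_a", "Springfield", frozenset({"place"})),  # made-up ids
    Candidate("Q_b", "Springfield", frozenset({"place"})),
    Candidate("Q_c", "Exampleville", frozenset()),  # item with no type statements
]
print(automatch("Nepal", hits, places))         # Q837
print(automatch("Springfield", hits, places))   # None (ambiguous name)
print(automatch("Exampleville", hits, places))  # None (fails type constraint)
```

Under these assumptions, an identical label is not enough: an entry stays unmatched whenever the name is shared by several items or the only item with that name has no suitable type.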

On Mon, Jun 19, 2017 at 8:09 AM Osma Suominen <osma.suominen@helsinki.fi> wrote:
Hi Magnus, all,

I've been looking a bit closer at the YSO places catalog [1] in
Mix'n'match and I'm wondering why only 20% of the places were
automatically matched.

For example, Nepal (http://www.yso.fi/onto/yso/p107682) was
automatically matched to Nepal (Q837).

But:

Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra (Q3761).

Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh (Q1823).

Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to
Akkunusjoki (Q12253027).

There are many more cases like this. So the precision of the automatic
matching seems good (all but one were correct so far), but the recall is
rather low, and even in cases where the label is identical a match has
not been suggested. Is there anything that could be done about this?


Somewhat related to this, it seems that none of the places with
parenthetical qualifiers in their names were matched. For example "Ahjo
(Kerava)" could have been matched to Q11849902 (which has a Finnish
label that is identical) and "Ala-Malmi (Helsinki)" could have been
matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the place names
include parenthetical qualifiers - to make them unique despite different
places having identical names - this means that a lot of potential
matches are missing. Could something be done to improve the situation?
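
As a rough illustration of what I mean, the qualifier could be split off and both forms of the name tried. This is only a sketch: `split_qualifier` and `label_variants` are hypothetical helper names, and it assumes the qualifier is always a single trailing parenthesised part.

```python
import re
from typing import Optional, Tuple, List

# Matches "Base name (Qualifier)" with one trailing parenthesised part.
_QUALIFIED = re.compile(r"^(?P<base>.+?)\s*\((?P<qual>[^()]+)\)\s*$")


def split_qualifier(name: str) -> Tuple[str, Optional[str]]:
    """Split 'Ahjo (Kerava)' into ('Ahjo', 'Kerava'); unqualified names pass through."""
    m = _QUALIFIED.match(name)
    if m is None:
        return name.strip(), None
    return m.group("base"), m.group("qual")


def label_variants(name: str) -> List[str]:
    """Labels to try when searching: the full name first, then the bare base name."""
    base, qualifier = split_qualifier(name)
    return [name] if qualifier is None else [name, base]


print(label_variants("Ahjo (Kerava)"))         # ['Ahjo (Kerava)', 'Ahjo']
print(label_variants("Ala-Malmi (Helsinki)"))  # ['Ala-Malmi (Helsinki)', 'Ala-Malmi']
print(label_variants("Nepal"))                 # ['Nepal']
```

The qualifier itself (e.g. "Helsinki") could additionally be used to choose between several items that share the base name, but that part is not sketched here.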


If Mix'n'match is incapable of automatically matching cases like this,
would it help if I did an automatic matching externally using some other
tool, and then gave the potential matches as e.g. a CSV file that could
then be imported into Mix'n'match so that they can be verified there?
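
For instance, the externally produced matches could be written out along these lines. The column layout (`yso_uri`, `label`, `wikidata_qid`) and the `write_candidates` function are purely my assumption; I do not know what import format, if any, Mix'n'match would accept.

```python
import csv
import io


def write_candidates(rows, fh):
    """Write (YSO URI, label, Wikidata item id) triples as CSV with a header row."""
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerow(["yso_uri", "label", "wikidata_qid"])  # assumed column names
    for uri, label, qid in rows:
        writer.writerow([uri, label, qid])


# Two of the unmatched examples mentioned above.
rows = [
    ("http://www.yso.fi/onto/yso/p138653", "Accra", "Q3761"),
    ("http://www.yso.fi/onto/yso/p147889", "Aceh", "Q1823"),
]
buf = io.StringIO()
write_candidates(rows, buf)
print(buf.getvalue(), end="")
```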

-Osma

[1] https://tools.wmflabs.org/mix-n-match/#/catalog/473


Osma Suominen wrote on 17.06.2017 at 13:13:
> Hi Magnus,
>
> Thanks a lot, that was fast! And the results look very good!
>
> I confirmed a couple dozen automated mappings and fixed an incorrect one
> ("Amerikka" was matched to USA, but I changed it to "Americas"). Then I
> started hitting rate limit errors. I guess it would be possible to avoid
> those with some extra permissions?
>
> About 20% of the places were automatically matched. Probably most of the
> remaining ones - around 5000 - do not exist in Wikidata because they are
> e.g. towns and villages in Finland. Would it be fair game to create all
> of them in Wikidata?
>
> -Osma
>

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata