For "casual matching", try the game mode:
https://tools.wmflabs.org/mix-n-match/#/random/473

On Mon, Jun 19, 2017 at 10:16 AM Osma Suominen <osma.suominen@helsinki.fi> wrote:
Hi Magnus!

It's even higher now - 45%. Thanks a lot! This helps a lot with
verification.

Also matching of names with parenthetical qualifiers works better now. I
see that "Ala-Malmi (Helsinki)" was automatched to "Ala-Malmi". However,
"Ahjo (Kerava)" was not matched to "Ahjo (Kerava)" (Q11849902) but to
Q1368573 (which is "Ahjo" in Finnish but means a type of metalworking
workshop, not a specific place). Neither Wikidata entity has a type
statement; the latter has a "subclass-of <workshop>" statement.
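
To illustrate the "Ahjo (Kerava)" case, here is a minimal sketch of one way
the lookup order could work: try the full label first, and only fall back to
the label without its parenthetical qualifier when the full label has no hit.
This is not Mix'n'match's actual code. The function names and the
label-to-items index are hypothetical; the item IDs are the ones mentioned in
this thread.

```python
import re

# Matches one trailing parenthetical qualifier, e.g. " (Kerava)".
_QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def candidate_labels(name):
    """Return the labels to try, most specific first (hypothetical helper)."""
    labels = [name.strip()]
    stripped = _QUALIFIER.sub("", name).strip()
    if stripped and stripped != labels[0]:
        labels.append(stripped)
    return labels


def match_by_label(name, label_index):
    """Look up `name` in a dict mapping labels to lists of item IDs.

    The full label wins; the stripped label is only a fallback.
    """
    for label in candidate_labels(name):
        hits = label_index.get(label, [])
        if hits:
            return hits
    return []


# Toy index built from the items discussed above (assumed Finnish labels).
index = {
    "Ahjo (Kerava)": ["Q11849902"],
    "Ahjo": ["Q1368573"],
    "Ala-Malmi": ["Q2829441"],
}

print(match_by_label("Ahjo (Kerava)", index))
print(match_by_label("Ala-Malmi (Helsinki)", index))
```

With this ordering, "Ahjo (Kerava)" resolves to Q11849902 before the shorter
"Ahjo" is ever considered, while "Ala-Malmi (Helsinki)" still falls back to
"Ala-Malmi".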

In any case, I think this is now good enough for serious work, so we
will start verifying the suggested matches. 2.5% (173) already done...

-Osma


Magnus Manske wrote on 19.06.2017 at 12:02:
> I fiddled with it a bit, now 35% automatched.
>
> Will try some more, but there are some sanity constraints on the
> matching. If it finds more than one match for the name, it does not set
> any match, because random matches on the same name were annoying in the
> past. There is also a type constraint, which might skip some Wikidata
> items without appropriate instance/subclass.
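
The two sanity constraints described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
actual Mix'n'match implementation: the function name, the candidate record
layout, the type names ("place", "country", "workshop") and the order in
which the two checks are applied are all hypothetical. The IDs "Q_A" and
"Q_B" are placeholders, not real items.

```python
def automatch(name, candidates, allowed_types):
    """Return an item ID only when exactly one eligible candidate exists.

    candidates: list of dicts with 'id', 'label' and 'types' (a set of
    type names taken from instance-of / subclass-of statements).
    """
    eligible = [
        c for c in candidates
        if c["label"] == name and c["types"] & allowed_types
    ]
    # More than one hit on the same name: set no match at all.
    if len(eligible) == 1:
        return eligible[0]["id"]
    return None


# Toy data; the types are invented for illustration.
candidates = [
    {"id": "Q837", "label": "Nepal", "types": {"country"}},
    {"id": "Q11849902", "label": "Ahjo (Kerava)", "types": set()},
    {"id": "Q1368573", "label": "Ahjo", "types": {"workshop"}},
    {"id": "Q_A", "label": "Samename", "types": {"place"}},
    {"id": "Q_B", "label": "Samename", "types": {"place"}},
]
allowed = {"place", "country"}

print(automatch("Nepal", candidates, allowed))          # unique and typed
print(automatch("Ahjo (Kerava)", candidates, allowed))  # no type statement
print(automatch("Samename", candidates, allowed))       # ambiguous name
```

An item without any type statement is skipped under this sketch even when
its label is identical, which is one way a type constraint can lower recall.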
>
> On Mon, Jun 19, 2017 at 8:09 AM Osma Suominen <osma.suominen@helsinki.fi> wrote:
>
>     Hi Magnus, all,
>
>     I've been looking a bit closer at the YSO places catalog [1] in
>     Mix'n'match and I'm wondering why only 20% of the places were
>     automatically matched.
>
>     For example, Nepal (http://www.yso.fi/onto/yso/p107682) was
>     automatically matched to Nepal (Q837).
>
>     But:
>
>     Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra
>     (Q3761).
>
>     Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh
>     (Q1823).
>
>     Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to
>     Akkunusjoki (Q12253027).
>
>     There are many more cases like this. So the precision of the automatic
>     matching seems good (all but one were correct so far), but the recall is
>     rather low, and even in cases where the label is identical a match has
>     not been suggested. Is there anything that could be done about this?
>
>
>     Somewhat related to this, it seems that none of the places with
>     parenthetical qualifiers in their names were matched. For example "Ahjo
>     (Kerava)" could have been matched to Q11849902 (which has a Finnish
>     label that is identical) and "Ala-Malmi (Helsinki)" could have been
>     matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the place names
>     include parenthetical qualifiers - to make them unique despite different
>     places having identical names - this means that a lot of potential
>     matches are missing. Could something be done to improve the situation?
>
>
>     If Mix'n'match is incapable of automatically matching cases like this,
>     would it help if I did an automatic matching externally using some other
>     tool, and then gave the potential matches as e.g. a CSV file that could
>     then be imported into Mix'n'match so that they can be verified there?
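
A rough sketch of what such an externally produced CSV could look like.
The column names and the helper function are assumptions made for
illustration; the thread does not say what import format Mix'n'match
expects. The external IDs are taken from the last path segment of the YSO
URIs quoted above, and the item IDs are the matches suggested above.

```python
import csv
import io


def write_candidates(rows, out):
    """Write (external_id, name, wikidata_item) rows as CSV to `out`."""
    writer = csv.writer(out)
    # Hypothetical header; adjust to whatever the importer expects.
    writer.writerow(["external_id", "name", "wikidata_item"])
    for external_id, name, qid in rows:
        writer.writerow([external_id, name, qid])


rows = [
    ("p138653", "Accra", "Q3761"),
    ("p147889", "Aceh", "Q1823"),
    ("p109251", "Akkunusjoki", "Q12253027"),
]

buf = io.StringIO()
write_candidates(rows, buf)
print(buf.getvalue())
```

Writing to a file instead only requires passing a handle opened with
open(path, "w", newline="", encoding="utf-8") in place of the StringIO
buffer.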
>
>     -Osma
>
>     [1] https://tools.wmflabs.org/mix-n-match/#/catalog/473
>
>
>     Osma Suominen wrote on 17.06.2017 at 13:13:
>      > Hi Magnus,
>      >
>      > Thanks a lot, that was fast! And the results look very good!
>      >
>      > I confirmed a couple dozen automated mappings and fixed an incorrect
>      > one ("Amerikka" was matched to USA, but I changed it to "Americas").
>      > Then I started hitting rate limit errors. I guess it would be possible
>      > to avoid those with some extra permissions?
>      >
>      > About 20% of the places were automatically matched. Probably most of
>      > the remaining ones - around 5000 - do not exist in Wikidata because
>      > they are e.g. towns and villages in Finland. Would it be fair game to
>      > create all of them in Wikidata?
>      >
>      > -Osma
>      >
>
>     --
>     Osma Suominen
>     D.Sc. (Tech), Information Systems Specialist
>     National Library of Finland
>     P.O. Box 26 (Kaikukatu 4)
>     00014 HELSINGIN YLIOPISTO
>     Tel. +358 50 3199529
>     osma.suominen@helsinki.fi
>     http://www.nationallibrary.fi
>
>     _______________________________________________
>     Wikidata mailing list
>     Wikidata@lists.wikimedia.org
>     https://lists.wikimedia.org/mailman/listinfo/wikidata
>

