Hi Magnus!
It's even higher now: 45%. Thanks a lot! This helps a great deal with
the verification work.
Matching of names with parenthetical qualifiers also works better now. I
see that "Ala-Malmi (Helsinki)" was automatched to "Ala-Malmi". However,
"Ahjo (Kerava)" was matched not to "Ahjo (Kerava)" (Q11849902) but to
Q1368573 (whose Finnish label is "Ahjo", meaning a type of metalworking
workshop, not a specific place). Neither Wikidata entity has a type
(instance-of) statement; the latter has a "subclass-of <workshop>"
statement.
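
To make the behaviour discussed in this thread concrete, here is a minimal
Python sketch of one possible matching order: try the full label first,
fall back to the name without its parenthetical qualifier, and set no match
when a label is ambiguous. This is only an illustration under assumptions,
not Mix'n'match's actual code. The function names and the in-memory list of
(QID, label) pairs are hypothetical, and the type constraint is left out
because its exact rules are not spelled out here.

```python
import re

# Matches one trailing parenthetical qualifier, e.g. " (Kerava)".
_QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def strip_qualifier(name):
    """Return the name without a trailing "(...)" qualifier."""
    return _QUALIFIER.sub("", name).strip()


def automatch(name, items):
    """Suggest a QID for a place name, or None if there is no safe match.

    items is a list of (qid, label) pairs standing in for Wikidata labels
    (a hypothetical structure used only for this sketch).
    """
    candidates = [name]
    bare = strip_qualifier(name)
    if bare != name:
        candidates.append(bare)

    for label in candidates:
        hits = [qid for qid, item_label in items if item_label == label]
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            # More than one item shares the label: set no match at all.
            return None
    return None


# Labels and QIDs taken from the examples in this thread.
items = [
    ("Q11849902", "Ahjo (Kerava)"),
    ("Q1368573", "Ahjo"),
    ("Q2829441", "Ala-Malmi"),
]

print(automatch("Ahjo (Kerava)", items))
print(automatch("Ala-Malmi (Helsinki)", items))
```

With this ordering, "Ahjo (Kerava)" resolves to Q11849902 because the full
label is tried before the bare name, and "Ala-Malmi (Helsinki)" falls back
to Q2829441.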
In any case, I think this is now good enough for serious work, so we
will start verifying the suggested matches. 2.5% (173) already done...
-Osma
Magnus Manske wrote on 19.06.2017 at 12:02:
> I fiddled with it a bit, now 35% automatched.
>
> Will try some more, but there are some sanity constraints on the
> matching. If it finds more than one match for the name, it does not set
> any match, because random matches on the same name were annoying in the
> past. There is also a type constraint, which might skip some Wikidata
> items without appropriate instance/subclass.
>
> On Mon, Jun 19, 2017 at 8:09 AM Osma Suominen
> <osma.suominen@helsinki.fi> wrote:
>
> Hi Magnus, all,
>
> I've been looking a bit closer at the YSO places catalog [1] in
> Mix'n'match and I'm wondering why only 20% of the places were
> automatically matched.
>
> For example, Nepal (http://www.yso.fi/onto/yso/p107682) was
> automatically matched to Nepal (Q837).
>
> But:
>
> Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra
> (Q3761).
>
> Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh
> (Q1823).
>
> Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to
> Akkunusjoki (Q12253027).
>
> There are many more cases like this. So the precision of the automatic
> matching seems good (all but one were correct so far), but the recall is
> rather low, and even in cases where the label is identical a match has
> not been suggested. Is there anything that could be done about this?
>
>
> Somewhat related to this, it seems that none of the places with
> parenthetical qualifiers in their names were matched. For example "Ahjo
> (Kerava)" could have been matched to Q11849902 (which has a Finnish
> label that is identical) and "Ala-Malmi (Helsinki)" could have been
> matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the place names
> include parenthetical qualifiers - to make them unique despite different
> places having identical names - this means that a lot of potential
> matches are missing. Could something be done to improve the situation?
>
>
> If Mix'n'match is incapable of automatically matching cases like this,
> would it help if I did an automatic matching externally using some other
> tool, and then gave the potential matches as e.g. a CSV file that could
> then be imported into Mix'n'match so that they can be verified there?
>
> -Osma
>
> [1] https://tools.wmflabs.org/mix-n-match/#/catalog/473
>
>
> Osma Suominen wrote on 17.06.2017 at 13:13:
> > Hi Magnus,
> >
> > Thanks a lot, that was fast! And the results look very good!
> >
> > I confirmed a couple dozen automated mappings and fixed an incorrect one
> > ("Amerikka" was matched to USA, but I changed it to "Americas"). Then I
> > started hitting rate limit errors. I guess it would be possible to avoid
> > those with some extra permissions?
> >
> > About 20% of the places were automatically matched. Probably most of the
> > remaining ones - around 5000 - do not exist in Wikidata because they are
> > e.g. towns and villages in Finland. Would it be fair game to create all
> > of them in Wikidata?
> >
> > -Osma
> >
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> osma.suominen@helsinki.fi
> http://www.nationallibrary.fi
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi