Hi Wikidatans,
After several delays we are finally starting to think seriously about mapping the General Finnish Ontology YSO [1] to Wikidata. A "YSO ID" property (https://www.wikidata.org/wiki/Property:P2347) was added to Wikidata some time ago, but it has been used only a few times so far.
Recently some 6000 places have been added to "YSO Places" [2], a new extension of YSO, which was generated from place names in YSA and Allärs, our earlier subject indexing vocabularies. It would probably make sense to map these places to Wikidata, in addition to the general concepts in YSO. We have already manually added a few links from YSA/YSO places to Wikidata for newly added places, but this approach does not scale if we want to link the thousands of existing places.
We also have some indirect sources of YSO/Wikidata mappings:
1. YSO is mapped to LCSH, and Wikidata is also mapped to LCSH (using P244, LC/NACO Authority File ID). I dug a bit into both sets of mappings and found that approximately 1200 YSO-Wikidata links could be generated from the intersection of these mappings.
2. The Finnish broadcasting company Yle has also created some mappings between KOKO (which includes YSO) and Wikidata. Last time I looked at those, we could generate at least 5000 YSO-Wikidata links from them. Probably more nowadays.
Of course, indirect mappings are a bit dangerous. It's possible that there are some differences in meaning, especially with LCSH which has a very different structure (and cultural context) than YSO. Nevertheless I think these could be a good starting point, especially if a tool such as Mix'n'Match could be used to verify them.
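The intersection of two mappings described above can be sketched as a join on the shared LCSH identifier. This is only a minimal illustration: the function name, the input shape (pairs of IDs), and all identifiers in the usage note are hypothetical and not taken from the actual YSO or Wikidata data. Concepts that end up with more than one candidate item are set aside for manual review rather than linked.

```python
from collections import defaultdict


def derive_links(yso_to_lcsh, wikidata_to_lcsh):
    """Join two mappings on their shared LCSH ID.

    yso_to_lcsh:      iterable of (yso_id, lcsh_id) pairs
    wikidata_to_lcsh: iterable of (qid, lcsh_id) pairs (e.g. taken from P244)

    Returns (links, ambiguous):
      links     maps yso_id -> qid when exactly one candidate item was found
      ambiguous maps yso_id -> sorted list of competing qids
    """
    qids_by_lcsh = defaultdict(set)
    for qid, lcsh_id in wikidata_to_lcsh:
        qids_by_lcsh[lcsh_id].add(qid)

    # Collect every Wikidata item reachable from a YSO concept via LCSH.
    candidates = defaultdict(set)
    for yso_id, lcsh_id in yso_to_lcsh:
        candidates[yso_id] |= qids_by_lcsh.get(lcsh_id, set())

    links, ambiguous = {}, {}
    for yso_id, qids in candidates.items():
        if len(qids) == 1:
            links[yso_id] = next(iter(qids))
        elif len(qids) > 1:
            ambiguous[yso_id] = sorted(qids)
    return links, ambiguous
```

For example, with made-up IDs, `derive_links([("p1", "sh1")], [("Q10", "sh1")])` yields one link from `p1` to `Q10`. The links produced this way are still only candidates, since the two LCSH mappings may not mean the same thing.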
Now my question is, given that we already have or could easily generate thousands of Wikidata-YSO mappings, but the rest would still have to be semi-automatically linked using Mix'n'Match, what would be a good way to approach this? Does Mix'n'Match look at existing statements (in this case YSO ID / P2347) in Wikidata when you load a new catalog, or ignore them?
I can think of at least these approaches:
1. First import the indirect mappings we already have to Wikidata as P2347 statements, then create a Mix'n'Match catalog with the remaining YSO concepts. The indirect mappings would have to be verified separately.
2. First import the indirect mappings we already have to Wikidata as P2347 statements, then create a Mix'n'Match catalog with ALL the YSO concepts, including the ones for which we already have imported a mapping. Use Mix'n'Match to verify the indirect mappings.
3. Forget about the existing mappings and just create a Mix'n'Match catalog with all the YSO concepts.
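Approaches 1 and 2 both start with a bulk import of P2347 statements. One possible way to prepare such an import is to generate QuickStatements-style commands (tab-separated item, property, and a double-quoted string value). The sketch below assumes the candidate links are available as a dictionary from YSO ID to QID; the function name is hypothetical, and whether the stored value should include the leading "p" of the YSO URI must be checked against the format constraint of P2347.

```python
def quickstatements_lines(links, prop="P2347"):
    """Turn {yso_id: qid} into tab-separated QuickStatements commands.

    Each command has the form: QID <TAB> property <TAB> "string value".
    The YSO ID is written exactly as given (assumption: the caller has
    already put it into the form that the property expects).
    """
    lines = []
    for yso_id, qid in sorted(links.items()):
        # Reject anything that does not look like an item ID.
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
        lines.append(f'{qid}\t{prop}\t"{yso_id}"')
    return lines
```

The resulting lines can be reviewed as a plain text file before anything is submitted, which leaves room for the separate verification step mentioned in approach 1.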
Any advice?
Thanks,
-Osma
[2] http://finto.fi/yso-paikat/
Hi Osma,
just a few remarks:
* If you want to "seed" Mix'n'match with third-party/indirect IDs already in Wikidata, best to not create the catalog yourself, but mail me the data instead
* If you want "YSO places" in Wikidata, we will need a new property for that, unless the P2347 formatter URL would redirect automatically to "/yso-paikat/"
* You can create a Mix'n'match catalog before there is a property, and link them up later. The catalog will then synchronize
Cheers, Magnus
On Tue, Jun 6, 2017 at 11:19 AM Osma Suominen osma.suominen@helsinki.fi wrote:
-- Osma Suominen D.Sc. (Tech), Information Systems Specialist National Library of Finland P.O. Box 26 (Kaikukatu 4) 00014 HELSINGIN YLIOPISTO Tel. +358 50 3199529 osma.suominen@helsinki.fi http://www.nationallibrary.fi
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hi Magnus!
Thanks for your quick response. Comments inline.
Magnus Manske wrote on 06.06.2017 at 15:57:
> - If you want to "seed" Mix'n'match with third-party/indirect IDs already in Wikidata, best to not create the catalog yourself, but mail me the data instead
Okay, great! What's the best format? The same as for creating catalogs, but with an additional Wikidata ID column with values from the existing mappings?
By the way, we also have multilingual labels that could perhaps improve the automatic matching. YSO generally has fi/sv/en, YSO places has fi/sv. Can you make use of these too if I provided them in additional columns?
> - If you want "YSO places" in Wikidata, we will need a new property for that, unless the P2347 formatter URL would redirect automatically to "/yso-paikat/"
It does redirect like this already. See e.g. http://www.yso.fi/onto/yso/p138653
> - You can create a Mix'n'match catalog before there is a property, and link them up later. The catalog will then synchronize
I don't think we need an additional property, but good to know anyway.
-Osma
On Tue, Jun 6, 2017 at 2:44 PM Osma Suominen osma.suominen@helsinki.fi wrote:
> Hi Magnus!
> Thanks for your quick response. Comments inline.
> Magnus Manske wrote on 06.06.2017 at 15:57:
>> - If you want to "seed" Mix'n'match with third-party/indirect IDs already in Wikidata, best to not create the catalog yourself, but mail me the data instead
> Okay, great! What's the best format? The same as for creating catalogs, but with an additional Wikidata ID column with values from the existing mappings?
That would work fine.
> By the way, we also have multilingual labels that could perhaps improve the automatic matching. YSO generally has fi/sv/en, YSO places has fi/sv. Can you make use of these too if I provided them in additional columns?
Sorry, mix'n'match only does single language labels.
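The agreed format (the usual catalog columns plus a Wikidata ID column, with a label in one language only) could be produced along the following lines. This is a sketch under assumptions: the column names `id`, `name`, `description`, and `wikidata`, the tab delimiter, and the shape of the input records are all illustrative and should be checked against what the Mix'n'match import actually expects.

```python
import csv
import io

FIELDS = ["id", "name", "description", "wikidata"]  # assumed column names


def catalog_rows(concepts, links, lang="en"):
    """Yield one row per concept that has a label in the chosen language.

    concepts: iterable of dicts like {"id": ..., "labels": {"fi": ..., "en": ...}}
    links:    {concept id: QID} for mappings that are already known
    """
    for concept in concepts:
        label = concept["labels"].get(lang)
        if label is None:
            continue  # no label in this language, so nothing to match on
        yield {
            "id": concept["id"],
            "name": label,
            "description": concept.get("description", ""),
            "wikidata": links.get(concept["id"], ""),  # empty if unmapped
        }


def write_catalog(concepts, links, lang="en"):
    """Return the catalog as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(catalog_rows(concepts, links, lang))
    return buf.getvalue()
```

Because only one label language can be used, the `lang` argument decides which concepts appear at all; concepts lacking a label in that language are skipped here, although falling back to another language would be an equally reasonable choice.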
>> - If you want "YSO places" in Wikidata, we will need a new property for that, unless the P2347 formatter URL would redirect automatically to "/yso-paikat/"
> It does redirect like this already. See e.g. http://www.yso.fi/onto/yso/p138653
Great! So you could bunch the "old" ones and the new places into one list?
>> - You can create a Mix'n'match catalog before there is a property, and link them up later. The catalog will then synchronize
> I don't think we need an additional property, but good to know anyway.
> -Osma
Magnus Manske wrote on 06.06.2017 at 17:06:
>> By the way, we also have multilingual labels that could perhaps improve the automatic matching. YSO generally has fi/sv/en, YSO places has fi/sv. Can you make use of these too if I provided them in additional columns?
> Sorry, mix'n'match only does single language labels.
Ok, then I have to think which language to pick for Mix'n'Match use. For YSO, Finnish and Swedish labels are generally the best quality, but probably wouldn't produce as many automated hits as the English ones. Also it depends on who is going to do the manual matching.
Any advice on this?
>> It does redirect like this already. See e.g. http://www.yso.fi/onto/yso/p138653
> Great! So you could bunch the "old" ones and the new places into one list?
In principle yes, but in practice, I think it would make sense to use two lists, because the places are quite different from the general concepts. Also the matching could be more focused for the places - don't try to match with any Wikidata entity that is not a place.
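Restricting candidates to places could, for instance, be expressed as a SPARQL query that only accepts items whose class is a (possibly indirect) subclass of a chosen "place" class. The helper below merely builds such a query string for the public Wikidata query service. It is an illustrative sketch, not a description of how Mix'n'match filters candidates: the function name and parameters are hypothetical, and the default class Q17334923 is the "location" item mentioned later in this thread.

```python
def place_candidates_query(label, lang="fi", place_class="Q17334923", limit=20):
    """Build a SPARQL query for items with the given label that are places.

    The property path wdt:P31/wdt:P279* follows "instance of" once and then
    "subclass of" any number of times, so subclasses of place_class match too.
    """
    if '"' in label or "\\" in label:
        # Keep the sketch simple: refuse labels that would need escaping.
        raise ValueError("label contains characters that need escaping")
    return (
        "SELECT DISTINCT ?item WHERE {\n"
        f'  ?item rdfs:label "{label}"@{lang} ;\n'
        f"        wdt:P31/wdt:P279* wd:{place_class} .\n"
        "}\n"
        f"LIMIT {limit}"
    )
```

Exact label matching is deliberately strict; it keeps the query cheap but will miss items whose label differs in spelling, which is where manual matching would still be needed.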
-Osma
Would anyone be interested in creating a map interface for matching places in Mix'n'Match?
Just a thought...
Susanna
2017-06-06 17:17 GMT+03:00 Osma Suominen osma.suominen@helsinki.fi:
@Sandra: are you suggesting another layer on top of something like https://tools.wmflabs.org/wikishootme/ ?
Cheers,
Alex
On Tue, Jun 6, 2017 at 10:22 AM, Susanna Ånäs susanna.anas@gmail.com wrote:
I thought of something like this: https://drive.google.com/file/d/0BxuJSZymOK8-R1Q0SXpmVGk3dkE/view
Susanna
2017-06-06 19:21 GMT+03:00 Alex Stinson astinson@wikimedia.org:
-- Alex Stinson GLAM-Wiki Strategist Wikimedia Foundation Twitter:@glamwiki/@sadads
Learn more about how the communities behind Wikipedia, Wikidata and other Wikimedia projects partner with cultural heritage organizations: http://glamwiki.org
Does that imply coordinates in Mix'n'match? Because there is no support for that yet, though I could add it. Do you have an example catalog (existing or to-be-created)?
On Tue, Jun 6, 2017 at 6:30 PM Susanna Ånäs susanna.anas@gmail.com wrote:
07.06.2017, 13:10, Magnus Manske wrote:
> Does that imply coordinates in Mix'n'match? Because there is no support for that yet, though I could add it. Do you have an example catalog (existing or to-be-created)?
For YSO places, it would be possible to create a Mix'n'Match catalog where the majority of places have coordinates. YSO places doesn't itself contain coordinates, but the Finnish places within it have been mapped to the Place Name Registry (Paikannimirekisteri) maintained by the National Land Survey of Finland (Maanmittauslaitos), which includes point coordinates for all places. So it would be possible to pick the coordinates from there for the 4400 or so places that have been mapped, if that helps with the linking in Mix'n'Match.
-Osma
We will also need a coordinate transformation, since all official Finnish coordinates are in EPSG:3067. This could be done either before loading the data into Mix'n'match or within it.
Susanna
2017-06-07 14:03 GMT+03:00 Osma Suominen osma.suominen@helsinki.fi:
I won't be getting into coordinate cleanup ;-)
Coordinates would have to be compatible with https://www.wikidata.org/wiki/Property:P625
On Wed, Jun 7, 2017 at 12:10 PM Susanna Ånäs susanna.anas@gmail.com wrote:
07.06.2017, 14:10, Susanna Ånäs wrote:
> We will also need a coordinate transformation since all official Finnish coordinates are in EPSG:3067. Before or in MixnMatch.
The (experimental) Linked Data service of NLS already provides WGS84 coordinates in addition to the official ones, so this should be easy.
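If the WGS84 values were not available from the service, the transformation itself is small enough to do in a preprocessing script. The sketch below converts EPSG:3067 (ETRS-TM35FIN) easting/northing to latitude/longitude in decimal degrees, the form needed for P625, using the standard truncated Krüger series for the inverse transverse Mercator projection on the GRS80 ellipsoid. It assumes that the sub-metre difference between ETRS89 and WGS84 can be ignored for matching purposes; the function name is hypothetical, and a maintained library such as PROJ would be the safer choice for production use.

```python
import math

# EPSG:3067 (ETRS-TM35FIN) parameters: GRS80 ellipsoid, central meridian 27 E.
A_GRS80 = 6378137.0
F_GRS80 = 1 / 298.257222101
K0 = 0.9996
LON0_DEG = 27.0
FALSE_EASTING = 500000.0
FALSE_NORTHING = 0.0


def tm35fin_to_latlon(easting, northing):
    """Return (lat, lon) in degrees for an ETRS-TM35FIN coordinate pair."""
    n = F_GRS80 / (2 - F_GRS80)  # third flattening
    radius = A_GRS80 / (1 + n) * (1 + n**2 / 4 + n**4 / 64)
    # Series coefficients, truncated at third order in n.
    beta = (n / 2 - 2 * n**2 / 3 + 37 * n**3 / 96,
            n**2 / 48 + n**3 / 15,
            17 * n**3 / 480)
    delta = (2 * n - 2 * n**2 / 3 - 2 * n**3,
             7 * n**2 / 3 - 8 * n**3 / 5,
             56 * n**3 / 15)

    xi = (northing - FALSE_NORTHING) / (K0 * radius)
    eta = (easting - FALSE_EASTING) / (K0 * radius)
    xi_p = xi - sum(b * math.sin(2 * j * xi) * math.cosh(2 * j * eta)
                    for j, b in enumerate(beta, 1))
    eta_p = eta - sum(b * math.cos(2 * j * xi) * math.sinh(2 * j * eta)
                      for j, b in enumerate(beta, 1))

    chi = math.asin(math.sin(xi_p) / math.cosh(eta_p))  # conformal latitude
    lat = chi + sum(d * math.sin(2 * j * chi) for j, d in enumerate(delta, 1))
    lon = math.radians(LON0_DEG) + math.atan2(math.sinh(eta_p), math.cos(xi_p))
    return math.degrees(lat), math.degrees(lon)
```

A point on the central meridian (easting 500000) maps to longitude 27 degrees east, which is a convenient sanity check before converting a whole catalog.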
-Osma
Hi Osma,
sorry for jumping in late. I was at ELAG last week, talking about a very similar topic (Wikidata as authority linking hub, https://hackmd.io/p/S1YmXWC0e). Our use case was porting an existing mapping between RePEc author IDs and GND IDs into Wikidata (and later extending it there). In the course of that, we had to match as many persons as possible on the GND side as well as on the RePEc side (via Mix'n'match) before creating new items. The code used for preparing the (quickstatements2) insert statements is linked from the slides.
Additionally, I've added ~12,000 GND IDs to Wikidata via their existing VIAF identifiers (derived from a federated query on a custom VIAF endpoint and the public WD endpoint - https://github.com/zbw/sparql-queries/blob/master/viaf/missing_gnd_id_for_viaf.rq). This sounds very similar to your use case. There is also another query, which can derive future STW ID properties from the existing STW-GND mapping (https://github.com/zbw/sparql-queries/blob/master/stw/wikidata_mapping_candidates_via_gnd.rq - currently hits a timeout at the WD subquery, but worked before). I would be happy if that could be helpful.
The plan to divide the m'n'm catalogs (places vs. subjects) makes sense to me; we plan the same for STW. I'm not sure whether a restriction to locations (Q17334923, or something more specific) will also match all subclasses, but Magnus could perhaps take care of that when you send him the files.
Cheers, Joachim
-----Original Message----- From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On Behalf Of Osma Suominen Sent: Tuesday, 6 June 2017 12:19 To: Discussion list for the Wikidata project. Subject: [Wikidata] Mix'n'Match with existing (indirect) mappings
Just to update everyone in this thread, I have added location support for Mix'n'match. This will show on entries with a location, e.g.:
https://tools.wmflabs.org/mix-n-match/#/entry/1655814
All Mix'n'match locations (just short of half a million at the moment) can be seen as a layer in WikiShootMe, e.g.:
Cheers, Magnus
On Tue, Jun 13, 2017 at 5:52 PM Neubert, Joachim J.Neubert@zbw.eu wrote:
Hi Osma,
sorry for jumping in late. I've been at ELAG last week, talking about a very similar topic (Wikidata as authority linking hub, https://hackmd.io/p/S1YmXWC0e). Our use case was porting an existing mapping between RePEc author IDs and GND IDs into Wikidata (and furtheron extending it there). In that course, we had to match as many persons as possible on the GND as well as on the RePEc side (via Mix'n'match), before creating new items. The code used for preparing the (quickstatements2) insert statements is linked from the slides.
Additionally, I've added ~12,000 GND IDs to Wikidata via their existing VIAF identifiers (derived from a federated query on a custom VIAF endpoint and the public WD endpoint - https://github.com/zbw/sparql-queries/blob/master/viaf/missing_gnd_id_for_vi...). This sounds very similar to your use case; also another query which can derive future STW ID properties from the existing STW-GND mapping ( https://github.com/zbw/sparql-queries/blob/master/stw/wikidata_mapping_candi...
- currently hits a timeout at the WD subquery, but worked before). I would
be happy if that could be helpful.
The plan to divide the m'n'm catalogs (places vs. subjects) makes sense for me, we plan the same for STW. I'm not sure, if a restriction to locations (Q17334923, or something more specific) will match also all subclasses, but Magnus could perhaps take care of that when you send him the files.
Cheers, Joachim
-----Ursprüngliche Nachricht----- Von: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] Im Auftrag
von
Osma Suominen Gesendet: Dienstag, 6. Juni 2017 12:19 An: Discussion list for the Wikidata project. Betreff: [Wikidata] Mix'n'Match with existing (indirect) mappings
Hi Wikidatans,
After several delays we are finally starting to think seriously about
mapping the
General Finnish Ontology YSO [1] to Wikidata. A "YSO ID" property (https://www.wikidata.org/wiki/Property:P2347) was added to Wikidata some time ago, but it has been used only a few times so far.
Recently some 6000 places have been added to "YSO Places" [2], a new extension of YSO, which was generated from place names in YSA and Allärs, our earlier subject indexing vocabularies. It would probably make sense
to map
these places to Wikidata, in addition to the general concepts in YSO. We
have
already manually added a few links from YSA/YSO places to Wikidata for
newly
added places, but this approach does not scale if we want to link the
thousands
of existing places.
We also have some indirect sources of YSO/Wikidata mappings:
- YSO is mapped to LCSH, and Wikidata also to LCSH (using P244, LC/NACO
Authority File ID). I digged a bit into both sets of mappings and found
that
approximately 1200 YSO-Wikidata links could be generated from the intersection of these mappings.
- The Finnish broadcasting company Yle has also created some mappings
between KOKO (which includes YSO) and Wikidata. Last time I looked at
those,
we could generate at least 5000 YSO-Wikidata links from them. Probably more nowadays.
Of course, indirect mappings are a bit dangerous. It's possible that
there are
some differences in meaning, especially with LCSH which has a very
different
structure (and cultural context) than YSO. Nevertheless I think these
could be a
good starting point, especially if a tool such as Mix'n'Match could be
used to
verify them.
Now my question is, given that we already have or could easily generate thousands of Wikidata-YSO mappings, but the rest would still have to be
semi-
automatically linked using Mix'n'Match, what would be a good way to approach this? Does Mix'n'Match look at existing statements (in this
case YSO
ID / P2347) in Wikidata when you load a new catalog, or ignore them?
I can think of at least these approaches:
- First import the indirect mappings we already have to Wikidata as
P2347 statements, then create a Mix'n'Match catalog with the remaining
YSO
concepts. The indirect mappings would have to be verified separately.
- First import the indirect mappings we already have to Wikidata as
P2347 statements, then create a Mix'n'Match catalog with ALL the YSO concepts, including the ones for which we already have imported a
mapping.
Use Mix'n'Match to verify the indirect mappings.
- Forget about the existing mappings and just create a Mix'n'Match
catalog
with all the YSO concepts.
Any advice?
Thanks,
-Osma
[2] http://finto.fi/yso-paikat/
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hi Magnus!
That's excellent news! Thanks a lot!
I'm currently preparing a CSV dump of YSO places. Most of the entries have coordinates. I will send it to you soon for inclusion as a catalog in Mix'n'match.
-Osma
Magnus Manske wrote on 16.06.2017 at 00:00:
Just to update everyone in this thread, I have added location support for Mix'n'match. This will show on entries with a location, e.g.:
https://tools.wmflabs.org/mix-n-match/#/entry/1655814
All Mix'n'match locations (just short of half a million at the moment) can be seen as a layer in WikiShootMe, e.g.:
https://goo.gl/kqfjoj
Cheers, Magnus
On Tue, Jun 13, 2017 at 5:52 PM Neubert, Joachim <J.Neubert@zbw.eu> wrote:
Hi Osma,
sorry for jumping in late. I was at ELAG last week, talking about a very similar topic (Wikidata as authority linking hub, https://hackmd.io/p/S1YmXWC0e). Our use case was porting an existing mapping between RePEc author IDs and GND IDs into Wikidata (and further on extending it there). In the course of that, we had to match as many persons as possible on the GND as well as on the RePEc side (via Mix'n'match), before creating new items. The code used for preparing the (quickstatements2) insert statements is linked from the slides.
Additionally, I've added ~12,000 GND IDs to Wikidata via their existing VIAF identifiers (derived from a federated query on a custom VIAF endpoint and the public WD endpoint - https://github.com/zbw/sparql-queries/blob/master/viaf/missing_gnd_id_for_viaf.rq). This sounds very similar to your use case; there is also another query which can derive future STW ID properties from the existing STW-GND mapping (https://github.com/zbw/sparql-queries/blob/master/stw/wikidata_mapping_candidates_via_gnd.rq - currently hits a timeout at the WD subquery, but worked before). I would be happy if that could be helpful.
The plan to divide the m'n'm catalogs (places vs. subjects) makes sense to me; we plan the same for STW. I'm not sure if a restriction to locations (Q17334923, or something more specific) will also match all subclasses, but Magnus could perhaps take care of that when you send him the files.
Cheers, Joachim
Now at https://tools.wmflabs.org/mix-n-match/#/catalog/473
Location data as well, example: https://tools.wmflabs.org/mix-n-match/#/entry/22733305
Hi Magnus,
Thanks a lot, that was fast! And the results look very good!
I confirmed a couple dozen automated mappings and fixed an incorrect one ("Amerikka" was matched to USA, but I changed it to "Americas"). Then I started hitting rate limit errors. I guess it would be possible to avoid those with some extra permissions?
About 20% of the places were automatically matched. Probably most of the remaining ones - around 5000 - do not exist in Wikidata because they are e.g. towns and villages in Finland. Would it be fair game to create all of them in Wikidata?
-Osma
Hi Magnus, all,
I've been looking a bit closer at the YSO places catalog [1] in Mix'n'match and I'm wondering why only 20% of the places were automatically matched.
For example, Nepal (http://www.yso.fi/onto/yso/p107682) was automatically matched to Nepal (Q837).
But:
Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra (Q3761).
Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh (Q1823).
Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to Akkunusjoki (Q12253027).
There are many more cases like this. So the precision of the automatic matching seems good (all but one were correct so far), but the recall is rather low, and even in cases where the label is identical a match has not been suggested. Is there anything that could be done about this?
Somewhat related to this, it seems that none of the places with parenthetical qualifiers in their names were matched. For example "Ahjo (Kerava)" could have been matched to Q11849902 (which has a Finnish label that is identical) and "Ala-Malmi (Helsinki)" could have been matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the place names include parenthetical qualifiers - to make them unique despite different places having identical names - this means that a lot of potential matches are missing. Could something be done to improve the situation?
If Mix'n'match is incapable of automatically matching cases like this, would it help if I did an automatic matching externally using some other tool, and then gave the potential matches as e.g. a CSV file that could then be imported into Mix'n'match so that they can be verified there?
-Osma
[1] https://tools.wmflabs.org/mix-n-match/#/catalog/473
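The external matching idea could look roughly like the sketch below: strip a trailing parenthetical qualifier, try the full name first and the bare name second, and write the candidates to CSV for later verification. Everything about the interface is an assumption: the function names, the `label_index` dictionary (lower-cased label to list of QIDs), the placeholder YSO IDs and the CSV column names are all made up for the example, and the column layout is not Mix'n'match's actual import format. Only the place names and QIDs are taken from this thread.

```python
import csv
import io
import re

# A trailing "(...)" qualifier, e.g. "Ahjo (Kerava)" -> "Ahjo" + "Kerava".
QUALIFIER = re.compile(r"\s*\(([^()]*)\)\s*$")


def split_qualifier(name):
    """Return (base_name, qualifier); qualifier is None if there is none."""
    match = QUALIFIER.search(name)
    if not match:
        return name.strip(), None
    return name[:match.start()].strip(), match.group(1).strip()


def candidates(yso_places, label_index):
    """Collect (yso_id, name, qid) rows, preferring full-label matches."""
    rows = []
    for yso_id, name in yso_places:
        base, _ = split_qualifier(name)
        for key in (name, base):
            qids = label_index.get(key.casefold(), [])
            if qids:
                rows.extend((yso_id, name, qid) for qid in qids)
                break  # do not fall back to the bare name if the full one hit
    return rows


def to_csv(rows):
    """Serialise candidate rows; the header names are hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["yso_id", "yso_name", "candidate_qid"])
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder YSO IDs; labels and QIDs as mentioned in the message above.
yso_places = [("p-ahjo", "Ahjo (Kerava)"), ("p-ala-malmi", "Ala-Malmi (Helsinki)")]
label_index = {
    "ahjo (kerava)": ["Q11849902"],
    "ala-malmi": ["Q2829441"],
}

rows = candidates(yso_places, label_index)
print(to_csv(rows))
```

Trying the qualified label before the bare one keeps an exact match from being diluted by unrelated items that merely share the base name.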
I fiddled with it a bit, now 35% automatched.
Will try some more, but there are some sanity constraints on the matching. If it finds more than one match for the name, it does not set any match, because random matches on the same name were annoying in the past. There is also a type constraint, which might skip some Wikidata items without appropriate instance/subclass.
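The two constraints described above can be sketched as follows. This is not Mix'n'match's actual code: the function name, the candidate record shape and the order in which the checks are applied (type filter first, then uniqueness) are assumptions, and the QIDs in the toy candidates are placeholders. Q17334923 is the location class mentioned earlier in the thread.

```python
def automatch(name, candidates, allowed_types):
    """Return a QID only if exactly one same-named, well-typed item exists.

    candidates: list of dicts with "qid", "label" and "types" keys,
    where "types" holds the item's instance-of/subclass-of targets.
    """
    wanted = name.casefold()
    same_name = [c for c in candidates if c["label"].casefold() == wanted]
    # Type constraint: items without a suitable type are skipped entirely.
    typed = [c for c in same_name if allowed_types & set(c["types"])]
    # Sanity constraint: more than one remaining match means no automatch.
    if len(typed) == 1:
        return typed[0]["qid"]
    return None


LOCATION = {"Q17334923"}

unique = [{"qid": "Q-a", "label": "Exampleville", "types": ["Q17334923"]}]
ambiguous = unique + [{"qid": "Q-b", "label": "Exampleville", "types": ["Q17334923"]}]
untyped = [{"qid": "Q-c", "label": "Exampleville", "types": []}]

print(automatch("Exampleville", unique, LOCATION))
print(automatch("Exampleville", ambiguous, LOCATION))
print(automatch("Exampleville", untyped, LOCATION))
```

Under these rules a correctly labelled item with no type statement is never automatched, which would explain low recall even for identical labels.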
Hi Magnus!
It's even higher now - 45%. Thanks a lot! This helps a lot with the verifying.
Also, matching of names with parenthetical qualifiers works better now. I see that "Ala-Malmi (Helsinki)" was automatched to "Ala-Malmi". However, "Ahjo (Kerava)" was not matched to "Ahjo (Kerava)" (Q11849902) but to Q1368573 (which is "Ahjo" in Finnish but means a type of metalworking workshop, not a specific place). Neither Wikidata entity has a type statement; the latter has a "subclass-of <workshop>" statement.
In any case, I think this is now good enough for serious work, so we will start verifying the suggested matches. 2.5% (173) already done...
-Osma
For "casual matching", try the game mode: https://tools.wmflabs.org/mix-n-match/#/random/473
On Mon, Jun 19, 2017 at 10:16 AM Osma Suominen osma.suominen@helsinki.fi wrote:
Hi Magnus!
It's even higher now - 45%. Thanks a lot! This helps a lot with the verifying.
Also matching of names with parenthetical qualifiers works better now. I see that "Ala-Malmi (Helsinki)" was automatched to "Ala-Malmi". However, "Ahjo (Kerava)" was not matched to "Ahjo (Kerava)" (Q11849902) but to Q1368573 (which is "Ahjo" in Finnish but means a type of metalworking workshop, not a specific place). Neither Wikidata entity has a type statement, the latter has "subclass-of <workshop>" statement.
In any case, I think this is now good enough for serious work, so we will start verifying the suggested matches. 2.5% (173) already done...
-Osma
Magnus Manske kirjoitti 19.06.2017 klo 12:02:
I fiddled with it a bit, now 35% automatched.
Will try some more, but there are some sanity constraints on the matching. If it finds more than one match for the name, it does not set any match, because random matches on the same name were annoying in the past. There is also a type constraint, which might skip some Wikidata items without appropriate instance/subclass.
On Mon, Jun 19, 2017 at 8:09 AM Osma Suominen osma.suominen@helsinki.fi wrote:
Hi Magnus, all,
I've been looking a bit closer at the YSO places catalog [1] in Mix'n'match and I'm wondering why only 20% of the places were automatically matched.
For example, Nepal (http://www.yso.fi/onto/yso/p107682) was automatically matched to Nepal (Q837). But:
Accra (http://www.yso.fi/onto/yso/p138653) was not matched to Accra (Q3761).
Aceh (http://www.yso.fi/onto/yso/p147889) was not matched to Aceh (Q1823).
Akkunusjoki (http://www.yso.fi/onto/yso/p109251) was not matched to Akkunusjoki (Q12253027).
There are many more cases like this. So the precision of the automatic matching seems good (all but one were correct so far), but the recall is rather low, and even in cases where the label is identical a match has not been suggested. Is there anything that could be done about this?
Somewhat related to this, it seems that none of the places with parenthetical qualifiers in their names were matched. For example, "Ahjo (Kerava)" could have been matched to Q11849902 (which has a Finnish label that is identical) and "Ala-Malmi (Helsinki)" could have been matched to Q2829441 ("Ala-Malmi"). Since almost 60% of the place names include parenthetical qualifiers - to make them unique despite different places having identical names - this means that a lot of potential matches are missing. Could something be done to improve the situation?
If Mix'n'match is incapable of automatically matching cases like this, would it help if I did an automatic matching externally using some other tool, and then gave the potential matches as e.g. a CSV file that could then be imported into Mix'n'match so that they can be verified there?
-Osma
[1] https://tools.wmflabs.org/mix-n-match/#/catalog/473
Osma Suominen wrote on 17.06.2017 at 13:13:
> Hi Magnus,
> Thanks a lot, that was fast! And the results look very good!
> I confirmed a couple dozen automated mappings and fixed an incorrect one ("Amerikka" was matched to USA, but I changed it to "Americas"). Then I started hitting rate limit errors. I guess it would be possible to avoid those with some extra permissions?
> About 20% of the places were automatically matched. Probably most of the remaining ones - around 5000 - do not exist in Wikidata because they are e.g. towns and villages in Finland. Would it be fair game to create all of them in Wikidata?
> -Osma
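The external-matching idea raised in the message above could produce a candidate file along these lines. This is only a sketch: the column names are hypothetical, and the thread does not say which CSV layout, if any, Mix'n'match accepts for import. The rows reuse the unmatched examples listed in the message.

```python
import csv
import io


def candidates_to_csv(rows):
    """Serialise (YSO URI, label, candidate item ID) tuples as CSV text.

    The header names are assumptions made for this example only.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["yso_uri", "label", "candidate_qid"])
    for uri, label, qid in rows:
        writer.writerow([uri, label, qid])
    return buf.getvalue()


# Candidate pairs taken from the examples in the message.
CANDIDATES = [
    ("http://www.yso.fi/onto/yso/p138653", "Accra", "Q3761"),
    ("http://www.yso.fi/onto/yso/p147889", "Aceh", "Q1823"),
    ("http://www.yso.fi/onto/yso/p109251", "Akkunusjoki", "Q12253027"),
]

if __name__ == "__main__":
    print(candidates_to_csv(CANDIDATES), end="")
```

Keeping the YSO URI in the first column means each suggested match can still be traced back to its catalog entry when it is verified by hand.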
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi
Magnus Manske wrote on 19.06.2017 at 13:54:
For "casual matching", try the game mode: https://tools.wmflabs.org/mix-n-match/#/random/473
Thanks, I already tried all the modes. They are good for different purposes. The manual mode seems most efficient for verifying the automated matches, most of which can just be confirmed with a single click without reloading a whole page, but the game modes are better for handling the unmatched ones since they provide some fuzzier suggestions without having to do manual searches.
I couldn't see the "not on Wikidata" button that was mentioned in the manual in any of the modes. Has it been removed? It would be useful to be able to mark that something is not (yet) in Wikidata, though I suppose it could be added by someone else at any time, so this type of information may become obsolete over time.
In any case we need to decide whether to add all the places (e.g. villages and small lakes) that are not yet in Wikidata as new entities or not. Is there any guidance on this? I know the notability guidelines [1], but they are rather vague.
For most of the places we would like to add, there is at least one other public source - the Finnish place names registry, which contains information such as names, type, administrative hierarchy and coordinates - even though it is currently not linked to Wikidata in any way. And since this set of places is originally based on a library authority file that is maintained based on indexing needs, there should be at least one document about each place in libraries, archives and/or museum collections. So every place we have is at least slightly notable, but I'm not sure whether that's notable enough for Wikidata.
-Osma
[1] https://www.wikidata.org/wiki/Wikidata:Notability
On Mon, Jun 19, 2017 at 12:16 PM Osma Suominen osma.suominen@helsinki.fi wrote:
I couldn't see the "not on Wikidata" button that was mentioned in the manual in any of the modes. Has it been removed? It would be useful to be able to mark that something is not (yet) in Wikidata, though I suppose it could be added by someone else at any time, so this type of information may become obsolete over time.
That was indeed removed, as it takes a long time to finish large catalogs (years), and by that time new items may have been created, so all the "not in Wikidata" entries have to be checked again.
My official policy now is to create a new item if one does not exist; the fact that there is an entry in a (good) third-party catalog alone makes them notable on Wikidata, but villages and lakes etc. are also notable by default.
Magnus Manske wrote on 19.06.2017 at 14:58:
My official policy now is to create a new item if one does not exist; the fact that there is an entry in a (good) third-party catalog alone makes them notable on Wikidata, but villages and lakes etc. are also notable by default.
Thanks, I guess this is as official as it gets :) We will follow your advice and simply create new entities as necessary.
-Osma
In Freebase, back in the day, we also created new entities for the same reasons as Magnus gives. We found that creating an entity, even at the risk of duplicates, caused fewer problems than having no entity at all. We later dealt with duplicate entities through simple human merge requests. Duplicates ended up being a very minor occurrence after we improved our search algorithms to account for popularity as well as for entities that had more than one filled-out property.
In the case of non-mission-critical datasets, more data, even duplicate data, is better than no data at all.
-Thad +ThadGuidry https://www.google.com/+ThadGuidry