Hi,
now that SQID supports the confirmation/rejection of statements from Primary Sources (Freebase imports), I notice certain systematic issues with it. I believe many of the proposals should be removed because they are already represented in Wikidata and do not need to be imported.
I have found three types of such data so far:
(1) Redundant "located in the administrative territorial entity"/"contains administrative territorial entity". Wikidata stores only the next territory above/below the current one in these relations. PS often suggests territories reachable through several steps instead.
Examples:
- https://tools.wmflabs.org/sqid/#/view?id=Q980 (login first to see suggestions). There are almost 100 towns that fall into this area suggested here, but they all should be organised in more specific sub-regions of the hierarchy.
- https://tools.wmflabs.org/sqid/#/view?id=Q10474 There is a higher-level territory suggested here (Bavaria) even though "Lower Bavaria" is already present.
Similar things are found, e.g., for occupation (P106), where a person that is already a "sport cyclist" might be suggested to be a "sportsperson".
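To make case (1) concrete, here is a minimal sketch of a provider-side check. It assumes the existing direct statements for one property (e.g. "located in the administrative territorial entity") are available as a plain dictionary; the function name, the `direct` mapping, and the labels used as keys are hypothetical and not part of any PS or Wikidata API.

```python
from collections import deque

def is_redundant(item, suggested, direct):
    """Return True if `suggested` is already reachable from `item`
    by following one or more existing direct statements.

    `direct` is a hypothetical mapping: entity -> set of values that
    the entity already has for the property in question.
    """
    seen = set()
    queue = deque(direct.get(item, ()))
    while queue:
        current = queue.popleft()
        if current == suggested:
            return True
        if current in seen:
            continue  # guards against cycles in the hierarchy
        seen.add(current)
        queue.extend(direct.get(current, ()))
    return False

# Illustrative data only (labels instead of real item IDs).
direct = {
    "some town": {"Lower Bavaria"},
    "Lower Bavaria": {"Bavaria"},
    "Bavaria": {"Germany"},
}
redundant = is_redundant("some town", "Bavaria", direct)
not_redundant = is_redundant("some town", "Upper Bavaria", direct)
```

The same traversal would apply to the occupation example if `direct` were built from subclass relations instead, though that is again only an assumption about how the data would be prepared.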
(2) Syntactic variations of the "same" value. Typical cases are URLs, which PS suggests with trailing "/" even after top-level domains, while Wikidata often omits it. This means you have suggestions like "http://www.pirna.de/" when there is already "http://www.pirna.de".
(https://tools.wmflabs.org/sqid/#/view?id=Q6477)
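For case (2), a comparison along the following lines might be enough to catch the trailing-slash variants. This is only a sketch using Python's standard `urllib.parse`; the helper names are made up, and it deliberately treats only an empty path and "/" as equal, since a trailing slash on a deeper path may point to a different resource.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Normalize a URL for comparison (hypothetical helper).

    Lowercases scheme and network location and treats an empty path
    as "/". Note: lowercasing the whole netloc would also lowercase
    any user:password part, which is acceptable only for a sketch.
    """
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path,
         parts.query, parts.fragment)
    )

def same_url(a, b):
    """True if two URLs are equal after normalization."""
    return normalize_url(a) == normalize_url(b)

root_variants_match = same_url("http://www.pirna.de/", "http://www.pirna.de")
deep_paths_differ = not same_url("http://www.pirna.de/a", "http://www.pirna.de/a/")
```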
(3) Redirect items as values. PS sometimes suggests statement values that are redirects to other entities, for which there already is a statement.
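Case (3) could be handled by resolving redirects before comparing a suggestion with the existing values. The sketch below assumes a redirect table is available as a dictionary from redirect ID to target ID; the function names and the placeholder IDs are hypothetical.

```python
def resolve_redirect(entity_id, redirects):
    """Follow redirect chains until a non-redirect entity is reached.

    `redirects` is a hypothetical mapping: redirect ID -> target ID.
    The `seen` set stops the loop if the table contains a cycle.
    """
    seen = set()
    while entity_id in redirects and entity_id not in seen:
        seen.add(entity_id)
        entity_id = redirects[entity_id]
    return entity_id

def is_redirect_duplicate(suggested, existing_values, redirects):
    """True if `suggested` is a redirect whose target is already a value."""
    target = resolve_redirect(suggested, redirects)
    return target != suggested and target in existing_values

# Placeholder IDs for illustration only, not real items.
redirects = {"Q_old": "Q_mid", "Q_mid": "Q_new"}
duplicate = is_redirect_duplicate("Q_old", {"Q_new"}, redirects)
```

A suggestion that resolves to a target not yet present would still be worth keeping, but ideally with the value rewritten to the target.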
All of these cases should be fixed on the provider side, not by hiding suggestions in the UI (as the PS gadget seems to do for case (2)). This would also help to get better statistics: right now, all I can do is reject all of these values, but this might be misleading if one looks at the PS statistics, since the rejected values are not wrong, just unnecessary.
Simply hiding suggestions that are not eliminated from the data also makes the PS service's feature for finding items with suggestions much less useful (you might find items that do not show you any suggestions).
I was wondering if anybody is still working on PS clean-up now or if this part of the project is orphaned.
Cheers,
Markus
Hello Markus,
To my knowledge, nobody has worked on it in 2016. I should definitely take some time to clean things up, but if somebody else is willing to work on it, that's even better.
Thomas
On 20 Dec 2016, at 15:29, Markus Kroetzsch markus.kroetzsch@tu-dresden.de wrote:
-- Prof. Dr. Markus Kroetzsch Knowledge-Based Systems Group Center for Advancing Electronics Dresden (cfaed) Faculty of Computer Science TU Dresden +49 351 463 38486 https://iccl.inf.tu-dresden.de/web/KBS/en
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hoi, Please consider, it has been said all too often that Primary Sources is the tool that should be used. Given that it has a bad UI and is not maintained, what benefits does it hold?
Why do we throw away all the good work when we do not value it? Thanks, GerardM
On Tue, Dec 20, 2016 at 7:40 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, Please consider, it has been said all too often that Primary Sources is the tool that should be used. Given that it has a bad UI and is not maintained, what benefits does it hold?
Why do we throw away all the good work when we do not value it?
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Val... will hopefully be approved in the next few days to give Marco the time to give the Primary Sources tool some much-needed love.
Cheers Lydia
I was about to mention the StrepHit renewal proposal. Thanks Lydia for doing that faster than me! :-) Best,
Marco
On 12/20/16 19:56, Lydia Pintscher wrote: