Hello Markus,
To my knowledge, nobody has worked on it in 2016. I should definitely take some time to
clean things up, but if somebody else is willing to work on it, that would be even better.
Thomas
On 20 Dec 2016, at 15:29, Markus Kroetzsch
<markus.kroetzsch(a)tu-dresden.de> wrote:
Hi,
now that SQID supports the confirmation/rejection of statements from Primary Sources
(Freebase imports), I notice certain systematic issues with it. I believe many of the
proposals should be removed because they are already represented in Wikidata and do not
need to be imported.
I have found three types of such data so far:
(1) Redundant "located in the administrative territorial entity"/"contains
administrative territorial entity". Wikidata stores only the next territory
above/below the current one in these relations. PS often suggests territories reachable
through several steps instead.
Examples:
- https://tools.wmflabs.org/sqid/#/view?id=Q980 (login first to see suggestions). There
are almost 100 towns that fall into this area suggested here, but they all should be
organised in more specific sub-regions of the hierarchy.
- https://tools.wmflabs.org/sqid/#/view?id=Q10474 There is a higher-level territory
suggested here (Bavaria) even though "Lower Bavaria" is already present.
Similar things are found, e.g., for occupation (P106), where a person that is already a
"sport cyclist" might be suggested to be a "sportsperson".
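A rough sketch of how a provider-side check for case (1) might look, assuming the
direct "located in" links are available as a simple mapping. The function names and
the toy data are hypothetical and are not part of PS or SQID; a proposal counts as
redundant here if it is reachable from an existing value in one or more steps.

```python
from collections import deque

# Toy data (assumption): each territory maps to its direct parent territories.
PARENT_OF = {
    "Lower Bavaria": {"Bavaria"},
    "Bavaria": {"Germany"},
}

# The same shape works for occupation (P106) with a subclass mapping.
SUBCLASS_OF = {
    "sport cyclist": {"sportsperson"},
}


def ancestors(start, parent_of):
    """Return everything reachable from `start` via one or more parent steps."""
    seen = set()
    queue = deque(parent_of.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also protects against cycles
        seen.add(node)
        queue.extend(parent_of.get(node, ()))
    return seen


def is_redundant(proposed, existing_values, parent_of):
    """True if `proposed` is already stated, or implied by a more specific value."""
    if proposed in existing_values:
        return True
    return any(proposed in ancestors(value, parent_of) for value in existing_values)


print(is_redundant("Bavaria", {"Lower Bavaria"}, PARENT_OF))
print(is_redundant("sportsperson", {"sport cyclist"}, SUBCLASS_OF))
```

For "contains administrative territorial entity", the same function could be used
with a mapping from each territory to its direct sub-territories.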
(2) Syntactic variations of the "same" value. Typical cases are URLs, which PS
suggests with trailing "/" even after top-level domains, while Wikidata often
omits it. This means you have suggestions like "http://www.pirna.de/" when there
is already "http://www.pirna.de".
(https://tools.wmflabs.org/sqid/#/view?id=Q6477)
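A minimal sketch of how case (2) could be filtered for URLs, assuming that only the
empty-path variation matters (a bare host versus the same host with a trailing "/").
The helper names are hypothetical, and other variations such as letter case or
default ports are deliberately not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Treat an empty path as "/", so "http://host" equals "http://host/"."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def is_duplicate_url(proposed, existing_urls):
    """True if `proposed` matches an existing URL after normalization."""
    target = normalize_url(proposed)
    return any(normalize_url(url) == target for url in existing_urls)


print(is_duplicate_url("http://www.pirna.de/", ["http://www.pirna.de"]))
```

Comparing normalized forms leaves the stored values untouched, so the existing
Wikidata value would not need to be rewritten.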
(3) Redirect items as values. PS sometimes suggests statement values that are redirects
to other entities, for which there already is a statement.
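Case (3) might be handled along these lines, assuming a redirect table is available
as a plain mapping from redirecting ID to target ID. The item IDs below are made up
for illustration, and the function names are hypothetical.

```python
def resolve(item_id, redirects):
    """Follow redirects until a non-redirect item is reached."""
    seen = set()
    while item_id in redirects and item_id not in seen:
        seen.add(item_id)  # guard against redirect loops
        item_id = redirects[item_id]
    return item_id


def is_redirect_duplicate(proposed, existing_values, redirects):
    """True if the resolved proposal equals an existing (resolved) value."""
    target = resolve(proposed, redirects)
    return any(resolve(value, redirects) == target for value in existing_values)


# Made-up IDs: "Q_OLD" redirects to "Q_NEW".
REDIRECTS = {"Q_OLD": "Q_NEW"}
print(is_redirect_duplicate("Q_OLD", {"Q_NEW"}, REDIRECTS))
```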
All of these cases should be fixed on the provider side, not by hiding suggestions in the
UI (as the PS gadget seems to do for case (2)). This would also help to get
better statistics: right now, all I can do is reject all of these values, but this
might be misleading if one looks at the PS statistics, since the rejected values are not
wrong, but simply unnecessary.
Simply hiding suggestions that are not eliminated from the data also makes the PS
service's feature for finding items with suggestions much less useful (you might find
items that do not show you any suggestions).
I was wondering if anybody is still working on PS clean-up now or if this part of the
project is orphaned.
Cheers,
Markus
--
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Center for Advancing Electronics Dresden (cfaed)
Faculty of Computer Science
TU Dresden
+49 351 463 38486
https://iccl.inf.tu-dresden.de/web/KBS/en