Hi,
now that SQID supports the confirmation/rejection of statements from
Primary Sources (Freebase imports), I have noticed certain systematic
issues with the proposals. I believe many of them should be removed
because they are already represented in Wikidata and do not need to be
imported. I have found three types of such data so far:
(1) Redundant "located in the administrative territorial
entity"/"contains administrative territorial entity". Wikidata stores
only the next territory above/below the current one in these relations.
PS often suggests territories reachable through several steps instead.
Examples:
- https://tools.wmflabs.org/sqid/#/view?id=Q980 (login first to see
suggestions). There are almost 100 towns that fall into this area
suggested here, but they all should be organised in more specific
sub-regions of the hierarchy.
- https://tools.wmflabs.org/sqid/#/view?id=Q10474 There is a
higher-level territory suggested here (Bavaria) even though "Lower
Bavaria" is already present.
Similar things are found, e.g., for occupation (P106), where a person
that is already a "sport cyclist" might be suggested to be a
"sportsperson".
(2) Syntactic variations of the "same" value. Typical cases are URLs,
which PS suggests with trailing "/" even after top-level domains, while
Wikidata often omits it. This means you have suggestions like
"http://www.pirna.de/" when there is already "http://www.pirna.de".
(https://tools.wmflabs.org/sqid/#/view?id=Q6477)
(3) Redirect items as values. PS sometimes suggests statement values
that are redirects to other entities, for which there already is a
statement.
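To make the three cases concrete, here is a minimal Python sketch of
what a check for them might look like. This is not how PS works: all
function names, the parent_of and redirects mappings, and the sample
identifiers are hypothetical, and a real filter would have to read the
existing statements and redirects from Wikidata.

```python
from urllib.parse import urlsplit

def normalize_url(url):
    """Case (2): treat 'http://host' and 'http://host/' as equal."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the same as "/"
    return (parts.scheme.lower(), parts.netloc.lower(),
            path, parts.query, parts.fragment)

def resolve_redirect(item, redirects):
    """Case (3): follow redirects until a non-redirect item is reached."""
    seen = set()
    while item in redirects and item not in seen:
        seen.add(item)  # guards against redirect cycles
        item = redirects[item]
    return item

def ancestors(item, parent_of):
    """Case (1): everything reachable in one or more steps upwards."""
    found, stack = set(), list(parent_of.get(item, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parent_of.get(current, ()))
    return found

def is_redundant_item(suggested, existing, parent_of, redirects):
    """True if the suggested value adds nothing to the existing ones."""
    suggested = resolve_redirect(suggested, redirects)
    existing = {resolve_redirect(v, redirects) for v in existing}
    if suggested in existing:
        return True
    # A value is implied if a more specific value is already present.
    return any(suggested in ancestors(v, parent_of) for v in existing)

def is_redundant_url(suggested, existing):
    """True if the URL differs from an existing one only syntactically."""
    return normalize_url(suggested) in {normalize_url(u) for u in existing}

# Hypothetical sample data, using labels instead of real item ids.
parent_of = {"Lower Bavaria": ["Bavaria"],
             "sport cyclist": ["sportsperson"]}
redirects = {"old item": "merged item"}

print(is_redundant_item("Bavaria", {"Lower Bavaria"},
                        parent_of, redirects))
print(is_redundant_item("old item", {"merged item"},
                        parent_of, redirects))
print(is_redundant_url("http://www.pirna.de/", ["http://www.pirna.de"]))
```

For "contains administrative territorial entity" the same check would
apply with a mapping that points down the hierarchy instead of up.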
All of these cases should be fixed on the provider side, not by hiding
suggestions in the UI (as it seems to be done by the PS gadget for case
(2)). This would also help to get better statistics: right now, all I
can do is to reject all of these values, but this might be misleading if
one looks at the PS statistics since they are not wrong, but simply
unnecessary.
Simply hiding suggestions that are not eliminated from the data also
makes the PS service's feature for finding items with suggestions much
less useful (you might find items that do not show you any suggestions).
I was wondering if anybody is still working on PS cleanup now or if
this part of the project is orphaned.
Cheers,
Markus
--
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Center for Advancing Electronics Dresden (cfaed)
Faculty of Computer Science
TU Dresden
+49 351 463 38486
https://iccl.inf.tu-dresden.de/web/KBS/en