Hi!
> I see that 19.6k statements have been approved through the tool, and
> 5.1k statements have been rejected - which means that about 1 in 5
> statements is deemed unsuitable by the users of primary sources.
From my (limited) experience with Primary Sources, I have rejected
several kinds of statements:
- Unsourced statements that contradict what is written in Wikidata
- Duplicate claims already existing in Wikidata
- Duplicate claims with worse data (e.g. less accurate location, less
specific categorization, etc.) or unnecessary qualifiers (such as adding
information that is already contained in the item to the item's
qualifiers, e.g. zip code for a building)
- Source references that do not exist (404, etc.)
- Source references that do exist but either duplicate an existing one
(a number of sources just refer to different URLs for the same data) or
do not contain the information they should (e.g. a link to a newspaper's
homepage instead of the specific article)
- Claims that are almost obviously invalid (e.g. "United Kingdom" as a
genre of a play)
I think at least some of these - esp. references that do not exist and
duplicates with no refs - could be removed automatically, thus raising
the relative quality of the remaining items.
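
A rough sketch of what such an automatic pre-filter could look like, in
Python. Everything here is an assumption made for illustration, not how
the Primary Sources tool actually works: the Candidate shape, the
prefilter function, and the url_is_alive callback (which in practice
might issue an HTTP HEAD request) are all hypothetical names.

```python
from __future__ import annotations

from typing import Callable, Iterable, NamedTuple, Optional


class Candidate(NamedTuple):
    """A proposed statement; this shape is assumed for illustration."""
    item: str                   # e.g. "Q1"
    prop: str                   # e.g. "P136"
    value: str
    source_url: Optional[str]   # None when the statement is unsourced


def prefilter(
    candidates: Iterable[Candidate],
    existing: set[tuple[str, str, str]],
    url_is_alive: Callable[[str], bool],
) -> list[Candidate]:
    """Drop unsourced duplicates and statements whose reference is dead."""
    kept = []
    for c in candidates:
        is_duplicate = (c.item, c.prop, c.value) in existing
        if c.source_url is None:
            if is_duplicate:
                continue  # duplicate with no reference adds nothing
        elif not url_is_alive(c.source_url):
            continue  # reference does not exist (404, etc.)
        kept.append(c)
    return kept
```

Sourced duplicates with a live reference are kept on purpose, since the
reference itself may still be new to Wikidata and worth a human look.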
OTOH, some of the entries can be made self-evident - e.g. if we talk
about a movie and Freebase has an IMDB ID or Netflix ID, it may be quite
easy to check whether that ID is valid and refers to a movie by the same
name, which should be enough to merge it.
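
The name comparison part of that check could be sketched as below. This
is only an illustration under assumptions: lookup_title is a
hypothetical callback standing in for whatever would resolve an external
ID to a title (returning None if the ID does not resolve), and the
sketch does not verify that the external record is actually a movie.

```python
from __future__ import annotations

import unicodedata
from typing import Callable, Optional


def normalize_title(title: str) -> str:
    """Casefold, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = "".join(
        ch if ch.isalnum() else " " for ch in no_accents.casefold()
    )
    return " ".join(cleaned.split())


def id_matches_label(
    external_id: str,
    item_labels: list[str],
    lookup_title: Callable[[str], Optional[str]],
) -> bool:
    """True if the ID resolves and its title matches one of the labels."""
    title = lookup_title(external_id)
    if title is None:
        return False  # ID is not valid
    wanted = normalize_title(title)
    return any(normalize_title(label) == wanted for label in item_labels)
```

Exact matching after normalization is deliberately strict; anything that
does not match would simply stay in the queue for manual review.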
Not sure if those one-off things are worth bothering with, just putting
them out there to consider.
--
Stas Malyshev
smalyshev@wikimedia.org
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata