One question remaining is: Should there be a difference between
"human-verified" and "bot-verified"? A bot can check if e.g. the label
(or the words in the label) occur on the page at the URL to check, but
it can't know for sure. Human review is more reliable, but vastly slower
and not likely to happen for many/most such statements. Two different
properties could act as different confidence levels. But maybe I'm just
over-engineering this ;-)
It depends. For structured data sources, a bot should be able to do a
thorough verification (possibly better than a human), e.g., by comparing
name, birthdate and deathdate of a person at once. I would focus on
these cases first since we have enough of them ;-)
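
To make the structured case concrete, here is a minimal sketch of such a check in Python. The record type, the field names and the name normalization are my own assumptions for illustration, not an existing bot or API; a real bot would first have to map both sources into this shape.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PersonRecord:
    """Hypothetical, simplified view of a person in either data source."""
    name: str
    birth: Optional[date]
    death: Optional[date]


def normalize(name: str) -> str:
    # Assumption: ignoring case and extra whitespace is enough for names.
    return " ".join(name.casefold().split())


def records_agree(wikidata: PersonRecord, external: PersonRecord) -> bool:
    """True only if name, birth date and death date all match at once."""
    return (
        normalize(wikidata.name) == normalize(external.name)
        and wikidata.birth == external.birth
        and wikidata.death == external.death
    )


# Sample data for illustration only.
ours = PersonRecord("Douglas Adams", date(1952, 3, 11), date(2001, 5, 11))
theirs = PersonRecord("douglas  adams", date(1952, 3, 11), date(2001, 5, 11))
print(records_agree(ours, theirs))
```

Requiring all three fields to match at once is what makes this stronger than a text match: a coincidental hit on the name alone is not accepted.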
For cases where a bot can only make a guess, it might be better to add a
human to the loop, as in your (truly amazing!) sourcerer game. The game
also shows that it may depend on the items how well this approach works,
since text matches are sometimes completely meaningless (e.g., "Human
parent taxon homo" cannot be verified by looking for "Homo" since every
page that might contain this fact also mentions "Homo sapiens" many
times). For such difficult cases, I am not sure if a bot-supplied
annotation like "looked correct, but I am not sure" would really be very
helpful. It depends ;-)
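
The weakness of a plain text match can be shown with a small Python sketch. The function below is a hypothetical stand-in for what such a bot might do (it is not the code of any existing tool): it only tests whether every word of the label occurs somewhere on the page.

```python
import re


def label_words_on_page(label: str, page_text: str) -> bool:
    """Naive check: does every word of the label occur on the page?"""
    label_words = re.findall(r"\w+", label.casefold())
    page_words = set(re.findall(r"\w+", page_text.casefold()))
    return bool(label_words) and all(w in page_words for w in label_words)


# Made-up page text for illustration. It never states a parent taxon,
# yet the check succeeds because "Homo" is part of "Homo sapiens".
page = "Homo sapiens is the only living species of humans."
print(label_words_on_page("Homo", page))
```

The match is positive although the page says nothing about the parent taxon, which is exactly the kind of meaningless hit described above.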
Cheers,
Markus
On Sun, Jun 7, 2015 at 4:19 PM Markus Krötzsch
<markus(a)semantic-mediawiki.org> wrote:
Coming back to Magnus's suggestion ... I think the existing property
"retrieved" (P813) could be used for this "last verified on" property,
that is, for setting the time at which some external reference was last
compared to a claim in Wikidata.
Magnus also pointed out that many external IDs are "self-verifying" in
that they are their own reference. The situation is somewhat similar for
homepages. Should we adopt the practice of giving a single retrieved
value (without any further information) as the reference for such cases?
Adding P813 dates more widely would also open up new ways of maintaining
data, since one would have a way to filter statements by how long ago
they had last been checked.
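
As a rough sketch of such a filter in Python: the `CheckedStatement` type and its fields are assumptions for illustration, standing in for a statement together with the P813 date of its reference. Statements without any date are treated as never checked.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class CheckedStatement:
    statement_id: str          # hypothetical identifier
    retrieved: Optional[date]  # P813 date of the reference, None if absent


def needs_recheck(
    statements: List[CheckedStatement],
    today: date,
    max_age: timedelta = timedelta(days=365),
) -> List[str]:
    """Return ids of statements never checked or checked too long ago."""
    cutoff = today - max_age
    return [
        s.statement_id
        for s in statements
        if s.retrieved is None or s.retrieved < cutoff
    ]
```

The one-year default is arbitrary; a bot could use shorter intervals for sources that are known to change often.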
Best wishes,
Markus
On 03.06.2015 15:56, Markus Krötzsch wrote:
On 03.06.2015 13:57, Magnus Manske wrote:
> Maybe there is a case to separate import and verification here?
>
> There are many statements in Wikidata nowadays, but they get really
> "trustworthy" through references (other than "imported from Wikipedia").
> But for external IDs, references are superfluous; they are their own
> reference, by definition. So how about marking IDs with a "verified" (or
> "last verified on") qualifier? Much of such work could be done by bots;
> we could then filter the problematic ones out for manual verification.
> As we have no control over external lists, this would have to be
> re-checked every so often; but again, bots to the rescue.
Yes, I fully support this proposal.
What do you think about making "last verified on" not a qualifier but
(part of) the reference information? The reference could state where the
bot has looked up the ID and give a time. This would be somewhat similar
to what is now used in Freebase IDs, e.g., in
https://www.wikidata.org/wiki/Q42.
In general, it might be useful to have such a "last verified on"
property that can be added to arbitrary references. There are many other
uses for this. One common case would be that a user has changed the
value without even being aware of the reference -- then one would be
able to detect this automatically by comparing the last modification
time with the "last verified on" date.
Putting the "last verified on" into the references also makes it
possible to have different dates for different references there.
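
The comparison could look roughly like this in Python. This is only a sketch: the function name and the mapping from a reference identifier to its "last verified on" time are assumptions, since no such property exists yet.

```python
from datetime import datetime
from typing import Dict, List


def outdated_references(
    last_modified: datetime,
    verified_on: Dict[str, datetime],
) -> List[str]:
    """Return references last verified before the value was last changed."""
    return sorted(
        ref for ref, checked in verified_on.items() if checked < last_modified
    )
```

Because each reference carries its own date, a value edited after one check but before another is flagged only for the reference that has become outdated.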
Regards,
Markus
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
https://lists.wikimedia.org/mailman/listinfo/wikidata