On 07.06.2015 18:29, Magnus Manske wrote:
One question remaining is: Should there be a difference between "human-verified" and "bot-verified"? A bot can check if e.g. the label (or the words in the label) occur on the page at the URL to check, but it can't know for sure. Human review is more reliable, but vastly slower and not likely to happen for many/most such statements. Two different properties could act as different confidence levels. But maybe I'm just over-engineering this ;-)
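A minimal Python sketch of the kind of bot check described here. It assumes the page text has already been fetched, and it leaves out fetching and HTML stripping. The function name label_match_level and its three return values are hypothetical. They only illustrate how a full-phrase match and a scattered-words match could serve as two different confidence levels. This is not how any existing bot works.

```python
import re


def label_match_level(label: str, page_text: str) -> str:
    """Classify how strongly a page's text supports an item label.

    Returns one of three (hypothetical) confidence levels:
      "phrase": the label's words occur together, in order, as a phrase
      "words":  every word of the label occurs somewhere, but not as a phrase
      "none":   at least one word of the label does not occur at all
    Matching is case-insensitive and on whole words only.
    """
    text = page_text.casefold()
    words = re.findall(r"\w+", label.casefold())
    if not words:
        return "none"
    # Whole-label match: the words in order, separated only by non-word characters.
    phrase = r"\b" + r"\W+".join(re.escape(w) for w in words) + r"\b"
    if re.search(phrase, text):
        return "phrase"
    # Weaker evidence: each word appears on its own somewhere on the page.
    if all(re.search(r"\b" + re.escape(w) + r"\b", text) for w in words):
        return "words"
    return "none"


print(label_match_level("Douglas Adams", "Author: Douglas Adams (1952-2001)"))          # -> phrase
print(label_match_level("Douglas Adams", "Douglas Noël Adams was an English author."))  # -> words
print(label_match_level("Douglas Adams", "The Adams family"))                           # -> none
```

Even the "phrase" level is only evidence, not proof. As noted above, the bot cannot know for sure that the page is about the same entity.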
It depends. For structured data sources, a bot should be able to do a thorough verification (possibly better than a human), e.g., by comparing name, birthdate and deathdate of a person at once. I would focus on these cases first since we have enough of them ;-)
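For the structured case, the key idea is that several independent fields are compared at once, so a coincidental match on one field (such as a common name) is not enough. A minimal sketch follows. It assumes both sides have already been parsed into the same simple record type. PersonRecord and mismatched_fields are hypothetical names. Treating a missing date on only one side as a mismatch is a design choice of this sketch, not a rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PersonRecord:
    """Hypothetical minimal view of a person, from Wikidata or from an external source."""
    name: str
    birth: Optional[date]
    death: Optional[date]  # None for living people or unknown dates


def _normalize_name(name: str) -> str:
    # Case-insensitive comparison that ignores extra whitespace.
    return " ".join(name.casefold().split())


def mismatched_fields(local: PersonRecord, external: PersonRecord) -> List[str]:
    """Compare name, birth date and death date together.

    An empty list means all three fields agree, which is much stronger
    evidence than any single field matching on its own.
    """
    mismatches = []
    if _normalize_name(local.name) != _normalize_name(external.name):
        mismatches.append("name")
    if local.birth != external.birth:
        mismatches.append("birth")
    if local.death != external.death:
        mismatches.append("death")
    return mismatches


wikidata = PersonRecord("Douglas Adams", date(1952, 3, 11), date(2001, 5, 11))
external = PersonRecord("douglas  adams", date(1952, 3, 11), date(2001, 5, 11))
print(mismatched_fields(wikidata, external))  # -> []
```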
For cases where a bot can only make a guess, it might be better to add a human to the loop, as in your (truly amazing!) sourcerer game. The game also shows that how well this approach works may depend on the items, since text matches are sometimes completely meaningless (e.g., "Human parent taxon Homo" cannot be verified by looking for "Homo", since every page that might contain this fact also mentions "Homo sapiens" many times). For such difficult cases, I am not sure whether bot-added information of the kind "looked correct, but I am not sure" would really be very helpful. It depends ;-)
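The "Homo" problem can be made concrete. A plain substring test for "Homo" succeeds on essentially any page about humans. One possible mitigation is to blank out known confounding phrases such as "Homo sapiens" before searching. This is a hypothetical heuristic, sketched below. It only works when the confounders are known in advance, and it would still accept unrelated uses such as "Homo erectus". It therefore does not remove the need for human review in such cases. The function name informative_match is illustrative only.

```python
import re
from typing import Iterable


def informative_match(value: str, page_text: str, confounders: Iterable[str]) -> bool:
    """Return True if `value` still occurs as a whole word after blanking out
    every occurrence of the confounding phrases (case-insensitive).

    Hypothetical heuristic: when checking "parent taxon: Homo" on the item for
    humans, "Homo sapiens" is a confounder, since nearly every relevant page repeats it.
    """
    text = page_text
    for phrase in confounders:
        # Replace e.g. "Homo sapiens" with a space so its "Homo" cannot count as evidence.
        text = re.sub(re.escape(phrase), " ", text, flags=re.IGNORECASE)
    pattern = r"\b" + re.escape(value) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


page_a = "Homo sapiens is the only extant species. Homo sapiens evolved in Africa."
page_b = "Homo sapiens is the only extant species of the genus Homo."
print(informative_match("Homo", page_a, ["Homo sapiens"]))  # -> False
print(informative_match("Homo", page_b, ["Homo sapiens"]))  # -> True
```

A naive `"Homo" in page_text` check would return True for both pages. Only the second page actually states anything about the genus.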
Cheers,
Markus
On Sun, Jun 7, 2015 at 4:19 PM Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
Coming back to Magnus's suggestion ... I think the existing property "retrieved" (P813) could be used for this "last verified on" property, that is, for recording the time at which some external reference was last compared to a claim in Wikidata.

Magnus also pointed out that many external IDs are "self-verifying" in that they are their own reference. The situation is somewhat similar for homepages. Should we adopt the practice of giving a single "retrieved" value (without any further information) as the reference for such cases?

Adding P813 dates more widely would also open up new ways of maintaining data, since one would have a way to filter statements by how long ago they had last been checked.

Best wishes,

Markus

On 03.06.2015 15:56, Markus Krötzsch wrote:
> On 03.06.2015 13:57, Magnus Manske wrote:
>> Maybe there is a case to separate import and verification here?
>>
>> There are many statements in Wikidata nowadays, but they get really
>> "trustworthy" through references (other than "imported from Wikipedia").
>> But for external IDs, references are superfluous; they are their own
>> reference, by definition. So how about marking IDs with a "verified" (or
>> "last verified on") qualifier? Much of such work could be done by bots;
>> we could then filter the problematic ones out for manual verification.
>>
>> As we have no control over external lists, this would have to be
>> re-checked every so often; but, again, bots to the rescue.
>>
>
> Yes, I fully support this proposal.
>
> What do you think about making "last verified on" not a qualifier but
> (part of) the reference information? The reference could state where the
> bot has looked up the ID and give a time. This would be somewhat similar
> to what is now used in Freebase IDs, e.g., in
> https://www.wikidata.org/wiki/Q42.
>
> In general, it might be useful to have such a "last verified on"
> property that can be added to arbitrary references. There are many other
> uses for this. One common case would be that a user has changed the
> value without even being aware of the reference -- then one would be
> able to detect this automatically by comparing the last modification
> time with the "last verified on" date.
>
> Putting the "last verified on" into the references also makes it
> possible to have different dates for different references there.
>
> Regards,
>
> Markus

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
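The maintenance ideas in this thread can be illustrated with a rough Python sketch over a simplified, in-memory model of statements. The first idea is filtering statements by the age of their most recent "retrieved" (P813) date. The second is flagging references whose date predates the latest edit of the value. The Statement and Reference classes and their field names are assumptions for illustration. So is the per-statement last_modified timestamp. None of these are the Wikibase data model or API. In practice, a modification time would have to be derived from the item's revision history.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Reference:
    source: str                     # hypothetical: where the bot looked the value up
    retrieved: Optional[datetime]   # value of "retrieved" (P813), if any


@dataclass
class Statement:
    item: str                       # e.g. "Q42"
    prop: str                       # e.g. "P646"
    value: str
    last_modified: datetime         # assumption: time of the last edit to this value
    references: List[Reference] = field(default_factory=list)


def last_verified(st: Statement) -> Optional[datetime]:
    """Most recent P813 date over all references, or None if none has one."""
    dates = [r.retrieved for r in st.references if r.retrieved is not None]
    return max(dates) if dates else None


def needs_recheck(st: Statement, now: datetime, max_age: timedelta) -> bool:
    """True if the statement was never checked or was last checked too long ago."""
    checked = last_verified(st)
    return checked is None or now - checked > max_age


def stale_references(st: Statement) -> List[Reference]:
    """References retrieved before the value was last edited: the value may have
    been changed by someone who was not aware of the reference."""
    return [r for r in st.references
            if r.retrieved is not None and r.retrieved < st.last_modified]


now = datetime(2015, 6, 7)
st = Statement("Q42", "P646", "example-id", last_modified=datetime(2015, 5, 1),
               references=[Reference("lookup A", datetime(2015, 4, 1)),
                           Reference("lookup B", datetime(2015, 6, 1))])
print(needs_recheck(st, now, timedelta(days=30)))  # -> False (last checked 6 days earlier)
print([r.source for r in stale_references(st)])    # -> ['lookup A']
```

Keeping the date on each reference rather than on the statement means each source can have its own "last verified on" date, as suggested above.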