I think that's a fine threshold, but it will probably vary somewhat with the type of data. Our goal is ultimately that every claim will have a reference, and then we can pass the burden of accuracy on to the references. Wikipedia has grown well with that principle. Adding references will be much easier when we batch import data from sources dedicated to a particular type of data, as opposed to parsing information out of Wikipedia (the IMDb has more consistency checks than Infobox film).
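For the batch imports, I'd imagine the reference-attaching step looking roughly like this, a pywikibot sketch; the property IDs and the item IDs in the usage comment are placeholders of my own, not what any actual import script uses:

    # Sketch only: add a claim and attach a "stated in" reference to it,
    # so the burden of accuracy falls on the cited source.
    # P248 ("stated in") is real; the IDs in the usage example are made up.
    import pywikibot

    site = pywikibot.Site('wikidata', 'wikidata')
    repo = site.data_repository()

    def add_claim_with_reference(item_id, prop_id, target_id, source_item_id):
        item = pywikibot.ItemPage(repo, item_id)
        claim = pywikibot.Claim(repo, prop_id)
        claim.setTarget(pywikibot.ItemPage(repo, target_id))
        item.addClaim(claim)
        # Reference: "stated in" pointing at the dataset we imported from.
        source = pywikibot.Claim(repo, 'P248')
        source.setTarget(pywikibot.ItemPage(repo, source_item_id))
        claim.addSources([source])

    # Hypothetical usage, e.g. setting a film's director (P57):
    # add_claim_with_reference('Q11111', 'P57', 'Q22222', 'Q33333')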


Date: Tue, 2 Apr 2013 10:39:22 -0400
From: tfmorris@gmail.com
To: wikidata-l@lists.wikimedia.org
Subject: Re: [Wikidata-l] Running "Infobox film" import script

On Tue, Apr 2, 2013 at 12:58 AM, Michael Hale <hale.michael.jr@live.com> wrote:
It will definitely have some errors, but I scanned the results for the first 100 movies before I started importing them, and I think the value-add will be much greater than the number of errors.

Does Wikidata have a quality goal or error-rate threshold? For example, Freebase has a nominal quality goal of 99% accuracy, and this is the metric that new data loads are judged against (they also want the measurement to hold at a 95% confidence level, which determines how big a sample you need when doing evaluations).
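To make the arithmetic concrete, here is a back-of-envelope sample-size calculation; the 1% margin of error below is an assumption on my part, not a Freebase number:

    # Rough sample size for checking a data load against a 99% accuracy
    # goal at 95% confidence, using the standard proportion formula
    # n = z^2 * p * (1 - p) / e^2.
    import math

    z = 1.96    # z-score for a 95% confidence level
    p = 0.99    # expected accuracy (the quality goal)
    e = 0.01    # assumed margin of error (+/- 1 percentage point)

    n = math.ceil(z**2 * p * (1 - p) / e**2)
    print(n)    # 381 -> hand-check roughly 400 statements per load

A tighter margin or a lower expected accuracy drives the sample size up quickly, which is why the confidence target matters as much as the accuracy goal itself.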

I haven't looked at this bot, but a develop/test/deploy cycle measured in hours seems, on the surface, to be very aggressive.

Tom 

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l