3,000 judgments per person per day sounds high to me, particularly on a
sustained basis, but it really depends on the type of task. Some of the
tasks were very simple, with custom high-performance, single-purpose "games"
designed around them. For example, Genderizer presented a person's
information and allowed choices of Male, Female, Other, and Skip. Using
arrow key bindings for the four choices to allow quick selection without
moving one's hand, pipelining (preloading the next topic in the background),
and allowing votes to be undone in case of error were all features that let
voters make choices very quickly.
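The pattern described above (single-key choices, a preloaded queue, an undo stack) can be sketched in a few lines. This is a hypothetical illustration of the interaction model, not code from the actual Genderizer tool; all names here are made up:

```python
from collections import deque

# Illustrative key bindings: one arrow key per choice, no hand movement needed.
KEYMAP = {"left": "Male", "right": "Female", "up": "Other", "down": "Skip"}

class JudgmentSession:
    """Hypothetical sketch of a rapid-judgment loop with preloading and undo."""

    def __init__(self, topics):
        self.queue = deque(topics)   # upcoming topics, preloaded in advance
        self.history = []            # (topic, choice) pairs, enabling undo

    def current(self):
        """The topic currently shown to the voter, or None when done."""
        return self.queue[0] if self.queue else None

    def press(self, key):
        """Record a judgment for the current topic and advance the queue."""
        choice = KEYMAP[key]
        topic = self.queue.popleft()
        self.history.append((topic, choice))
        return choice

    def undo(self):
        """Revert the most recent judgment, putting its topic back in front."""
        topic, _ = self.history.pop()
        self.queue.appendleft(topic)
        return topic
```

Because the next topic is already in the queue when a key is pressed, there is no wait between judgments, and a stray keystroke costs only one `undo` rather than a corrupted vote.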
The figures quoted in the paper below (18 seconds per judgment) work out to
more like 1,600 judgments per eight-hour day. They collected 2.3 million
judgments over the course of a year from 555 volunteers (1.05 million
judgments) and 84 paid workers (1.25 million).
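A quick sanity check on those numbers (just arithmetic, using only the figures quoted above):

```python
# One judgment every 18 seconds, sustained over an eight-hour day:
seconds_per_judgment = 18
workday_seconds = 8 * 60 * 60
per_day = workday_seconds // seconds_per_judgment
print(per_day)  # 1600

# The volunteer and paid totals should add up to the overall 2.3M figure.
total_millions = 1.05 + 1.25
```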
On Fri, Jul 17, 2015 at 12:35 PM, Benjamin Good <ben.mcgee.good(a)gmail.com>
wrote:
They wrote a really insightful paper about how their processes for
large-scale data curation worked. Among many other things, they
investigated Mechanical Turk 'micro tasks' versus hourly workers and
generally found the latter to be more cost effective.
"The Anatomy of a Large-Scale Human Computation Engine"
http://wiki.freebase.com/images/e/e0/Hcomp10-anatomy.pdf
The full citation, in case someone needs to track it down, is:
Kochhar, Shailesh, Stefano Mazzocchi, and Praveen Paritosh. "The anatomy of
a large-scale human computation engine." *Proceedings of the ACM SIGKDD
Workshop on Human Computation*. ACM, 2010.
There's also a slide presentation by the same name which presents some
additional information:
http://www.slideshare.net/brixofglory/rabj-freebase-all-5049845
Praveen Paritosh has written a number of papers on the topic of human
computation, if you're interested in that (I am!):
https://scholar.google.com/citations?user=_wX4sFYAAAAJ&hl=en&oi=sra
P.S. As a side note, I tend to agree with the camps on this list that think
it would be an enormous waste if the work that went into the content in
Freebase was not leveraged effectively for Wikidata. It's not easy to raise
millions of dollars for data curation.
It's been a disappointing process, but not entirely unexpected. Wikidata's
biggest potential strength is the army of Wikipedians, but they are also
its biggest potential liability (cf. notability et al).
Tom