3,000 judgments per person per day sounds high to me, particularly on a
sustained basis, but it really depends on the type of task. Some of the
tasks were very simple, with custom high-performance, single-purpose "games"
designed around them. For example, Genderizer presented a person's
information and allowed choices of Male, Female, Other, and Skip. Using
arrow key bindings for the four choices to allow quick selection without
moving one's hand, pipelining (preloading the next topic in the background),
and allowing votes to be undone in case of error were all features that let
voters make choices very quickly.
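The pattern described above (single-key choices, a preloaded queue, an undo stack) can be sketched in a few lines. This is a hypothetical illustration of the interaction model, not code from the actual Genderizer tool; all names here are made up:

```python
from collections import deque

# Illustrative key bindings: one arrow key per choice, no hand movement needed.
KEYMAP = {"left": "Male", "right": "Female", "up": "Other", "down": "Skip"}

class JudgmentSession:
    """Hypothetical sketch of a rapid-judgment loop with preloading and undo."""

    def __init__(self, topics):
        self.queue = deque(topics)   # upcoming topics, preloaded in advance
        self.history = []            # (topic, choice) pairs, enabling undo

    def current(self):
        """The topic currently shown to the voter, or None when done."""
        return self.queue[0] if self.queue else None

    def press(self, key):
        """Record a judgment for the current topic and advance the queue."""
        choice = KEYMAP[key]
        topic = self.queue.popleft()
        self.history.append((topic, choice))
        return choice

    def undo(self):
        """Revert the most recent judgment, putting its topic back in front."""
        topic, _ = self.history.pop()
        self.queue.appendleft(topic)
        return topic
```

Because the next topic is already in the queue when a key is pressed, there is no wait between judgments, and a stray keystroke costs only one `undo` rather than a corrupted vote.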
The figures quoted in the paper below (18 seconds per judgment) work out to
more like 1,600 judgments per eight-hour day. They collected 2.3 million
judgments over the course of a year from 555 volunteers (1.05 million
judgments) and 84 paid workers (1.25 million).
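A quick sanity check on those numbers (just arithmetic, using only the figures quoted above):

```python
# One judgment every 18 seconds, sustained over an eight-hour day:
seconds_per_judgment = 18
workday_seconds = 8 * 60 * 60
per_day = workday_seconds // seconds_per_judgment
print(per_day)  # 1600

# The volunteer and paid totals should add up to the overall 2.3M figure.
total_millions = 1.05 + 1.25
```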
On Fri, Jul 17, 2015 at 12:35 PM, Benjamin Good <ben.mcgee.good(a)gmail.com>
wrote:
They wrote a really insightful paper about how their processes for
large-scale data curation worked. Among many other things, they
investigated Mechanical Turk 'micro tasks' versus hourly workers and
generally found the latter to be more cost effective.
"The Anatomy of a Large-Scale Human Computation Engine"
http://wiki.freebase.com/images/e/e0/Hcomp10-anatomy.pdf
The full citation, in case someone needs to track it down, is:
Kochhar, Shailesh, Stefano Mazzocchi, and Praveen Paritosh. "The anatomy of
a large-scale human computation engine." *Proceedings of the ACM SIGKDD
Workshop on Human Computation*. ACM, 2010.
There's also a slide presentation by the same name which presents some
additional information:
http://www.slideshare.net/brixofglory/rabj-freebase-all-5049845
Praveen Paritosh has written a number of papers on the topic of human
computation, if you're interested in that (I am!):
https://scholar.google.com/citations?user=_wX4sFYAAAAJ&hl=en&oi=sra
P.S. As a side note, I tend to agree with the camps on this list that think
it would be an enormous waste if the work that went into the content in
Freebase was not leveraged effectively for Wikidata. It's not easy to raise
millions of dollars for data curation.
It's been a disappointing process, but not entirely unexpected. Wikidata's
biggest potential strength is the army of Wikipedians, but they are also
its biggest potential liability (cf. notability et al).
Tom