I know Freebase used oDesk. Note that the number in question is 3000
judgements per person per day. I've run tasks on Mechanical Turk, and I
also make my own judgement sets for various things, and I'd agree with
that rate; it comes to 9.6 seconds per judgement, which I can believe. If
you are that fast you can make a living at it and never have to get out of
your pyjamas, but as a manager you have to do something about people who
do huge amounts of fast but barely acceptable work.
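To make the back-of-the-envelope arithmetic explicit (the 8-hour working
day is my assumption):

    # Seconds per judgement at 3000 per day,
    # assuming an 8-hour working day.
    judgements_per_day = 3000
    working_seconds = 8 * 60 * 60                 # 28,800 seconds
    print(working_seconds / judgements_per_day)   # -> 9.6 seconds each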
Note that they had $57M of funding
https://www.crunchbase.com/organization/metawebtechnologies
and if the fully loaded cost of those FTE equivalents was $50,000 a year
via oDesk, it would have cost about $5M to get 100 million facts
processed. So practically they could have got a lot done. Metaweb and
oDesk had
interlocking directorates
https://www.crunchbase.com/organization/metawebtechnologies/insights/curren…
so they probably had a great relationship with oDesk, which would have
helped.
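Spelling out the cost arithmetic (the ~1 million facts per FTE-year rate
comes from the figure quoted below; the $50,000 fully loaded annual cost
is my assumption):

    # Rough cost to get 100 million facts verified at Freebase-like rates.
    facts_needed = 100_000_000
    facts_per_fte_year = 1_000_000     # ~3000/day over a working year
    cost_per_fte_year = 50_000         # assumed fully loaded cost via oDesk
    fte_years = facts_needed / facts_per_fte_year   # 100 FTE-years
    print(fte_years * cost_per_fte_year)            # -> 5000000.0, i.e. $5M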
Dealing with "turks", I would estimate that I'd need to ask each question
somewhere between 2 and 3 times on average to catch most of the errors and
ambiguous cases, and also to get an estimate of how many I didn't catch.
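Here is a minimal sketch of the kind of redundancy scheme I mean; the 2/3
agreement threshold and the re-ask policy are just illustrative, not
anything Metaweb actually used:

    from collections import Counter

    def resolve(judgements):
        # Majority-vote one question's answers; flag it for another round
        # when fewer than 2/3 of the judgements agree.
        counts = Counter(judgements)
        answer, votes = counts.most_common(1)[0]
        return answer, votes / len(judgements) >= 2 / 3

    # Ask each question twice, then re-ask only the disagreements.
    first_round = {"q1": ["yes", "yes"], "q2": ["yes", "no"]}
    needs_third = [q for q, js in first_round.items() if not resolve(js)[1]]
    print(needs_third)   # -> ['q2']; the disagreement rate also gives a rough
                         # handle on how many wrong answers slip through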
On Thu, Jul 16, 2015 at 8:23 PM, Eric Sun <esun(a)cs.stanford.edu> wrote:
> Note that Freebase did a lot of human curation and we know they could get
> about 3000 verifications of facts by "non-experts" a day who were paid for
> their efforts. That scales out to almost a million facts per FTE per year.

Where can I find out more about how they were able to do such high-volume
human curation? 3000/day is a huge number.
On Thu, Jul 16, 2015 at 5:01 AM, <wikidata-request(a)lists.wikimedia.org>
wrote:
> Date: Wed, 15 Jul 2015 15:25:27 -0400
> From: Paul Houle <ontology2(a)gmail.com>
> To: "Discussion list for the Wikidata project."
> <wikidata-l(a)lists.wikimedia.org>
> Subject: [Wikidata] Freebase is dead, long live :BaseKB
> Message-ID: <CAE__kdQt55E7k7xHMeuBCu9QrwRKoMU_60NDuYgcTHNkC7DFHA(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> For those who are interested in the project of getting something out of
> Freebase for use in Wikidata or somewhere else, I'd like to point out
>
>
> http://basekb.com/gold/
>
> this is a completely workable solution for running queries out of Freebase
> after the MQL API goes dark.
>
> I have been watching the discussion about the trouble of moving Freebase
> data to Wikidata, so let me share some thoughts.
>
> First, quality is in the eye of the beholder: if somebody defines quality
> as a matter of citing your sources, then that is their definition of
> 'quality' and they can attain it. You might have some other definition of
> quality and be appalled that Wikidata has so little to say about a topic
> that has caused much controversy and suffering:
>
>
> https://www.wikidata.org/wiki/Q284451
>
> there are ways to attain that too.
>
> Part of the answer is that different products are going to be used in
> different places. For instance, one person might need 100% coverage of
> books he wants to talk about, another one might want a really great
> database of ski areas, etc.
>
> Note that Freebase did a lot of human curation and we know they could get
> about 3000 verifications of facts by "non-experts" a day who were paid for
> their efforts. That scales out to almost a million facts per FTE per year.
>
>
>
> --
> Paul Houle
>
> *Applying Schemas for Natural Language Processing, Distributed Systems,
> Classification and Text Mining and Data Lakes*
>
> (607) 539 6254 paul.houle on Skype ontology2(a)gmail.com
>
> https://legalentityidentifier.info/lei/lookup/
>