They wrote a really insightful paper about how their processes for
large-scale data curation worked. Among many other things, they
investigated Mechanical Turk 'micro tasks' versus hourly workers and
generally found the latter to be more cost-effective.
"The Anatomy of a Large-Scale Human Computation Engine"
(I have the PDF in case you want it after that link expires..)
-Ben
p.s. As a side note, I tend to agree with the camps on this list that think it
would be an enormous waste if the work that went into the content in
Freebase was not leveraged effectively for Wikidata. It's not easy to raise
millions of dollars for data curation.
On Fri, Jul 17, 2015 at 7:56 AM, Paul Houle <ontology2(a)gmail.com> wrote:
I know Freebase used oDesk. Note the number in question is 3000
judgements per person per day. I've run tasks on Mechanical Turk, and I also
make my own judgement sets for various things, and I'd agree with that rate;
it comes to 9.6 seconds per judgement over an eight-hour day, which I can
believe. If you are that fast you can make a living at it and never have to
get out of your pyjamas, but as a manager you have to do something about
people who do huge amounts of fast but barely acceptable work.
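
Roughly, the arithmetic behind that per-judgement figure is just the following
(assuming an eight-hour working day, which is my assumption rather than a
stated figure):

    # Sanity check on the 3000 judgements/day rate, assuming an 8-hour day.
    judgements_per_day = 3000
    seconds_per_day = 8 * 60 * 60                  # 28,800 seconds
    print(seconds_per_day / judgements_per_day)    # -> 9.6 seconds per judgement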
Note that they had $57M of funding
https://www.crunchbase.com/organization/metawebtechnologies
and if the fully loaded cost of those FTE equivalents was $50,000 via oDesk,
then at roughly a million facts per FTE-year it would have cost about $5M to
get 100 million facts processed. So practically they could have got a lot
done. Metaweb and oDesk had
interlocking directorates
https://www.crunchbase.com/organization/metawebtechnologies/insights/curren…
so they probably had a great relationship with oDesk, which would have
helped.
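
Spelled out, that $5M estimate is just a back-of-envelope calculation using the
round numbers above (taking "almost a million facts per FTE per year" at face
value):

    # Rough cost sketch for processing 100 million facts with paid curators.
    facts_per_fte_year = 1_000_000     # "almost a million facts per FTE per year"
    fte_cost_usd = 50_000              # fully loaded FTE cost via oDesk
    total_facts = 100_000_000

    fte_years = total_facts / facts_per_fte_year   # 100 FTE-years
    cost_usd = fte_years * fte_cost_usd            # $5,000,000
    print(f"{fte_years:.0f} FTE-years, ~${cost_usd / 1e6:.0f}M")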
Dealing with "turks" I would estimate that I'd ask each question somewhere
between 2 and 3 times on the average to catch most of the errors and
ambiguous cases and also get an estimate of how many I didn't catch.
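
As a rough illustration of why 2-3 askings per question catches most errors but
still leaves a measurable residue, here is a toy simulation; the 5% per-judgement
error rate and the yes/no question model are purely illustrative assumptions, not
figures from anyone's actual workflow:

    import random

    # Toy model: ask each yes/no question twice and only escalate to a third
    # judge on disagreement. With yes/no questions, two wrong judges agree on
    # the wrong answer, so agreement alone is not proof of correctness.
    random.seed(0)
    ERROR_RATE = 0.05          # illustrative per-judgement error rate
    N_QUESTIONS = 100_000

    def correct_judgement() -> bool:
        return random.random() > ERROR_RATE

    judgements = disagreements = residual_errors = 0
    for _ in range(N_QUESTIONS):
        a, b = correct_judgement(), correct_judgement()
        judgements += 2
        if a == b:
            verdict = a                    # agreement, right or wrong
        else:
            disagreements += 1
            verdict = correct_judgement()  # tie-break with a third judgement
            judgements += 1
        if not verdict:
            residual_errors += 1

    # The observed disagreement rate (~2p(1-p)) lets you back out the
    # per-judgement error rate p and so estimate the errors you didn't catch.
    print(f"judgements per question: {judgements / N_QUESTIONS:.2f}")      # ~2.1
    print(f"disagreement rate: {disagreements / N_QUESTIONS:.3f}")         # ~0.095
    print(f"residual error rate: {residual_errors / N_QUESTIONS:.4f}")     # ~0.007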
On Thu, Jul 16, 2015 at 8:23 PM, Eric Sun <esun(a)cs.stanford.edu> wrote:
> >> Note that Freebase did a lot of human curation and we know they could get
> >> about 3000 verifications of facts by "non-experts" a day who were paid for
> >> their efforts. That scales out to almost a million facts per FTE per year.
>
>
> Where can I find out more about how they were able to do such
> high-volume human curation? 3000/day is a huge number.
>
>
>
> On Thu, Jul 16, 2015 at 5:01 AM, <wikidata-request(a)lists.wikimedia.org>
> wrote:
>
>> Date: Wed, 15 Jul 2015 15:25:27 -0400
>> From: Paul Houle <ontology2(a)gmail.com>
>> To: "Discussion list for the Wikidata project."
>> <wikidata-l(a)lists.wikimedia.org>
>> Subject: [Wikidata] Freebase is dead, long live :BaseKB
>> Message-ID:
>> <CAE__kdQt55E7k7xHMeuBCu9QrwRKoMU_60NDuYgcTHNkC7DFHA(a)mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> For those who are interested in the project of getting something out of
>> Freebase for use in Wikidata or somewhere else, I'd like to point out
>>
>>
>> http://basekb.com/gold/
>>
>> This is a completely workable solution for running queries against Freebase
>> data after the MQL API goes dark.
>>
>> I have been watching the discussion about the trouble of moving Freebase
>> data to Wikidata, so let me share some thoughts.
>>
>> First, quality is in the eye of the beholder. If somebody defines quality
>> as a matter of citing your sources, then that is their definition of
>> 'quality' and they can attain it. You might have some other definition of
>> quality and be appalled that Wikidata has so little to say about a topic
>> that has caused much controversy and suffering:
>>
>>
>> https://www.wikidata.org/wiki/Q284451
>>
>> There are ways to attain that too.
>>
>> Part of the answer is that different products are going to be used in
>> different places. For instance, one person might need 100% coverage of
>> books he wants to talk about, another one might want a really great
>> database of ski areas, etc.
>>
>> Note that Freebase did a lot of human curation and we know they could get
>> about 3000 verifications of facts by "non-experts" a day who were paid for
>> their efforts. That scales out to almost a million facts per FTE per year.
>>
>>
>>
>> --
>> Paul Houle
>>
>> *Applying Schemas for Natural Language Processing, Distributed Systems,
>> Classification and Text Mining and Data Lakes*
>>
>> (607) 539 6254 paul.houle on Skype ontology2(a)gmail.com
>>
>> https://legalentityidentifier.info/lei/lookup/
>>