On Mon, Apr 16, 2012 at 10:54 AM, Fabian M. Suchanek
<f.m.suchanek(a)gmail.com> wrote:
>> From: JFC Morfin <jefsey(a)jefsey.com>
>> Thank you for this detailed explanation.
>> How do you see the integration/impact of Wikidata on both projects?
> My intuition is that the impact could be mutual:
> * for YAGO and DBpedia, the impact would be immediate, because
> Wikidata could essentially provide cleaner infobox data for these
> projects. Yet, we have to see how Wikidata will position itself
> relative to Freebase, which seems to pursue a similar goal:
> http://en.wikipedia.org/wiki/Freebase
> (If you have thoughts on distinguishing Wikidata from Freebase, we'd
> be happy to know.)
I don't speak for Freebase, but I view Freebase, DBpedia, and YAGO as
all occupying comparable positions relative to Wikipedia/Wikidata.
They currently attempt to reverse engineer structured data out of
Wikipedia infoboxes, and if Wikidata succeeds in becoming the data
source for Wikipedia's infoboxes, it will eliminate a lot of
troublesome, error-prone parsing code.
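To illustrate why that parsing is troublesome, here is a minimal sketch of the kind of extractor these projects have to write (a hypothetical example, not any project's actual code): infobox markup is free-form wikitext, so even the simple cases need ad hoc regular expressions, and the hard cases (nested templates, per-language template variants, units) are where the errors come from.

```python
import re

def parse_infobox(wikitext):
    """Naively extract key-value pairs from a wikitext infobox.

    Real extractors must also handle nested templates, HTML comments,
    unit conversions, and per-language template variants -- which is
    where most of the parsing errors come from.
    """
    fields = {}
    # Match field lines of the form "| key = value"
    for match in re.finditer(r"^\|\s*(\w+)\s*=\s*(.+)$", wikitext, re.MULTILINE):
        key, value = match.group(1), match.group(2).strip()
        # Strip the simplest wiki links: [[target|label]] -> label, [[target]] -> target
        value = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", value)
        fields[key] = value
    return fields

sample = """{{Infobox scientist
| name = Albert Einstein
| birth_place = [[Ulm]], [[Kingdom of Wurttemberg|Wurttemberg]]
}}"""
print(parse_infobox(sample))
```

Every project maintains its own variant of this logic, which is exactly the duplicated, fragile work that a structured Wikidata feed would make unnecessary.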
Some of the ways that Freebase is different include:
- it's editable by anyone, so you don't need to go back to Wikipedia
to correct mistakes.
- it doesn't have a notability requirement like Wikipedia. If it's
factual and non-spammy, you can include it.
- infobox mappings aren't public and can only be modified by Google employees
- a relatively small number of popular infoboxes are mined (nowhere
near DBpedia's coverage)
- the refresh cycle is every couple of weeks (i.e. much faster than
DBpedia, but much slower than DBpedia Live)
- it includes a large amount of non-Wikipedia data from MusicBrainz,
OpenLibrary, Geonames, etc., as well as being linked to a number of
other sources of strong identifiers such as the New York Times, IMDB,
NNDB, the U.S. Library of Congress Name Authority File and Subject
Headings, etc.
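To sketch what linking by strong identifiers buys you (a hypothetical illustration with made-up property names and placeholder IDs, not Freebase's actual schema or data): once two records share any strong external identifier, they can be merged as the same entity without fuzzy name matching.

```python
# Hypothetical topic record; the key names and IDs below are
# illustrative placeholders, not Freebase's actual schema or data.
topic = {
    "name": "Arthur Conan Doyle",
    "external_ids": {
        "wikipedia_en": "Arthur_Conan_Doyle",  # article title as key
        "imdb": "nm0000000",                   # placeholder, not the real ID
        "loc_naf": "n00000000",                # placeholder authority-file ID
    },
}

def same_entity(a, b):
    """Two records refer to the same entity if any strong ID matches."""
    return any(
        a["external_ids"].get(source) == ident
        for source, ident in b["external_ids"].items()
        if ident is not None
    )
```

This is the reconciliation step that name-based matching gets wrong and identifier-based matching gets (mostly) right.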
As far as positioning between Wikidata and Freebase goes, there's
really no way that Freebase (or any other non-Wikimedia Foundation
effort) could ever compete with Wikidata in the context of providing
data to Wikipedia. The Wikipedia culture is just too insular.
Instead I would expect Freebase to stop parsing infoboxes and consume
data directly from Wikidata in the same way that I would expect
DBpedia, YAGO and other consumers to.
Before that happens though, Wikidata not only needs to get the
technical infrastructure in place, but also change the culture of
Wikipedia editors so that they're not anti-data and care about the
semantics as well as the presentation of the information. A lot of
today's quality problems are social, not technical.
> - YAGO, e.g., has mappings of infobox data to relations with domains
> and ranges, with a quality guarantee.
Guarantee? My understanding of the previous post was that a very
small sample of YAGO data had been measured for precision (with good
results), not that there was 100% curation or any type of quality
"guarantee."
Freebase has a stated 99% quality goal, but actual quality (as well as
coverage) varies greatly from domain to domain.
Tom