On Mon, Apr 16, 2012 at 10:54 AM, Fabian M. Suchanek f.m.suchanek@gmail.com wrote:
From: JFC Morfin jefsey@jefsey.com
Thank you for this detailed explanation. How do you see the integration/impact of Wikidata on both projects?
My intuition is that the impact could be mutual:
- for YAGO and DBpedia, the impact would be immediate, because Wikidata could essentially provide cleaner infobox data for these projects. Yet, we have to see how Wikidata will position itself relative to Freebase, which seems to pursue a similar goal: http://en.wikipedia.org/wiki/Freebase (If you have thoughts on distinguishing Wikidata from Freebase, we'd be happy to know.)
I don't speak for Freebase, but I view Freebase, DBpedia, and YAGO as all occupying comparable positions relative to Wikipedia/Wikidata. All three currently attempt to reverse-engineer structured data out of Wikipedia infoboxes, and if Wikidata succeeds in becoming the data source for Wikipedia's infoboxes, it will eliminate a lot of troublesome, error-prone parsing code.
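To make the parsing problem concrete, here is a minimal, hypothetical sketch of the kind of wikitext scraping these projects depend on today; the template, field names, and regular expression are illustrative only, and a real extractor has to handle nested templates, HTML comments, references, and per-language conventions that a simple pass like this silently mishandles.

```python
import re

# Hypothetical wikitext snippet of the kind found in a Wikipedia article.
WIKITEXT = """{{Infobox scientist
| name        = Ada Lovelace
| birth_date  = 10 December 1815
| fields      = Mathematics
}}"""

def parse_infobox(wikitext):
    """Naively extract key/value pairs from the first {{Infobox ...}} template.

    This is the fragile, error-prone step that direct access to Wikidata
    would make unnecessary: nested templates, multi-line values, and
    embedded markup all break a line-oriented parse like this.
    """
    match = re.search(r"\{\{Infobox(.*?)\}\}", wikitext, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            key, _, value = line[1:].partition("=")
            fields[key.strip()] = value.strip()
    return fields

if __name__ == "__main__":
    print(parse_infobox(WIKITEXT))
    # {'name': 'Ada Lovelace', 'birth_date': '10 December 1815', 'fields': 'Mathematics'}
```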
Some of the ways that Freebase is different include:
- it's editable by anyone, so you don't need to go back to Wikipedia to correct mistakes.
- it doesn't have a notability requirement like Wikipedia. If it's factual and non-spammy, you can include it.
- infobox mappings aren't public and can only be modified by Google employees.
- a relatively small number of popular infoboxes are mined (nowhere near DBpedia's coverage).
- the refresh cycle is every couple of weeks (i.e. much faster than DBpedia but much slower than DBpedia Live).
- it includes a large amount of non-Wikipedia data from MusicBrainz, OpenLibrary, Geonames, etc., and it is linked to a number of other sources of strong identifiers such as the New York Times, IMDB, NNDB, and the U.S. Library of Congress Name Authority File and Subject Headings.
As far as positioning between Wikidata and Freebase goes, there's really no way that Freebase (or any other non-Wikimedia Foundation effort) could ever compete with Wikidata in the context of providing data to Wikipedia. The Wikipedia culture is just too insular. Instead I would expect Freebase to stop parsing infoboxes and consume data directly from Wikidata in the same way that I would expect DBpedia, YAGO and other consumers to.
Before that happens, though, Wikidata not only needs to get the technical infrastructure in place but also to change the culture of Wikipedia editors so that they're not anti-data and care about the semantics as well as the presentation of the information. A lot of today's quality problems are social, not technical.
- YAGO, e.g., has mappings of infobox data to relations with domains and ranges, with a quality guarantee.
Guarantee? My understanding of the previous post was that a very small sample of YAGO data had been measured for precision (with good results), not that there was 100% curation or any type of quality "guarantee."
Freebase has a stated 99% quality goal, but actual quality (as well as coverage) varies greatly from domain to domain.
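As an aside, here is a rough sketch of why a sampled precision measurement is weaker than a guarantee; the sample size and error count below are made up for illustration, and the Wilson score interval is just one standard way to put error bars on such an estimate.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a proportion estimated from a sample.

    Illustrates that, say, 95 correct facts out of a sample of 100 only
    supports a claim like "precision is probably between ~89% and ~98%",
    not a blanket quality guarantee over the whole dataset.
    """
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - margin, center + margin)

if __name__ == "__main__":
    low, high = wilson_interval(successes=95, n=100)
    print(f"estimated precision: 95.0%, 95% CI: [{low:.1%}, {high:.1%}]")
```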
Tom