Shawn Simister, 14/06/2013 23:18:
There are several ways that this would benefit both
Google and Wikidata.
First, we currently extract a lot of data from WP infoboxes and load
that data into Freebase, from which it eventually makes its way into the
Knowledge Graph, so linking the two datasets would make it easier for us
to extract similar data from Wikidata in the future. Many other tech
companies and researchers are doing similar extraction projects from WP
and Freebase, so this would benefit them as well.
Secondly, we'd like to contribute (or enable the Wikidata community to
contribute) the data that we've already extracted from WP infoboxes back
to Wikidata. I'm not quite sure what the best way to do that is (pull by
the community is probably better than push by Google) but having the
linkages between equivalent concepts is an important first step to
sharing more data.
I think this makes a lot of sense. I also see no problem with Google, or
other entities that already extract data from Wikimedia projects,
directly operating bots that push to Wikidata. It's actually something
very good, provided that
1) community processes are followed to define and create properties,
2) other policies are followed, of course,
3) to allow inspection of 1-2 and more, the logic (and, if possible, the
code) of the extraction and injection is transparently described on the wiki.
Lastly, the Freebase community does a lot of work to clean up data that
was imported from WP, OpenLibrary, MusicBrainz, etc., including merging
duplicate topics and splitting apart conflated topics. This is important
but tedious work that often doesn't get pushed back to the original
sources for no other reason than that there simply isn't a well-defined
process for how that should work. I'm hopeful that the Wikidata
community will find a way to benefit from the cleanup that we do in
Freebase, creating a virtuous cycle that improves the quality of both
datasets.
Yes, this is a case where collaboration should be particularly easy and
productive. For a closer integration with MusicBrainz, they'd probably
have to change their (non)licenses.
<https://en.wikipedia.org/wiki/MusicBrainz#Licensing> (They seem to
claim they're free content, but they aren't.)
Nemo