Hi Sebastian,
I'll try to address some of your doubts, hopefully helping you resolve them, or at least giving you some starting points.
On Fri, 20 Sep 2019 at 10:48, Sebastian Hellmann hellmann@informatik.uni-leipzig.de wrote:
- there was a Knowledge Engine Project which failed, but in principle had the right idea: https://en.wikipedia.org/wiki/Knowledge_Engine_(Wikimedia_Foundation)
It was aimed at "democratizing the discovery of media, news and information", in particular at countering the traffic drain caused by Google showing Wikipedia's information directly in Google Search.
I don't remember/know much about the Knowledge Engine (KE), but to quote Liam Wyatt/User:Wittylama, "the crime wasn't thinking about it, it was the cover-up".
In other words, and based on what I remember and know, the Wikipedia internal search engine has always sucked, and KE was a proposed solution to this problem. The main problems were: 1) an overall sensation (I repeat: SENSATION) that WMF was ready to compete with Google on the "search engine market", something that was never discussed within and/or with the community; 2) that the project was pushed in a very "secretive" way, i.e. it was discovered by chance through an announcement of WMF winning a grant from [I don't remember which institution, sorry], and the more questions were raised about it, the fewer answers the then-Executive Director seemed willing to give.
IMHO, having an internal engine that helps people get what they're looking for is a great idea, and the way it was handled was indeed a crime, because (again IMHO) we lost a good opportunity to start our work several years earlier. What still makes me angry is the way the whole thing was conducted: we still lack most pieces of the story, and this may fuel non-NPOV reconstructions as well as unnecessary spin-off discussions that take us further away from the solution we were trying to achieve.
Now that there is Wikidata, this is much better for Google because they can take the CC-0 data as they wish.
KE and Wikidata are two separate issues. I'm sure Wikidata would have played a role in KE, given how central it is to linking concepts and items, but they're still two separate things.
As for Google picking data from Wikidata, they do the same with countless databases (regardless of their licence), so all I can say is that, if I were Google, I'd do the very same thing. The difference between Google and Wikidata, and the reason why I still think Wikidata is better, is that the latter releases its data to *everybody*, while the former keeps it to itself.
And I want to stress that "everybody" part: when we synchronise with a GLAM database, we give them back extremely valuable feedback, both in terms of links to other databases they can freely access and in terms of hints for data clean-up, which, again, is something that Google doesn't provide at all.
- I was under the impression that Google bought Freebase and then started Wikidata as a non-threatening model to the data they have in their Knowledge Graph
Could someone give me some pointers about the financial connections of Google and Wikimedia (this should be transparent, right?) and also who pushed the Wikidata movement into life in 2012?
Wikidata started as an independent project by some of the people who worked on Semantic MediaWiki (there are so many of them that I fear I'd miss some, and that would be embarrassing for me), not as a Google project.
It was originally financed *also* by Google, yes, but that was a small share compared to the support from other institutions, such as the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, the Wikimedia Foundation itself, and others.
Google was also mentioned in https://blog.wikimedia.org/2017/10/30/wikidata-fifth-birthday/ but while it reads "Freebase, was discontinued because of the superiority of Wikidata’s approach and active community." I know the story as: Google didn't want its competitors to have the data and the service. Not much of Freebase did end up in Wikidata.
I remember the story as "Google couldn't make any more money out of Freebase, which was also being superseded by other internal systems *and* by Wikidata, so Denny pushed Google to donate Freebase's triples to Wikidata".
This is basically what happened (with due proportion) with OpenRefine, which was originally called Google Refine and was discontinued because Google couldn't make a profit from it; now it is one of the most valuable tools we have for cleaning up and reconciling data with Wikidata.
As for the integration of the data, I don't have any precise figures, but I'm sure that a fair part of Freebase did end up in Wikidata, just as parts of many other big databases did.
As I said, I don't want to push any opinions in any directions. I am more asking for more information about the connection of Google to Wikidata (financially), then Google to WMF and also I am asking about any strategic advantages for Google in relation to their competition.
I cannot properly answer you about this. In my view WMF and Google are "frenemies": Google is, and will always be, a Big Tech company, and WMF is, and will always be, a champion of free knowledge. You just can't do free knowledge while forbidding Big Tech companies to pick up your tools and data, and by the same token I think it would be needless for us to refuse any help from Google, if we can work together on several objectives. This is OK with me, as long as we stay transparent about it, which I recognise is your point and the motivation behind your email, so don't worry about it. ;)
I hope I helped you wrap your head around the whole thing. :)
Cheers,