Google's stake in Wikidata and Wikipedia

Sebastian Hellmann

20 Sep 2019

1:52 p.m.

Dear all,

Personally, I am quite happy that Denny can contribute more to Wikidata and Wikipedia. No personal criticism there: I read his thesis and I am impressed by his work and contributions.

I don't want to fuel any conspiracy theories here, but I am wondering where Wikidata is going, especially with respect to Google.

Note that Chrome/Chromium, being open source with a twist, has already pushed Firefox out of the market. Now there is also the controversy about what is tracked server-side by Google Analytics and client-side by cookies, as well as the current discussion about the removal of ad blockers from Chrome: https://www.wired.com/story/google-chrome-ad-blockers-extensions-api/

Maybe somebody could enlighten me about the overall strategy and connections here.

1. There was a Knowledge Engine project which failed but in principle had the right idea: https://en.wikipedia.org/wiki/Knowledge_Engine_(Wikimedia_Foundation)

Its aim was to "democratize the discovery of media, news and information", in particular to counter the traffic drain caused by Google providing Wikipedia's information in Google Search. Now that there is Wikidata, things are much better for Google, because they can take the CC0 data as they wish.

2. There are some very widely used terms like "Knowledge Graph" which seem to be blocked by Google: https://www.wikidata.org/wiki/Q648625 and https://en.wikipedia.org/wiki/Knowledge_Graph lack a neutral point of view like the one the German WP adopted: https://de.wikipedia.org/wiki/Google#Knowledge_Graph

3. I was under the impression that Google bought Freebase and then started Wikidata as a model that would not threaten the data they have in their Knowledge Graph.

Could someone give me some pointers about the financial connections between Google and Wikimedia (this should be transparent, right?) and also about who brought the Wikidata movement to life in 2012?

Google was also mentioned in https://blog.wikimedia.org/2017/10/30/wikidata-fifth-birthday/, but while it reads "Freebase (https://en.wikipedia.org/wiki/Freebase), was discontinued because of the superiority of Wikidata’s approach and active community.", I know the story differently: Google didn't want its competitors to have the data and the service. Not much of Freebase ended up in Wikidata.

As I said, I don't want to push any opinions in any direction. I am rather asking for more information about the (financial) connection of Google to Wikidata, then of Google to WMF, and also about any strategic advantages Google gains over its competition.

Please don't answer with "How great Wikidata is"; I already know that, and it is also not in the scope of my "How intertwined is Google with Wikidata / WMF?" question. I can't stress this enough: this is also not against Denny.

It is a request for better information, as I can't seem to find clear answers here.

--
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org

Nicolas VIGNERON

20 Sep 2019

2:38 p.m.

Hi,

You can already find some information here: https://en.wikipedia.org/wiki/Wikidata#Development_history (including financial details if you follow the sources).

As for "How intertwined is Google": it's a long and complex story that goes back at least to 2005 (Wikipedia probably wouldn't exist today, or would exist in a drastically different form, if the search engine hadn't favoured Wikipedia since then). As a non-answer, I would say that Wikidata is as intertwined with Google as any major website is.

Regards, ~nicolas

Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata

Federico Leva (Nemo)

3:13 p.m.

Sebastian Hellmann, 20/09/19 11:22:

...

Maybe somebody could enlighten me about the overall strategy and connections here.

You can add more links to grants and other Wikimedia pages on https://meta.wikimedia.org/wiki/Google.

Google and the Wikimedia movement are on opposite sides for most things, but occasionally some of their employees (or algorithms!) happen to be interested in the same things as us, so we end up doing things together and a few breadcrumbs travel towards WMF. What matters to me is that they don't abuse our brands.

Sadly, WMF is not always careful about communication; for instance, https://wikimediafoundation.org/our-work/ still has an appalling sentence, "Working with partners like Google", right under the heading "Partner for change".

Federico

Luca Martinelli

3:54 p.m.

Hi Sebastian,

I'll try to address some of your doubts, hopefully helping you resolve them, or at least giving you some starting points.

On Fri, 20 Sep 2019 at 10:48, Sebastian Hellmann hellmann@informatik.uni-leipzig.de wrote:

...

there was a Knowledge Engine Project which failed, but in principle had the right idea: https://en.wikipedia.org/wiki/Knowledge_Engine_(Wikimedia_Foundation)

This was aimed to "democratize the discovery of media, news and information", in particular counter-moving the traffic sink by Google providing Wikipedia's information in Google Search.

I don't remember/know much about the Knowledge Engine (KE), but to quote Liam Wyatt/User:Wittylama, "the crime wasn't thinking about it, it was the cover-up".

In other words, and based on what I remember and know, the Wikipedia internal search engine always sucked, and KE was a hypothesis for solving this problem. The main problems were: 1) an overall sensation (I repeat: SENSATION) that WMF was ready to compete with Google in the "search engine market", something that was never discussed within and/or with the community; 2) that this project was pushed in a very "secretive" way, i.e. it was discovered by chance through an announcement of WMF winning a grant from [I don't remember which institution, sorry], and the more questions were raised about it, the fewer answers the then-Executive Director seemed willing to give.

IMHO, having an internal engine that helps people get what they're looking for is a great idea, and the way it was conducted was indeed a crime, because (again IMHO) we lost a good opportunity to start our work several years earlier. What still makes me angry is the way the whole thing was conducted: we still lack most pieces of the story, and this may fuel non-NPOV reconstructions as well as unnecessary spin-off discussions that take us further away from the solution we were trying to achieve.

...

Now that there is Wikidata, this is much better for Google because they can take the CC-0 data as they wish.

KE and Wikidata are two separate issues. I'm sure Wikidata would have played a role in KE, given its important role in linking concepts and items, but they're still two separate things.

As for Google picking data from Wikidata, they do the same with countless databases (regardless of their license), so all I can say is that, if I were Google, I'd do the very same thing. The difference between Google and Wikidata, and the reason why I still think Wikidata is better, is that the latter releases its data to *everybody*, while the former keeps it to itself.

And I want to stress that "everybody" part: when we synchronise with a GLAM database, we give them back extremely valuable feedback, both as links to other databases they can freely access and as hints for data clean-up, which, again, is something that Google doesn't provide at all.

...

I was under the impression that Google bought Freebase and then started Wikidata as a non-threatening model to the data they have in their Knowledge Graph

Could someone give me some pointers about the financial connections of Google and Wikimedia (this should be transparent, right?) and also who pushed the Wikidata movement into life in 2012?

Wikidata started as an independent project by some of the people who worked on Semantic MediaWiki (there are so many of them I fear I might miss some of them, and that would be embarrassing for me), not as a Google project.

It was originally financed *also* by Google, yes, but it was a small part compared to the aid from other institutions, such as the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, the Wikimedia Foundation itself, and others.

...

Google was also mentioned in https://blog.wikimedia.org/2017/10/30/wikidata-fifth-birthday/ but while it reads "Freebase, was discontinued because of the superiority of Wikidata’s approach and active community." I know the story as: Google didn't want its competitors to have the data and the service. Not much of Freebase did end up in Wikidata.

I remember the story as "Google couldn't make any more money out of Freebase, which was also being superseded by other internal systems *and* Wikidata, so Denny pushed Google to donate Freebase's triples to Wikidata".

This is basically the same thing (well, with due proportions) that happened with OpenRefine, which was originally called Google Refine and was discontinued because Google couldn't make any profit from it; now it is one of the most valuable tools we can use to clean up and reconcile data with Wikidata.

As for the integration of the data, I don't have any precise figures, but I'm sure that a fair part of Freebase did end up in Wikidata, just as much as many other big databases did.

...

As I said, I don't want to push any opinions in any directions. I am more asking for more information about the connection of Google to Wikidata (financially), then Google to WMF and also I am asking about any strategic advantages for Google in relation to their competition.

I cannot properly answer you about this. WMF and Google are, in my view, "frenemies": Google is, and will always be, a Big Tech company, and WMF is, and will always be, a champion of free knowledge. You just can't do free knowledge by forcing Big Tech companies NOT to pick up your tools and data, though, just as I think it would be unnecessary for us to refuse any help from Google if we can work together on several objectives. This is OK with me, as long as we keep being transparent about it, which I recognise to be your point and the motivation behind your email, so don't worry about it. ;)

I hope I helped you wrap your head around the whole thing. :)

Cheers,

-- Luca "Sannita" Martinelli http://it.wikipedia.org/wiki/Utente:Sannita

Thad Guidry

6:58 p.m.

With my tech evangelist hat on...

Google's philanthropy is nearly boundless when it comes to the promotion of knowledge. Why? Because it is indeed in their best interest: no one can prosper without knowledge. They aggregate knowledge for the benefit of mankind and then make a profit through advertising ... all while making that knowledge extremely easy for the world to find.

Nothing in this world is entirely free (servers must spin, cooling must be provided, bugs squashed...). Google and others understand this and help defray the substantial costs of providing free knowledge in multiple domains, especially those that contribute to tech and human goodwill (science & medicine). Sometimes this is through direct cash donations to WMF, even just this year, with $2 million that Google employees decided to give to WMF!!! <https://techcrunch.com/2019/01/22/google-org-donates-2-million-to-wikipedias...

...

Other times it's with talent from interns they pay for during the summer, or tech knowledge exchanges to help tackle problems we have. Still other times it's just their 20% employee time helping the world keep Open Source libraries up to date, or giving the world Open Source tools that we ourselves use across WMF every minute of the day. Then there are all the trickle-down benefits (increasing privacy, REALLY?, yes, really! https://opensource.googleblog.com/2019/09/enabling-developers-and-organizations.html, reducing security risks, better performance, etc.) from those Open Source libraries & tools, with things like ClusterFuzz https://opensource.googleblog.com/2019/02/open-sourcing-clusterfuzz.html, TensorFlow, Go, Kubernetes, and thousands of others. https://opensource.google.com/ https://github.com/google

Thad https://www.linkedin.com/in/thadguidry/

Sebastian Hellmann

8:13 p.m.

Hi Thad,

On 20.09.19 15:28, Thad Guidry wrote:

...

With my tech evangelist hat on...

Google's philanthropy is nearly boundless when it comes to the promotion of knowledge. Why? Because indeed it's in their best interest otherwise no one can prosper without knowledge. They aggregate knowledge for the benefit of mankind, and then make a profit through advertising ... all while making that knowledge extremely easy to be found for the world.

I am neither pro-Google nor anti-Google per se. Maybe skeptical, and interested in what the truth behind the truth is. Google is not a synonym for philanthropy. Wikimedia is, or at least I think they are doing many things right. Google is a platform, so primarily they "aggregate knowledge for their benefit" while creating enough incentives, in the form of accessibility, for users to add their knowledge to Google's. It is not about what Google offers, but what it takes in return. The 20% employee time is also an investment in the skills of the employee, a Google asset called human capital, and it also leads to Denny from Google and me discussing whether https://en.wikipedia.org/wiki/Talk:Knowledge_Graph is content marketing or knowledge (@Denny: no offense, legit arguments, but no agenda to resolve the stalled discussion there). Except that I don't have 20% time to straighten the article into what I believe would be neutral, so pushing it becomes a resource issue.

I found the other replies much more realistic, and the perspective is still unclear. Maybe Mozilla wasn't enough of a frenemy to Google and got removed from the browser market for it. I am also thinking about Linked Open Data. Decentralisation is quite weak individually. I guess spreading Wikibases around as super-nodes is helpful, unless it prevents the formation of a stronger lobby of philanthropists or of competition to BigTech. Wikidata created some pressure on DBpedia as well (also opportunities), but we are fine since we can simply innovate. Others might not withstand it. Microsoft seems to favor OpenStreetMap, so I am just asking to what degree Open Source and Open Data are being instrumentalised by BigTech.

Hence my question: is it compromise or be removed? (Note that states are also platforms, which measure value in GDP, make laws, build roads and take VAT on transactions. Sometimes they don't even remove the opposition.)

Thad Guidry

8:40 p.m.

Thank you for sharing your opinions, Sebastian.

Cheers, Thad https://www.linkedin.com/in/thadguidry/

Denny Vrandečić

9:23 p.m.

Sebastian,

"I don't want to facilitate conspiracy theories, but ..." "[I am] interested in what is the truth behind the truth"

I am sorry, I truly am, but this *is* the language I know from conspiracy theorists. And given that, I cannot imagine that there is anything I can say that could convince you otherwise. Therefore there is no real point for me in engaging with this conversation on these terms, I cannot see how it would turn constructive.

The answers to many of your questions are public and on the record. Others tried to point you to them (thanks), but you dismiss them as not fitting your narrative.

So here's a suggestion, which I think might be much more constructive and forward-looking:

I have been working on a comparison of DBpedia, Wikidata, and Freebase (and since you've read my thesis, you know that's a thing I know a bit about). Simple evaluation, coverage, correctness, nothing dramatically fancy. But I am torn about publishing it, because, d'oh, people may (with good reason) dismiss it as biased. And truth be told, the simple fact that I don't know DBpedia as well as I know Wikidata and Freebase might indeed have led to errors, mistakes, and things I missed in the evaluation. But you know what would help?

You.

My suggestion is that I publish my current draft, and then you and I work on it together, publicly, in the open, until we reach a state we both consider correct enough for publication.

What do you think?

Cheers, Denny

P.S.: I am travelling next week, so I may ask for patience.

On Fri, Sep 20, 2019 at 8:11 AM Thad Guidry thadguidry@gmail.com wrote:

...

Thank you for sharing your opinions, Sebastian.

Cheers, Thad https://www.linkedin.com/in/thadguidry/

On Fri, Sep 20, 2019 at 9:43 AM Sebastian Hellmann < hellmann@informatik.uni-leipzig.de> wrote:

...
Hi Thad, On 20.09.19 15:28, Thad Guidry wrote:

With my tech evangelist hat on...

Google's philanthropy is nearly boundless when it comes to the promotion of knowledge. Why? Because indeed it's in their best interest otherwise no one can prosper without knowledge. They aggregate knowledge for the benefit of mankind, and then make a profit through advertising ... all while making that knowledge extremely easy to be found for the world.

I am neither pro-Google or anti-Google per se. Maybe skeptical and interested in what is the truth behind the truth. Google is not synonym to philanthropy. Wikimedia is or at least I think they are doing many things right. Google is a platform, so primarily they "aggregate knowledge for their benefit" while creating enough incentives in form of accessibility for users to add the user's knowledge to theirs. It is not about what Google offers, but what it takes in return. 20% of employees time is also an investment in the skill of the employee, a Google asset called Human Capital and also leads to me and Denny from Google discussing whether https://en.wikipedia.org/wiki/Talk:Knowledge_Graph is content marketing or knowledge (@Denny: no offense, legit arguments, but no agenda to resolve the stalled discussion there). Except I don't have 20% time to straighten the view into what I believe would be neutral, so pushing it becomes a resource issue.

I found the other replies much more realistic and the perspective is yet unclear. Maybe Mozilla wasn't so much frenemy with Google and got removed from the browser market for it. I am also thinking about Linked Open Data. Decentralisation is quite weak, individually. I guess spreading all the Wikibases around to super-nodes is helpful unless it prevents the formation of a stronger lobby of philanthropists or competition to BigTech. Wikidata created some pressure on DBpedia as well (also opportunities), but we are fine since we can simply innovate. Others might not withstand. Microsoft seems to favor OpenStreetMaps so I am just asking to which degree Open Source and Open Data is being instrumentalised by BigTech.

Hence my question, whether it is compromise or be removed. (Note that states are also platforms, which measure value in GDP and make laws and roads and take VAT on transactions. Sometimes, they even don't remove opposition.)

-- 
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

Sebastian Hellmann

10:41 p.m.

Well, I am quite open, albeit impulsive. The information given was quite good, and some of my concerns regarding Google's involvement were lifted or put into perspective, mainly because there seems to be a sense of awareness.

I am just studying economic principles, which are very powerful. I also have the feeling that free and open stuff just got a lot more commercial, and I am still struggling with myself over whether this is good or not. Also over whether DBpedia should become frenemies with BigTech. Or funny things, like many funding agencies pushing for national sustainability options but, most of the time, suggesting the GitHub platform. Wikibase could be an option here.

I have to apologize for the Knowledge Graph Talk thing. I was a bit grumpy because I thought I had wasted a lot of time on the Talk page that could have been invested in making the article better (WP:BE_BOLD style), but now I think it might have been my own mistake. So apologies for lashing out there.

(see comments below)

On 20.09.19 17:53, Denny Vrandečić wrote:

...

Sebastian,

"I don't want to facilitate conspiracy theories, but ..." "[I am] interested in what is the truth behind the truth"

I am sorry, I truly am, but this *is* the language I know from conspiracy theorists. And given that, I cannot imagine that there is anything I can say that could convince you otherwise. Therefore there is no real point for me in engaging with this conversation on these terms; I cannot see how it would turn constructive.

The answers to many of your questions are public and on the record. Others tried to point you to them (thanks), but you dismiss them as not fitting your narrative.

So here's a suggestion, which I think might be much more constructive and forward-looking:

I have been working on a comparison of DBpedia, Wikidata, and Freebase (and since you've read my thesis, you know that's a thing I know a bit about). Simple evaluation, coverage, correctness, nothing dramatically fancy. But I am torn about publishing it, because, d'oh, people may (with good reasons) dismiss it as being biased. And truth be told, the simple fact that I don't know DBpedia as well as I know Wikidata and Freebase might indeed have led to errors, mistakes, and stuff I missed in the evaluation. But you know what would help?

You.

My suggestion is that I publish my current draft, and then you and I work together on it, publicly, in the open, until we reach a state we both consider correct enough for publication.

What do you think?

Sure, we are doing statistics at the moment as well. It is a bit hard to define what DBpedia is nowadays, as we are rebranding the remixed datasets now that we can pick up links and other data from the Databus. It might not even be a real dataset anymore, but rather glue between datasets, focusing on speed of integration and ease of quality improvement. We are also still working on the concrete Sync Targets for GlobalFactSync (https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE).

One question I have is whether Wikidata is effective and efficient, or rather where it is effective and where it could use improvement, which would be a chance for collaboration.

So yes, any time.

-- Sebastian

...

Cheers, Denny

P.S.: I am travelling next week, so I may ask for your patience.

On Fri, Sep 20, 2019 at 8:11 AM Thad Guidry <thadguidry@gmail.com> wrote:

Thank you for sharing your opinions, Sebastian.

Cheers,
Thad
https://www.linkedin.com/in/thadguidry/


On Fri, Sep 20, 2019 at 9:43 AM Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:

...

Denny Vrandečić

11:01 p.m.

Yes, you're touching exactly on the problems I had during the evaluation: I couldn't even figure out what DBpedia is. Thanks, your help will be very much appreciated.

OK, I will send a link the week after next, and then we can start working on it :) I am very much looking forward to it.

On Fri, Sep 20, 2019 at 10:11 AM Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:

...


hellmann＠informatik.uni-leipzig.de

11:30 p.m.

Just an ominous note here. It has to do with the property of the Semantic Web of having only one schema but several IDs for the same things; then it is just a matter of how to partition it again, distribute it to where people need the information, and establish feedback in the opposite direction. Basically, an implemented variation of what Kingsley has been saying for years.

Waiting for your message.

Kind regards, Sebastian

On September 20, 2019 7:31:36 PM GMT+02:00, "Denny Vrandečić" vrandecic@gmail.com wrote:

...

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

Samuel Klein

11:34 p.m.

I'm also interested in this comparison and intersection, and glad to share perspective + help.

Warmly,
SJ

On Fri, Sep 20, 2019 at 1:32 PM Denny Vrandečić vrandecic@gmail.com wrote:

...


-- Samuel Klein @metasj w:user:sj +1 617 529 4266

Denny Vrandečić

11:37 p.m.

I would love your input! I will send the link here, and any contribution will be welcome :)

Thank you!

On Fri, Sep 20, 2019 at 11:05 AM Samuel Klein meta.sj@gmail.com wrote:

...

I'm also interested in this comparison and intersection, and glad to share perspective + help. Warmly, SJ

On Fri, Sep 20, 2019 at 1:32 PM Denny Vrandečić vrandecic@gmail.com wrote:

...
Yes, you're touching exactly on the problems I had during the evaluation

I couldn't even figure out what DBpedia is. Thanks, your help will be

very much appreciated.

OK, I will send a link the week after the next, and then we can start working on it :) I am very much looking forward to it.

On Fri, Sep 20, 2019 at 10:11 AM Sebastian Hellmann < hellmann@informatik.uni-leipzig.de> wrote:

...
Well, I am quite open, albeit impulsive. The information given was quite good, and some of my concerns regarding Google's involvement were also lifted or put into perspective, mainly because there seems to be a sense of awareness.

I am just studying economic principles, which are very powerful. I also have the feeling that free and open stuff just got a lot more commercial, and I am still struggling with myself over whether this is good or not, and also whether DBpedia should become frenemies with BigTech. Or funny things like this: many funding agencies push for national sustainability options, but most of the time they suggest using the GitHub platform. Wikibase could be an option here.

I have to apologize for the Knowledge Graph Talk thing. I was a bit grumpy because I thought I had wasted a lot of time on the Talk page that could have been invested in making the article better (WP:BE_BOLD style), but now I think it might have been my own mistake. So apologies for lashing out there.

(see comments below) On 20.09.19 17:53, Denny Vrandečić wrote:

Sebastian,

"I don't want to facilitate conspiracy theories, but ..." "[I am] interested in what is the truth behind the truth"

I am sorry, I truly am, but this *is* the language I know from conspiracy theorists. And given that, I cannot imagine that there is anything I can say that could convince you otherwise. Therefore there is no real point for me in engaging with this conversation on these terms, I cannot see how it would turn constructive.

The answers to many of your questions are public and on the record. Others tried to point you to them (thanks), but you dismiss them as not fitting your narrative.

So here's a suggestion, which I think might be much more constructive and forward-looking:

I have been working on a comparison of DBpedia, Wikidata, and Freebase (and since you've read my thesis, you know that's a thing I know a bit about). Simple evaluation, coverage, correctness, nothing dramatically fancy. But I am torn about publishing it, because, d'oh, people may (with good reasons) dismiss it as being biased. And truth be told - the simple fact that I don't know DBpedia as well as I know Wikidata and Freebase might indeed have led to errors, mistakes, and stuff I missed in the evaluation. But you know what would help?

You.

My suggestion is that I publish my current draft, and then you and I work together on it, publicly, in the open, until we reach a state we both consider correct enough for publication.

What do you think?

Sure, we are doing statistics at the moment as well. It is a bit hard to define what DBpedia is nowadays, as we are rebranding the remixed datasets now that we can pick up links and other data from the Databus. It might not even be a real dataset anymore, but glue between datasets, focusing on the speed of integration and ease of quality improvement. We are also still working on the concrete Sync Targets for GlobalFactSync (https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE).

One question I have is whether Wikidata is effective/efficient, or rather where it is effective and where it could use improvement, as a chance for collaboration.

So yes any time.

-- Sebastian

Cheers, Denny

P.S.: I am travelling the next week, so I may ask for patience

On Fri, Sep 20, 2019 at 8:11 AM Thad Guidry thadguidry@gmail.com wrote:

...
Thank you for sharing your opinions, Sebastian.

Cheers, Thad https://www.linkedin.com/in/thadguidry/

On Fri, Sep 20, 2019 at 9:43 AM Sebastian Hellmann < hellmann@informatik.uni-leipzig.de> wrote:

...
Hi Thad, On 20.09.19 15:28, Thad Guidry wrote:

With my tech evangelist hat on...

Google's philanthropy is nearly boundless when it comes to the promotion of knowledge. Why? Because it's indeed in their best interest: without knowledge, no one can prosper. They aggregate knowledge for the benefit of mankind, and then make a profit through advertising ... all while making that knowledge extremely easy to be found for the world.

I am neither pro-Google nor anti-Google per se; maybe skeptical and interested in the truth behind the truth. Google is not synonymous with philanthropy. Wikimedia is, or at least I think they are doing many things right. Google is a platform, so primarily they "aggregate knowledge for their benefit" while creating enough incentives, in the form of accessibility, for users to add their knowledge to Google's. It is not about what Google offers, but what it takes in return. Employees' 20% time is also an investment in their skills, a Google asset called human capital, and it also leads to me and Denny from Google discussing whether https://en.wikipedia.org/wiki/Talk:Knowledge_Graph is content marketing or knowledge (@Denny: no offense, legit arguments, but there is no agenda to resolve the stalled discussion there). Except I don't have 20% time to straighten the view into what I believe would be neutral, so pushing it becomes a resource issue.

I found the other replies much more realistic, and the perspective is still unclear. Maybe Mozilla wasn't so much a frenemy of Google and got removed from the browser market for it. I am also thinking about Linked Open Data. Decentralisation is quite weak individually. I guess spreading all the Wikibases around to super-nodes is helpful, unless it prevents the formation of a stronger lobby of philanthropists or of competition to BigTech. Wikidata created some pressure on DBpedia as well (also opportunities), but we are fine since we can simply innovate. Others might not withstand it. Microsoft seems to favor OpenStreetMap, so I am just asking to what degree Open Source and Open Data are being instrumentalised by BigTech.

Hence my question, whether it is compromise or be removed. (Note that states are also platforms, which measure value in GDP and make laws and roads and take VAT on transactions. Sometimes, they even don't remove opposition.)

-- All the best, Sebastian Hellmann


-- All the best, Sebastian Hellmann


-- Samuel Klein @metasj w:user:sj +1 617 529 4266

hellmann@informatik.uni-leipzig.de

21 Sep

7:01 p.m.

One more thing I would be interested in: I don't think comparing Wikidata and Freebase to DBpedia will make sense, as these are sources for us. However, we could compare DBpedia, including the Wikidata and Freebase parts, to the Google Knowledge Graph and repeat this every three months to guide our community in integrating more sources. Can we do that?

-- Sebastian

On September 20, 2019 8:07:28 PM GMT+02:00, "Denny Vrandečić" vrandecic@google.com wrote:

...

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

Kingsley Idehen

22 Sep

4 a.m.

On 9/20/19 1:31 PM, Denny Vrandečić wrote:

...

Yes, you're touching exactly on the problems I had during the evaluation - I couldn't even figure out what DBpedia is.

Hi Denny and Sebastian,

To reiterate and/or clarify.

DBpedia is a community project comprising RDF datasets constructed from Wikipedia content that's deployed using Linked Data principles.

The description above implies the following re focus breakdown:

[1] Dataset creation -- this cannot be created in line with Linked Data principles without the items that follow

[2] Linked Data Deployment -- without this there is nothing to look-up re follow-your-nose exploration

[3] SPARQL Query Services -- without this there is nothing to query
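Items [2] and [3] are the two ways clients reach the data. A minimal Python sketch, assuming `http://dbpedia.org/resource/<Name>` for Linked Data look-ups and `https://dbpedia.org/sparql` for SPARQL (both are assumptions here; check the current DBpedia documentation). It only builds the HTTP requests with the standard library and sends nothing.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoints; verify against the current DBpedia documentation.
RESOURCE_BASE = "http://dbpedia.org/resource/"
SPARQL_ENDPOINT = "https://dbpedia.org/sparql"


def linked_data_request(name: str) -> Request:
    """[2] Linked Data deployment: dereference a resource URI and ask for
    RDF (Turtle) through HTTP content negotiation."""
    return Request(RESOURCE_BASE + name, headers={"Accept": "text/turtle"})


def sparql_get_url(query: str) -> str:
    """[3] SPARQL query service: pass the query in the `query` parameter
    of a GET request, as described in the SPARQL 1.1 Protocol."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})
```

Passing `linked_data_request("Leipzig_University")` to `urllib.request.urlopen` should return an RDF description of that resource; the object URIs in it are what a follow-your-nose client would dereference next.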

Over the years I've written a number of posts addressing the key question "what is DBpedia?"

[1] https://medium.com/openlink-software-blog/what-is-dbpedia-and-why-is-it-impo... -- What is DBpedia, and why is it important?

[2] https://medium.com/virtuoso-blog/on-the-mutually-beneficial-nature-of-dbpedi... -- Mutually beneficial nature of Wikidata and DBpedia

-- Regards, Kingsley Idehen Founder & CEO OpenLink Software Home Page: http://www.openlinksw.com Community Support: https://community.openlinksw.com Weblogs (Blogs): Company Blog: https://medium.com/openlink-software-blog Virtuoso Blog: https://medium.com/virtuoso-blog Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers Personal Weblogs (Blogs): Medium Blog: https://medium.com/@kidehen Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/ http://kidehen.blogspot.com Profile Pages: Pinterest: https://www.pinterest.com/kidehen/ Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen Twitter: https://twitter.com/kidehen Google+: https://plus.google.com/+KingsleyIdehen/about LinkedIn: http://www.linkedin.com/in/kidehen Web Identities (WebID): Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this

hellmann@informatik.uni-leipzig.de

11:04 a.m.

Hi Kingsley,

that describes the core of the glue that DBpedia is. The definition leads to people downloading the EN DBpedia dataset and running statistics that will only discover what data is wrong or missing in the smallest parts of DBpedia.

What happened to "LOD is the largest knowledge graph on earth"? Querying more Freebase data from DBpedia via Linked Data has been a use case for over 10 years now, using ontologies as a GPS.
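Reaching Freebase data from a DBpedia resource relies on `owl:sameAs` links between the datasets. A minimal sketch, assuming all triples are already in memory as (subject, predicate, object) tuples; a real Linked Data client would instead dereference each linked URI over HTTP. The Freebase-style URI and the predicates in the usage note are hypothetical placeholders.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def follow_same_as(triples, start):
    """Return every non-sameAs statement whose subject is `start` or any URI
    connected to it through owl:sameAs links (followed in both directions)."""
    # Undirected adjacency over owl:sameAs links.
    links = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            links.setdefault(s, set()).add(o)
            links.setdefault(o, set()).add(s)
    # Depth-first search for the set of URIs equivalent to `start`.
    cluster, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for other in links.get(node, ()):
            if other not in cluster:
                cluster.add(other)
                stack.append(other)
    return {(s, p, o) for s, p, o in triples if s in cluster and p != OWL_SAME_AS}
```

With `http://dbpedia.org/resource/Leipzig` linked by `owl:sameAs` to a hypothetical `http://example.org/freebase/m.hypothetical`, the result contains the statements about both URIs, i.e. the DBpedia description enriched with whatever the linked dataset says.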

Also the definition you give limits the community to people who have edited 10 Scala Classes in the extraction framework, which is probably 10 people altogether.

So this is the most exclusionist view I can think of.

What you wrote here is adequate: https://medium.com/openlink-software-blog/what-is-dbpedia-and-why-is-it-impo...

What you wrote in your email as a summary is very narrow and misleading, see Markus Kroetzsch's email. People will continue to measure DBpedia by exactly the part of the data that is loaded in the Virtuoso SPARQL endpoint unless we make the derivatives downloadable outside of HTTP LD requests.

-- Sebastian

On September 22, 2019 12:30:24 AM GMT+02:00, Kingsley Idehen kidehen@openlinksw.com wrote:

...

Kingsley Idehen

9:14 p.m.

On 9/22/19 1:34 AM, hellmann@informatik.uni-leipzig.de wrote:

...

Hi Kingsley,

that describes the core of the glue that DBpedia is. The definition leads to people downloading the EN DBpedia dataset and running statistics that will only discover what data is wrong or missing in the smallest parts of DBpedia.

The question was "What is DBpedia?". What is misleading about it being about Wikipedia content transformed into RDF and deployed using Linked Data principles?

...

What happened to "LOD is the largest knowledge graph on earth" ?

The question wasn't "What is the LOD Cloud?", or am I missing something here?

...

Querying more Freebase data from DBpedia via Linked Data is a use case since over 10 years now using ontologies as a GPS.

Freebase is yet another derivative of Wikipedia content, isn't it?

...

Also the definition you give limits the community to people who have edited 10 Scala Classes in the extraction framework, which is probably 10 people altogether.

Look, can't you simply make a clear statement of what is missing from my definition of DBpedia? I sense you are talking about all the other utilities that have been developed by the project beyond dataset production e.g., services like DBpedia Spotlight etc?

...

So this is the most exclusionist view I can think of.

What you wrote here is adequate: https://medium.com/openlink-software-blog/what-is-dbpedia-and-why-is-it-impo...

What you wrote in your email as a summary is very narrow and misleading, see Markus Kroetzsch's email. People will continue to measure DBpedia by exactly the part of the data that is loaded in the Virtuoso SPARQL endpoint unless we make the derivatives downloadable outside of HTTP LD requests.

You really have to try using a slightly better tone when communicating.

You could simply say:

Kingsley, here are some things that could be overlooked based on the description you presented:

Item 1..N.

I'll just fix it, or worst case agree to disagree.

Kingsley

...

-- Sebastian


Kingsley Idehen

9:25 p.m.

On 9/21/19 6:30 PM, Kingsley Idehen wrote:

...

On 9/20/19 1:31 PM, Denny Vrandečić wrote:

...
Yes, you're touching exactly on the problems I had during the evaluation - I couldn't even figure out what DBpedia is.

Hi Denny and Sebastian,

To reiterate and/or clarify.

DBpedia is a community project comprising RDF datasets constructed from Wikipedia content that's deployed using Linked Data principles.

A little clearer, as the definition above was a little too concise:

DBpedia is a community project comprising a variety of data curation tools, services (Linked Data lookup and SPARQL), and RDF datasets constructed from Wikipedia that are deployed using Linked Data principles and cross-referenced with other data sources as illustrated in the Linked Open Data Cloud (the world's largest Knowledge Graph)[1][2].

This project has recently spawned a Databus effort which addresses historic challenges associated with dataset curation, publication, discovery, and monetization [3].

[1] https://lod-cloud.ne2

[2] https://medium.com/virtuoso-blog/what-is-the-linked-open-data-cloud-and-why-... -- what is the LOD Cloud and why is it important?

[3] https://databus.dbpedia.org/ -- Databus

Kingsley Idehen

23 Sep

5:02 a.m.

On 9/22/19 11:55 AM, Kingsley Idehen wrote:

...

[1] https://lod-cloud.ne2


TypoFix:

[1] https://lod-cloud.net

Markus Kroetzsch

21 Sep

10:12 p.m.

On 20/09/2019 17:53, Denny Vrandečić wrote: ...

...

I have been working on a comparison of DBpedia, Wikidata, and Freebase (and since you've read my thesis, you know that's a thing I know a bit about). Simple evaluation, coverage, correctness, nothing dramatically fancy. But I am torn about publishing it, because, d'oh, people may (with good reasons) dismiss it as being biased. And truth be told - the simple fact that I don't know DBpedia as well as I know Wikidata and Freebase might indeed have led to errors, mistakes, and stuff I missed in the evaluation. But you know what would help?

I would also be very interested in seeing this. I had a closer look at DBpedia recently for a tutorial and was surprised by how different the data is in comparison to Wikidata. A methodological comparison would surely be helpful.

Of course, it has to be fair, taking into account that DBpedia editions are based on a Wikipedia in one language (and hence always miss entities that Wikidata has). For example, I recently computed the difference between the following two:

(1) The set of all pairs of ancestors that one can find by following (paths of) parent relations on EN DBpedia.
(2) The set of all pairs of ancestors that one can find by following (paths of) mother/father relations on Wikidata, but visiting only items that are present in English Wikipedia.

I am not sure if this is fair or not, but I found it an interesting setup (non-local effects of incompleteness) -- and (2) is a nice illustration of something you cannot achieve in SPARQL on principled grounds ;-).
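A minimal Python sketch of the two sets, assuming each dataset has been loaded as a dict from an item to its direct parents (for Wikidata, the union of father (P22) and mother (P25) values). Restricting (2) to items in English Wikipedia means every node on a path, not only its endpoints, must be in the allowed set. A SPARQL 1.1 property path such as `(wdt:P22|wdt:P25)+` offers no way to constrain the intermediate nodes, which is presumably the principled limitation meant above.

```python
from collections import deque


def ancestor_pairs(parents, allowed=None):
    """Return all (descendant, ancestor) pairs reachable via one or more parent edges.

    parents: dict mapping each item to the set of its direct parents.
    allowed: optional set of items; if given, every item on a path
             (start, intermediate nodes and ancestor) must belong to it.
    """
    ok = (lambda x: True) if allowed is None else (lambda x: x in allowed)
    pairs = set()
    for start in parents:
        if not ok(start):
            continue
        seen = set()
        queue = deque([start])
        # Breadth-first search that never steps onto a disallowed item.
        while queue:
            node = queue.popleft()
            for parent in parents.get(node, ()):
                if ok(parent) and parent not in seen:
                    seen.add(parent)
                    pairs.add((start, parent))
                    queue.append(parent)
    return pairs
```

With hypothetical data `{"C": {"B"}, "B": {"A"}}` where only A and C have English articles, the unrestricted closure contains `("C", "A")` but the restricted one is empty: one missing intermediate item removes a pair between two items that are both present, a non-local effect of incompleteness.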

Cheers,

Markus

Andra Waagmeester

22 Sep

5:05 a.m.

Agreed, I am also interested in seeing this. I recently did a small comparison of the coverage of science award laureates in both DBpedia and Wikidata and came to the same conclusion. The difference was sometimes quite substantial in favour of Wikidata.
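A minimal sketch of such a per-award coverage comparison, assuming laureates have already been aligned to shared identifiers across the two datasets (usually the hard part) and loaded as one set per award. The function name and input shape are hypothetical; the first count also shows where DBpedia holds laureates that Wikidata lacks.

```python
def coverage_report(dbpedia, wikidata):
    """Compare laureate coverage per award.

    Both arguments map an award name to a set of aligned laureate identifiers.
    Returns award -> (only_in_dbpedia, in_both, only_in_wikidata) counts.
    """
    report = {}
    for award in sorted(set(dbpedia) | set(wikidata)):
        d = dbpedia.get(award, set())
        w = wikidata.get(award, set())
        report[award] = (len(d - w), len(d & w), len(w - d))
    return report
```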

...

Sebastian Hellmann

12:18 p.m.

Still comparing a dataset (Wikidata) to an integration hub (DBpedia).

I would assume that popularity of content (e.g. Wikipedia page hits) directly relates to availability of data in Wikidata.

We have long fused all of this in a "best of" called FlexiFusion: https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf

Future agenda is to:
- stabilize this release variant of DBpedia (fused and enriched)
- mix in external (authoritative) datasets based on the references in WP and WD to create ultimate lists (total global coverage and correctness)
- export enriched versions using either Wikidata's P's or WP's infoboxes, so it can be integrated back into Wikimedia (with references), and also sync it to whoever needs the data.

This is part of GlobalFactSyncRE: https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE
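The "export using Wikidata's P's" step could look roughly like the minimal sketch below. It assumes DBpedia resources have already been linked to Wikidata QIDs. It uses a tiny hand-written property table (dbo:birthDate to P569, dbo:birthPlace to P19) purely for illustration; a real export would rely on DBpedia's own mappings. The function and variable names are hypothetical. Triples without a mapping are set aside for review rather than guessed.

```python
# Hypothetical, hand-written sample; not DBpedia's official mapping table.
PROPERTY_MAP = {
    "dbo:birthDate": "P569",  # Wikidata: date of birth
    "dbo:birthPlace": "P19",  # Wikidata: place of birth
}


def to_wikidata(triples, entity_map, property_map=PROPERTY_MAP):
    """Rewrite (subject, predicate, object) triples with Wikidata IDs.

    entity_map: DBpedia resource -> Wikidata QID (assumed to exist already).
    Returns (translated, skipped); skipped triples lack a subject or
    property mapping and are kept aside for manual review.
    """
    translated, skipped = [], []
    for s, p, o in triples:
        qid, pid = entity_map.get(s), property_map.get(p)
        if qid is None or pid is None:
            skipped.append((s, p, o))
            continue
        # Entity objects are mapped as well; literals pass through unchanged.
        translated.append((qid, pid, entity_map.get(o, o)))
    return translated, skipped
```

References would still have to be carried along separately. This sketch only handles the identifier translation.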

The formula here is quite easy: If you look at DBpedia's data in detail or a part of it, it will not shine so much since it is extracted, if you look at the flexibility and scalability of integration it will win. We are strengthening the tooling for the second part.

-- Sebastian

On 22.09.19 01:35, Andra Waagmeester wrote: [...]

Markus Kroetzsch

2:11 p.m.

On 22/09/2019 08:48, Sebastian Hellmann wrote:

> The formula here is quite easy: If you look at DBpedia's data in detail or a part of it, it will not shine so much since it is extracted,

Sure, but I think that this is not clear to many people who are currently using DBpedia as a dataset (even if only for testing/research purposes). Also, there would surely be value in analysing the differences more closely. I agree with you that quantitatively, Wikidata might be orders of magnitude ahead. Yet there can still be individual bits of information that are in DBpedia but missing from Wikidata so far.

For example, DBpedia EN has 32 people educated at the University of Leipzig, whereas Wikidata has 1217. Nevertheless, there is, for example, John Henry Wright (Q6238997), who is known to DBpedia but not to Wikidata (yet). Such cases might be worth weeding out systematically, so that we can really reach the point where Wikidata is a strict superset of all (correct) data in DBpedia.
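Once both sides use the same identifiers, weeding out such cases systematically amounts to a set difference. The minimal sketch below assumes the DBpedia side has already been queried (for example, for alumni of one university) and that DBpedia-to-Wikidata links are available as a dict. All names and IDs are hypothetical. Unlinked DBpedia resources are reported separately, since they cannot be compared automatically.

```python
def missing_in_wikidata(dbpedia_people, wikidata_people, sameas):
    """Compare two lists of people, e.g. alumni of one university.

    dbpedia_people: DBpedia resource names from a DBpedia query.
    wikidata_people: QIDs from the equivalent Wikidata query (e.g. P69,
        "educated at").
    sameas: DBpedia resource -> QID links.
    Returns (missing, unlinked): QIDs that DBpedia lists but Wikidata does
    not, and DBpedia resources without a QID link to check by hand.
    """
    linked = {sameas[r] for r in dbpedia_people if r in sameas}
    unlinked = {r for r in dbpedia_people if r not in sameas}
    return linked - set(wikidata_people), unlinked


# Hypothetical toy data.
missing, unlinked = missing_in_wikidata(
    dbpedia_people={"dbr:Person_A", "dbr:Person_B", "dbr:Person_C"},
    wikidata_people={"Q_A", "Q_X"},
    sameas={"dbr:Person_A": "Q_A", "dbr:Person_B": "Q_B"},
)
print(missing, unlinked)  # {'Q_B'} {'dbr:Person_C'}
```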

Cheers,

Markus

hellmann@informatik.uni-leipzig.de

2:58 p.m.

DBpedia actually has no data, we provide tools to more effectively use OTHER PEOPLE'S DATA, e.g. Wikipedia.

Here is a picture of the maximum size the new, scalable, and actually bulk-downloadable DBpedia (via the Databus) could reach in, let's say, one or two years:

https://lod-cloud.net/

With a "download as Wikidata Q's and P's" option.

It's there, just hard to download in bulk.

Kind regards, Sebastian

On September 22, 2019 10:41:10 AM GMT+02:00, Markus Kroetzsch markus.kroetzsch@tu-dresden.de wrote: [...]

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

Gerard Meijssen

3:16 p.m.

Hoi,

From my perspective, the point of a dataset is for it to be used. The extent to which it is used defines how useful an individual dataset is. I even blogged about it [1]. Thanks, GerardM

[1] https://ultimategerardm.blogspot.com/2019/09/comparing-datasets-bigger-or-be...

On Sun, 22 Sep 2019 at 11:29, hellmann@informatik.uni-leipzig.de wrote: [...]

Kingsley Idehen

9:07 p.m.

On 9/21/19 7:35 PM, Andra Waagmeester wrote:

> Agree, I am also interested in seeing this. I recently did a small comparison on science awards on coverage of laureates in both DBpedia and Wikidata and came to the same conclusion. The difference sometimes was quite substantial in favour of Wikidata.

Are you not able to share SPARQL Query Results page links for this?
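Shareable results links are easy to generate, assuming the Wikidata Query Service convention of putting the URL-encoded query in the fragment of https://query.wikidata.org/. The sketch below builds such a link. The query template (counting people with "educated at", P69, for one institution) and the placeholder QID are hypothetical illustrations, not the query Andra used.

```python
from urllib.parse import quote

# Assumed convention: the query service UI reads the query from the fragment.
WDQS_UI = "https://query.wikidata.org/#"

# Hypothetical template: count people "educated at" (P69) an institution.
QUERY_TEMPLATE = """SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {{
  ?person wdt:P69 wd:{institution} .
}}"""


def wdqs_link(sparql):
    """Return a shareable link that opens `sparql` in the query service UI."""
    return WDQS_UI + quote(sparql, safe="")


# "Q_PLACEHOLDER" stands in for the institution's real QID.
print(wdqs_link(QUERY_TEMPLATE.format(institution="Q_PLACEHOLDER")))
```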

Marco Fossati

27 Sep

7:23 p.m.

Hey Sebastian,

On 9/20/19 10:22 AM, Sebastian Hellmann wrote:

> Not much of Freebase did end up in Wikidata.

Dropping here some pointers to shed light on the migration of Freebase to Wikidata, since I was partially involved in the process:
1. WikiProject [1];
2. the paper behind [2];
3. datasets to be migrated [3].

I can confirm that the migration has stalled: as of today, *528 thousand* Freebase statements have been curated by the community, out of *10 million*. By 'curated', I mean approved or rejected. These numbers come from two queries against the primary sources tool database.

The stall is due to several causes: in my opinion, the most important one was the bad quality of sources [4,5] coming from the Knowledge Vault project [6].

Cheers,

Marco

[1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase
[2] http://static.googleusercontent.com/media/research.google.com/en//pubs/archi...
[3] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool/Version_1#Data
[4] https://www.wikidata.org/wiki/Wikidata_talk:Primary_sources_tool/Archive/201...
[5] https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_A...
[6] https://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf

Sebastian Hellmann

7:50 p.m.

Hi Marco,

I think I looked at it some years ago, and it still sounds like less than 5% made it, which is what I remember.
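Here is a quick check of Marco's figures. Curated statements include both approvals and rejections, so their share is an upper bound on what was actually imported.

```python
curated = 528_000     # Freebase statements approved or rejected so far
offered = 10_000_000  # Freebase statements offered through the tool
curated_share = curated / offered
print(f"{curated_share:.2%}")  # 5.28%
```

Whether the approved share falls below 5% depends on the split between approvals and rejections. The thread does not give that split.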

-- Sebastian

On 27.09.19 15:53, Marco Fossati wrote: [...]

Gerard Meijssen

8:56 p.m.

Hoi, I totally reject the assertion that the quality was so bad. I have always been of the opinion that the main issue was an atrocious user interface. Add to this the people who have Wikipedia notions about quality. They have had, and still have, a detrimental effect on both the quantity and quality of Wikidata.

When you add the functionality that is being built by the data wranglers at DBpedia, it becomes easier to compare the data from the Wikipedias with Wikidata (and why not Freebase), add what has consensus, and curate the differences. This will enable a true datasense of quality and allow us to provide a much improved service. Thanks, GerardM

On Fri, 27 Sep 2019 at 15:54, Marco Fossati fossati@spaziodati.eu wrote: [...]

hellmann@informatik.uni-leipzig.de

28 Sep

12:57 p.m.

Hi Gerard,

I was not trying to judge here. I was just saying that it wasn't much data in the end. For me Freebase was basically cherry-picked.

Meanwhile, the data we extract is more pertinent to the goal of having Wikidata cover the infoboxes. We still have ~500 million statements left, but none of them is used yet. Hopefully we can change that.

Meanwhile, Google crawls all the references and extracts facts from there. We don't have that available, but there is Linked Open Data.

-- Sebastian

On September 27, 2019 5:26:43 PM GMT+02:00, Gerard Meijssen gerard.meijssen@gmail.com wrote: [...]

Denny Vrandečić

1 Oct

4:43 a.m.

New subject: Comparison of Wikidata, DBpedia, and Freebase (draft and invitation)

Hi all,

as promised, now that I am back from my trip, here's my draft of the comparison of Wikidata, DBpedia, and Freebase.

It is a draft, and it is obviously potentially biased given my background, etc., but I hope that we can work on it together to get it into good shape.

Markus, amusingly I took pretty much the same example that you went for, the parent predicate. So yes, I was also surprised by the results, and would love to have Sebastian or Kingsley look into it and see if I conducted it fairly.

SJ, Andra, thanks for offering to take a look. I am sure you all can contribute your own unique background and make suggestions on how to improve things and whether the results ring true.

Marco, I totally agree with what you said: the project has stalled, there is plenty of opportunity to harvest more data from Freebase and bring it to Wikidata, and this should be reignited. Sebastian, I also agree with you, and so do the numbers: the same is true for the extraction results from DBpedia.

Sebastian, Kingsley, I tried to describe how I understand DBpedia, and all steps should be reproducible. As it seems that the two of you also have to discuss one or the other thing about DBpedia's identity, I am relieved that my confusion is not entirely unjustified. So I tried to use both the last stable DBpedia release as well as a new-style DBpedia fusion dataset for the comparison. But I might have gotten the whole procedure wrong. I am happy to be corrected.

On Sat, Sep 28, 2019 at 12:28 AM hellmann@informatik.uni-leipzig.de wrote:

> Meanwhile, Google crawls all the references and extracts facts from there. We don't have that available, but there is Linked Open Data.

Potentially, not a bad idea, but we don't do that.

Everyone, this is the first time I share a Colab notebook, and I have no idea if I did it right. So any feedback of the form "oh you didn't switch on that bit over here" or "yes, this works, thank you" is very welcome, because I have no clue what I am doing :) Also, I never did this kind of analysis so transparently, which is kinda both totally cool and rather scary, because now you can all see how dumb I am :)

So everyone is invited to send Pull Requests (I guess that's how this works?), and I would love for us to create a result together that we agree on. I see the result of this exercise to be potentially twofold:

1) a publication we can point people to who ask about the differences between Wikidata, DBpedia, and Freebase

2) to reignite or start projects and processes to reduce these differences

So, here is the link to my Colab notebook:

https://github.com/vrandezo/colabs/blob/master/Comparing_coverage_and_accura...

Ideally, the third goal could be to get to a deeper understanding of how these three projects relate to each other - in my point of view, Freebase is dead and outdated, Wikidata is the core knowledge base that anyone can edit, and DBpedia is the core project to weave value-adding workflows on top of Wikidata or other datasets from the linked open data cloud together. But that's just a proposal.

Cheers, Denny

On Sat, Sep 28, 2019 at 12:28 AM hellmann@informatik.uni-leipzig.de wrote: [...]

Marco Fossati

2 Oct

3:18 a.m.

New subject: Comparison of Wikidata, DBpedia, and Freebase (draft and invitation)

Hi Denny,

Thanks for publishing your Colab notebook! I went through it and would like to share my first thoughts here. We can then move further discussion somewhere else.

1. in general, how can we compare datasets with totally different time stamps? Wikidata is alive, Freebase is dead, and the latest DBpedia dump is old;
2. given that all datasets contain Wikipedia links, perhaps we could use them as a bridge for the comparison, instead of Wikidata mappings. I'm assuming that Freebase and DBpedia entities with Wikidata mappings are subsets of the whole datasets (but this should be verified);
3. we could use record linkage techniques to connect Wikidata entities with Freebase and DBpedia ones, then assess the agreement in terms of statements per entity. There has been some experimental work (different use case and goal) in the soweego project: https://soweego.readthedocs.io/en/latest/validator.html
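Points 2 and 3 could start from a simple join on shared Wikipedia page URLs, before any fuzzier record linkage. The minimal sketch below assumes each dataset has been reduced to a dict from Wikipedia URL to a set of (property, value) pairs, already normalised to a common vocabulary. That normalisation is the hard part and is not shown here. Names and toy data are hypothetical, and this is not the soweego validator.

```python
def agreement_by_page(wd_by_page, other_by_page):
    """Per-entity agreement for entities linked via the same Wikipedia page.

    Both arguments map a Wikipedia page URL to a set of (property, value)
    pairs, assumed to be normalised to a common vocabulary already.
    Returns {page: (shared, only_in_wikidata, only_in_other)} for pages
    present in both datasets.
    """
    result = {}
    for page in wd_by_page.keys() & other_by_page.keys():
        wd, other = wd_by_page[page], other_by_page[page]
        result[page] = (len(wd & other), len(wd - other), len(other - wd))
    return result


# Hypothetical toy data; real input would come from the respective dumps.
wikidata = {"en:Page_A": {("born", "1900"), ("parent", "X")},
            "en:Page_B": {("born", "1950")}}
other = {"en:Page_A": {("born", "1900"), ("parent", "Y")},
         "en:Page_C": {("born", "1801")}}
print(agreement_by_page(wikidata, other))  # {'en:Page_A': (1, 1, 1)}
```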

On 10/1/19 1:13 AM, Denny Vrandečić wrote:

> Marco, I totally agree with what you said - the project has stalled, and there is plenty of opportunity to harvest more data from Freebase and bring it to Wikidata, and this should be reignited.

Yeah, that would be great. There is known work to do, but it's hard to sustain such a big project without allocated resources: https://phabricator.wikimedia.org/maniphest/query/CPiqkafGs5G./#R

BTW, there is also version 2 of the Wikidata primary sources tool that needs love, although I'm now skeptical that it will be an effective way to achieve the Freebase harvesting. We should probably rethink the whole thing, and restart small with very simple use cases, pretty much like the Harvest templates tool you mentioned: https://tools.wmflabs.org/pltools/harvesttemplates/

Cheers,

Marco

P.S.: I *might* have found the freshest relevant DBpedia datasets: https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects I said *might* because it was really painful to find a download button and to guess among multiple versions of the same dataset: https://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09... @Sebastian may know if it's the good one :-)
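Choosing among multiple versions of a dataset could be automated if the version labels are dates. The sketch below assumes labels of the form YYYY.MM.DD, as suggested by (but not confirmed from) the Databus path above, and skips anything else. The function name and sample labels are hypothetical.

```python
from datetime import datetime


def latest_version(labels):
    """Return the newest label, assuming labels look like 2019.09.01."""
    dated = []
    for label in labels:
        try:
            dated.append((datetime.strptime(label, "%Y.%m.%d"), label))
        except ValueError:
            continue  # not in the assumed date format; ignore it
    if not dated:
        raise ValueError("no date-formatted version labels found")
    return max(dated)[1]


# Hypothetical labels, not an actual Databus listing.
print(latest_version(["2019.08.30", "2019.09.01", "latest", "2019.07.01"]))
# 2019.09.01
```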

hellmann＠informatik.uni-leipzig.de

3 Oct

6:42 p.m.

New subject: Comparison of Wikidata, DBpedia, and Freebase (draft and invitation)

Hi Marco,

On October 1, 2019 11:48:02 PM GMT+02:00, Marco Fossati fossati@spaziodati.eu wrote:

> 1. in general, how can we compare datasets with totally different time stamps? Wikidata is alive, Freebase is dead, and the latest DBpedia dump is old;

DBpedia has made monthly releases for the past three months, and these will continue to improve and grow in an agile manner; so far we have focused on debugging and integration. The maximum age of the data would be 30 days, which I think is OK. Denny validated against the live endpoint. This is fine for driving growth, but unlike dumps it is not scientifically reproducible.


Gerard Meijssen

2 Oct

11:22 a.m.

New subject: Comparison of Wikidata, DBpedia, and Freebase (draft and invitation)

Hoi, As indicated by the DBpedia people, there are two ways in which data gets into their latest Fusion offering: there is consensus, where all the available sources agree, and there is the notion that one source is deemed authoritative. Remember, DBpedia uses sources outside of the Wikimedia movement, like national libraries!

What I miss in your paper is purpose: what is the way forward, and how does it compare with and improve on current practice? Current practice is that people import data from anywhere; typically it is single-sourced, if sourced at all, and it includes the human error that is inherent in a manual process. The DBpedia folks have a WMF-sponsored project whereby they facilitate the inclusion of data into Wikidata. Particularly where there is consensus (no opposing sources), it is an improvement on current practice and nicely complements the existing Wikidata content. The content where there is NO consensus is useful because it highlights where errors occur. It will really help in finding false friends.
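The two routes described above can be sketched per property value: consensus among all sources, or a source deemed authoritative. The sketch also covers the no-consensus case, which is kept for curation. This is a minimal illustration under stated assumptions, not DBpedia's FlexiFusion implementation. The function and source names are hypothetical, and values must be hashable.

```python
def fuse(values_by_source, authoritative=None):
    """Fuse one property value reported by several sources.

    values_by_source: {source: value}; sources without a value are omitted.
    authoritative: optional source whose value is taken when sources disagree.
    Returns (status, value): "consensus" when all sources agree,
    "authoritative" when the designated source breaks a disagreement,
    "conflict" (with the set of competing values, for curation), or
    "missing" when no source has a value.
    """
    values = set(values_by_source.values())
    if not values:
        return "missing", None
    if len(values) == 1:
        return "consensus", next(iter(values))
    if authoritative in values_by_source:
        return "authoritative", values_by_source[authoritative]
    return "conflict", values


# Hypothetical sources and values.
print(fuse({"enwiki": "1900", "dewiki": "1900"}))
# ('consensus', '1900')
print(fuse({"enwiki": "1900", "national_library": "1901"},
           authoritative="national_library"))
# ('authoritative', '1901')
```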

The Freebase data has been abandoned. It did not get the respect it deserved, particularly since at the time its quality was better than Wikidata's. The fact that it is dated IS a saving grace, because Wikidata/Wikipedia is particularly strong on content related to the period of Wikipedia activity. My preferred way of treating the Freebase data is fusing it in the Fusion project. All data that is new or expands on what is known in Fusion is relevant. Given that no maintenance is done on the Freebase data, the dissenting data can at best be used for curating what is in the WMF projects.

In your paper you support the notion of harvesting based on single sources. Maybe at a later date. First we need to integrate the uncontroversial data, the data where there is consensus across multiple projects. The biggest benefit will be that a lot of make-work is prevented: work that is done only because the data just did not get into Wikidata. Thanks, GerardM

On Tue, 1 Oct 2019 at 01:14, Denny Vrandečić vrandecic@google.com wrote:

...

Hi all,

as promised, now that I am back from my trip, here's my draft of the comparison of Wikidata, DBpedia, and Freebase.

It is a draft, it is obviously potentially biased given my background, etc., but I hope that we can work on it together to get it into a good shape.

Markus, amusingly I took pretty much the same example that you went for, the parent predicate. So yes, I was also surprised by the results, and would love to have Sebastian or Kingsley look into it and see if I conducted it fairly.

SJ, Andra, thanks for offering to take a look. I am sure you all can contribute your own unique background and make suggestions on how to improve things and whether the results ring true.

Marco, I totally agree with what you said - the project has stalled, and there is plenty of opportunity to harvest more data from Freebase and bring it to Wikidata, and this should be reignited. Sebastian, I also agree with you, and the numbers do so too, the same is true with the extraction results from DBpedia.

Sebastian, Kingsley, I tried to describe how I understand DBpedia, and all steps should be reproducible. As it seems that the two of you also have to discuss one or the other thing about DBpedia's identity, I am relieved that my confusion is not entirely unjustified. So I tried to use both the last stable DBpedia release as well as a new-style DBpedia fusion dataset for the comparison. But I might have gotten the whole procedure wrong. I am happy to be corrected.

On Sat, Sep 28, 2019 at 12:28 AM hellmann@informatik.uni-leipzig.de wrote:

...
...
Meanwhile, Google crawls all the references and extracts facts from

there. We don't

...
have that available, but there is Linked Open Data.

Potentially, not a bad idea, but we don't do that.

Everyone, this is the first time I share a Colab notebook, and I have no idea if I did it right. So any feedback of the form "oh you didn't switch on that bit over here" or "yes, this works, thank you" is very welcome, because I have no clue what I am doing :) Also, I never did this kind of analysis so transparently, which is kinda both totally cool and rather scary, because now you can all see how dumb I am :)

So everyone is invited to send Pull Requests (I guess that's how this works?), and I would love for us to create a result together that we agree on. I see the result of this exercise to be potentially twofold:

a publication we can point people to who ask about the differences

between Wikidata, DBpedia, and Freebase

to reignite or start projects and processes to reduce these differences

So, here is the link to my Colab notebook:

https://github.com/vrandezo/colabs/blob/master/Comparing_coverage_and_accura...

Ideally, the third goal could be to get to a deeper understanding of how these three projects relate to each other - in my point of view, Freebase is dead and outdated, Wikidata is the core knowledge base that anyone can edit, and DBpedia is the core project to weave value-adding workflows on top of Wikidata or other datasets from the linked open data cloud together. But that's just a proposal.

Cheers, Denny

On Sat, Sep 28, 2019 at 12:28 AM hellmann@informatik.uni-leipzig.de wrote:

...
Hi Gerard,

I was not trying to judge here. I was just saying that it wasn't much data in the end. For me Freebase was basically cherry-picked.

Meanwhile, the data we extract is more pertinent to the goal of having Wikidata cover the info boxes. We still have ~ 500 million statements left. But none of it is used yet. Hopefully we can change that.

Meanwhile, Google crawls all the references and extracts facts from there. We don't have that available, but there is Linked Open Data.

-- Sebastian

On September 27, 2019 5:26:43 PM GMT+02:00, Gerard Meijssen < gerard.meijssen@gmail.com> wrote:

...
Hoi, I totally reject the assertion that it was so bad. I have always been of the opinion that the main issue was an atrocious user interface. Add to this the people who hold Wikipedia notions about quality. They have had, and still have, a detrimental effect on both the quantity and the quality of Wikidata.

When you add the functionality that is being built by the data wranglers at DBpedia, it becomes easier to compare the data from the Wikipedias with Wikidata (and why not Freebase), add what has consensus, and curate the differences. This will enable a true, data-driven sense of quality and allow us to provide a much improved service. Thanks, GerardM

On Fri, 27 Sep 2019 at 15:54, Marco Fossati fossati@spaziodati.eu wrote:

...
Hey Sebastian,

On 9/20/19 10:22 AM, Sebastian Hellmann wrote:

...
Not much of Freebase ended up in Wikidata.

Dropping some pointers here to shed light on the migration of Freebase to Wikidata, since I was partially involved in the process:

1. WikiProject [1];
2. the paper behind it [2];
3. datasets to be migrated [3].

I can confirm that the migration has stalled: as of today, *528 thousand* Freebase statements have been curated by the community, out of *10 million*. By 'curated', I mean approved or rejected. These numbers come from two queries against the primary sources tool database.
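For scale, Marco's two figures (both rounded as he states them) put the share of reviewed candidate statements at roughly one in twenty:

```latex
\frac{528\,000}{10\,000\,000} = 0.0528 \approx 5.3\,\%
```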

The stall has several causes: in my opinion, the most important one was the poor quality of the sources [4,5] coming from the Knowledge Vault project [6].

Cheers,

Marco

[1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase
[2] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44818.pdf
[3] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool/Version_1#Data
[4] https://www.wikidata.org/wiki/Wikidata_talk:Primary_sources_tool/Archive/2017#Quality_of_sources
[5] https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_Addition_of_References_to_Wikidata_Statements#A_whitelist_for_sources
[6] https://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf

Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.


Sebastian Hellmann

4 Oct

4:53 a.m.

New subject: Comparison of Wikidata, DBpedia, and Freebase (draft and invitation)

Hi Denny,

here are some initial points:

1. There is also the generic dataset from last month: https://databus.dbpedia.org/dbpedia/generic/infobox-properties/2019.08.30 (we still need to copy the documentation onto the Databus). It has the highest coverage but the lowest consistency. English has around 50k parent properties, maybe more if you count the inverse (child) and other variants. We would need to check the mappings at http://mappings.dbpedia.org, which we are doing at the moment anyhow. It might take only an hour to map some healthy chunks into the mappings dataset.

curl https://downloads.dbpedia.org/repo/lts/generic/infobox-properties/2019.08.30... | bzcat | grep "/parent"

http://temporary.dbpedia.org/temporary/parentrel.nt.bz2

This dataset is normally messy, but still quite useful, because you can write queries with alternative properties (see dbo:position|dbp:position) in a way that makes them usable, like this query, which has worked for 13 years:

...

Soccer players who were born in a country with more than 10 million inhabitants and who played as goalkeeper for a club with a stadium of more than 30,000 seats, where the club's country differs from the birth country:
http://dbpedia.org/snorql/?query=SELECT+distinct+%3Fsoccerplayer+%3FcountryOfBirth+%3Fteam+%3FcountryOfTeam+%3Fstadiumcapacity%0D%0A{+%0D%0A%3Fsoccerplayer+a+dbo%3ASoccerPlayer+%3B%0D%0A+++dbo%3Aposition|dbp%3Aposition+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FGoalkeeper_%28association_football%29%3E+%3B%0D%0A+++dbo%3AbirthPlace%2Fdbo%3Acountry*+%3FcountryOfBirth+%3B%0D%0A+++%23dbo%3Anumber+13+%3B%0D%0A+++dbo%3Ateam+%3Fteam+.%0D%0A+++%3Fteam+dbo%3Acapacity+%3Fstadiumcapacity+%3B+dbo%3Aground+%3FcountryOfTeam+.+%0D%0A+++%3FcountryOfBirth+a+dbo%3ACountry+%3B+dbo%3ApopulationTotal+%3Fpopulation+.%0D%0A+++%3FcountryOfTeam+a+dbo%3ACountry+.%0D%0AFILTER+%28%3FcountryOfTeam+!%3D+%3FcountryOfBirth%29%0D%0AFILTER+%28%3Fstadiumcapacity+%3E+30000%29%0D%0AFILTER+%28%3Fpopulation+%3E+10000000%29%0D%0A}+order+by+%3Fsoccerplayer

Maybe we could also evaluate some queries that can be answered by one or the other? Can you do the query above in Wikidata?

2. We now also have an API to get all references from infoboxes, as a partial result of the GFS (GlobalFactSync) project. See point 5 here: https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE

3. This particular dataset (generic/infobox-properties) is also a good measure of the non-adoption of Wikidata in Wikipedia. In total, it has over 500 million statements across all languages. A statement here means that the value comes from an infobox template parameter and that no Wikidata value is used. The dataset is still extracted with the same algorithm, so we can check whether it has grown or shrunk over time. The fact that the extraction still works and yields a sizeable dataset indicates that Wikidata adoption by Wikipedians is low.

4. I need to look at the parent example in detail. However, I have to say that the property lends itself well to the Wikidata approach, since it is easily understood, has a certain truthiness, and is easy to research and add.

I am not sure it is representative, as, e.g., "employer" is more difficult to model (it is time-scoped). For example, my own data here is outdated: https://www.wikidata.org/wiki/Q39429171

Also, I don't yet see how this will become a more systematic approach that shows where to optimize, but I still need to read it fully.

We can start with this one, however.
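The dump-based checks in points 1 and 3 can be reproduced without the shell. Below is a minimal Python sketch, assuming the infobox-properties dump has been downloaded locally (the URL in point 1 is truncated, so no path is given). The helpers mirror the `bzcat | grep "/parent"` pipeline, count statements so that two releases can be compared, and decode the snorql link into readable SPARQL. All function names are hypothetical, and the N-Triples handling is a simple whitespace split, not a full parser.

```python
import bz2
from collections import Counter
from typing import BinaryIO, Iterable, Iterator, Union
from urllib.parse import parse_qs, urlsplit


def iter_bz2_lines(source: Union[str, BinaryIO]) -> Iterator[str]:
    """Stream a .bz2-compressed text dump line by line (the `bzcat` step).

    `source` may be a local path or an open binary file object.
    """
    with bz2.open(source, mode="rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Keep only lines containing `needle` (the `grep "/parent"` step)."""
    return (line for line in lines if needle in line)


def count_statements(lines: Iterable[str]) -> int:
    """Count N-Triples statements, skipping blank lines and '#' comments."""
    return sum(
        1 for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )


def predicate_histogram(lines: Iterable[str]) -> Counter:
    """Tally the second term (the predicate) of each N-Triples line.

    Simplification (assumption): terms are whitespace-separated, which holds
    for IRIs in subject and predicate position.
    """
    counts: Counter = Counter()
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        parts = line.split(None, 2)
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts


def decode_snorql(url: str) -> str:
    """Return the SPARQL text carried in a snorql `?query=...` link."""
    query = parse_qs(urlsplit(url).query)["query"][0]
    return query.replace("\r\n", "\n")
```

Running `count_statements(iter_bz2_lines(path))` on two releases gives the growth check from point 3, and `predicate_histogram(grep(iter_bz2_lines(path), "/parent"))` shows which parent-like predicates the matches use. Passing the snorql link above to `decode_snorql` prints the goalkeeper query with its `dbo:position|dbp:position` alternative and its three FILTER clauses.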

-- Sebastian

On 01.10.19 01:13, Denny Vrandečić wrote:

...

Hi all,

as promised, now that I am back from my trip, here's my draft of the comparison of Wikidata, DBpedia, and Freebase.

It is a draft, and it is potentially biased given my background, etc., but I hope that we can work on it together to get it into good shape.

Markus, amusingly I took pretty much the same example that you went for, the parent predicate. So yes, I was also surprised by the results, and would love to have Sebastian or Kingsley look into it and see if I conducted it fairly.

SJ, Andra, thanks for offering to take a look. I am sure you all can contribute your own unique background and make suggestions on how to improve things and whether the results ring true.

Marco, I totally agree with what you said: the project has stalled, there is plenty of opportunity to harvest more data from Freebase and bring it to Wikidata, and this should be reignited. Sebastian, I also agree with you, and so do the numbers: the same is true for the extraction results from DBpedia.

Sebastian, Kingsley, I tried to describe how I understand DBpedia, and all steps should be reproducible. As it seems that the two of you also still need to discuss one thing or another about DBpedia's identity, I am relieved that my confusion is not entirely unjustified. So I tried to use both the last stable DBpedia release and a new-style DBpedia fusion dataset for the comparison. But I might have gotten the whole procedure wrong. I am happy to be corrected.

On Sat, Sep 28, 2019 at 12:28 AM <hellmann@informatik.uni-leipzig.de> wrote:

...
Meanwhile, Google crawls all the references and extracts facts from there. We don't have that available, but there is Linked Open Data.

Potentially, not a bad idea, but we don't do that.


Sebastian Hellmann

15 Nov

9:48 p.m.

New subject: Comparison of Wikidata, DBpedia, and Freebase (draft and invitation)

Hi Denny, all,

here is the second prototype of the new overarching DBpedia approach:

https://databus.dbpedia.org/vehnem/flexifusion/prefusion/2019.11.01

Datasets are grouped by property; DBpedia ontology properties are used where they exist. The data contains all Wikipedia languages mapped via DBpedia, Wikidata where mapped, and some properties from DNB, MusicBrainz, and GeoNames.

We normalized the subjects based on the sameAs links, with some quality control. In the future, datatypes will be normalized using rules plus machine learning.
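The normalization procedure is not described in detail here. One common way to normalize subjects from sameAs links, sketched below under that assumption, is to treat each link as an undirected edge, group IRIs into clusters with a union-find (disjoint-set) structure, and rewrite every subject to one representative per cluster. The preference for Wikidata IRIs as representatives and all names are hypothetical, and the quality control mentioned above is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


class UnionFind:
    """Disjoint-set forest over IRIs, with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def canonical_map(
    same_as: Iterable[Tuple[str, str]],
    preferred_prefix: str = "http://www.wikidata.org/entity/",  # hypothetical preference
) -> Dict[str, str]:
    """Map every IRI in a sameAs cluster to one representative IRI."""
    uf = UnionFind()
    for a, b in same_as:
        uf.union(a, b)
    clusters: Dict[str, List[str]] = defaultdict(list)
    for iri in list(uf.parent):
        clusters[uf.find(iri)].append(iri)
    mapping: Dict[str, str] = {}
    for members in clusters.values():
        # Prefer IRIs with the preferred prefix, then the lexicographically smallest.
        rep = min(members, key=lambda m: (not m.startswith(preferred_prefix), m))
        for m in members:
            mapping[m] = rep
    return mapping


def normalize_subjects(triples: Iterable[Triple], mapping: Dict[str, str]) -> List[Triple]:
    """Rewrite subjects to their cluster representative; others pass through."""
    return [(mapping.get(s, s), p, o) for s, p, o in triples]
```

Because this treats sameAs as transitive, a single wrong link merges two otherwise separate clusters, which is one reason quality control on the links matters before normalizing.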

As soon as we make some adjustments, we can load it into the GFS GUI.

We are also working on an export that uses Wikidata Q and P identifiers, so that the data is easier to ingest into Wikidata. More datasets from the LOD cloud will follow.

All the best,

Sebastian

On 04.10.19 01:23, Sebastian Hellmann wrote:

...

-- All the best, Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org


wikidata@lists.wikimedia.org

35 comments

participants (14)

Andra Waagmeester
Denny Vrandečić
Denny Vrandečić
Federico Leva (Nemo)
Gerard Meijssen
hellmann＠informatik.uni-leipzig.de
Kingsley Idehen
Luca Martinelli
Marco Fossati
Markus Kroetzsch
Nicolas VIGNERON
Samuel Klein
Sebastian Hellmann
Thad Guidry