Dear all,
Personally, I am quite happy that Denny can contribute more to Wikidata and Wikipedia. No personal criticism there: I read his thesis and I am impressed by his work and contributions.
I don't want to facilitate any conspiracy theories here, but I am wondering where Wikidata is going, especially with respect to Google.
Note that Chrome/Chromium, being Open Source with a twist, has already pushed Firefox out of the market, but now there is this controversy about what is being tracked server-side by Google Analytics and client-side by cookies, and also the current discussion about ad blocker removal from Chrome: https://www.wired.com/story/google-chrome-ad-blockers-extensions-api/
Maybe somebody could enlighten me about the overall strategy and connections here.
1. There was a Knowledge Engine project which failed, but in principle had the right idea: https://en.wikipedia.org/wiki/Knowledge_Engine_(Wikimedia_Foundation)
It aimed to "democratize the discovery of media, news and information", in particular to counter the traffic drain caused by Google providing Wikipedia's information directly in Google Search. Now that there is Wikidata, the situation is much better for Google, because they can take the CC-0 data as they wish.
2. Some very widely used terms, like "Knowledge Graph", seem to be blocked by Google: https://www.wikidata.org/wiki/Q648625 and https://en.wikipedia.org/wiki/Knowledge_Graph lack the neutral point of view that the German Wikipedia adopted: https://de.wikipedia.org/wiki/Google#Knowledge_Graph
3. I was under the impression that Google bought Freebase and then started Wikidata as a non-threatening alternative to the data they have in their Knowledge Graph.
Could someone give me some pointers about the financial connections between Google and Wikimedia (this should be transparent, right?) and also about who brought the Wikidata movement to life in 2012?
Google is also mentioned in https://blog.wikimedia.org/2017/10/30/wikidata-fifth-birthday/, which reads "Freebase https://en.wikipedia.org/wiki/Freebase, was discontinued because of the superiority of Wikidata’s approach and active community." However, I know the story as: Google didn't want its competitors to have the data and the service. Not much of Freebase actually ended up in Wikidata.
As I said, I don't want to push any opinions in any direction. I am rather asking for more information about the (financial) connection of Google to Wikidata, then of Google to WMF, and also about any strategic advantages Google gains in relation to its competition.
Please don't answer with "How great Wikidata is": I already know that, and it is also not in the scope of my "How intertwined is Google with Wikidata / WMF?" question. I can't stress this enough: this is also not directed against Denny.
It is a request for better information, as I can't seem to find clear answers here.
Hi,
You can already find some information here: https://en.wikipedia.org/wiki/Wikidata#Development_history (including financial details if you follow the sources).
As for "How intertwined is Google", it's a long and complex story that goes back at least to 2005 (Wikipedia probably wouldn't exist today, or would exist in a drastically different way, if the search engine hadn't favoured Wikipedia since then). As a non-answer, I would say that Wikidata is as intertwined with Google as any major website is.
Regards, ~nicolas
On Fri, 20 Sep 2019 at 10:48, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:
-- All the best, Sebastian Hellmann
Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
Sebastian Hellmann, 20/09/19 11:22:
Maybe somebody could enlighten me about the overall strategy and connections here.
You can add more links to grants and other Wikimedia pages on https://meta.wikimedia.org/wiki/Google.
Google and the Wikimedia movement are on opposite sides for most things, but occasionally some of their employees (or algorithms!) happen to be interested in the same things as us, so we end up doing things together and a few breadcrumbs travel towards WMF. What matters to me is that they don't abuse our brands.
Sadly, WMF is not always careful about communication: for instance, https://wikimediafoundation.org/our-work/ still has the appalling sentence "Working with partners like Google" right under the heading "Partner for change".
Federico
Hi Sebastian,
I'll try to take on some of your doubts, hopefully helping you to solve them, or at least to give you some starting points.
On Fri, 20 Sep 2019 at 10:48, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:
- There was a Knowledge Engine project which failed, but in principle had the right idea: https://en.wikipedia.org/wiki/Knowledge_Engine_(Wikimedia_Foundation)
It aimed to "democratize the discovery of media, news and information", in particular to counter the traffic drain caused by Google providing Wikipedia's information directly in Google Search.
I don't remember/know much about the Knowledge Engine (KE), but to quote Liam Wyatt/User:Wittylama, "the crime wasn't thinking about it, it was the cover-up".
In other words, and based on what I remember and know, the Wikipedia internal search engine always sucked, and KE was a hypothesized solution to this problem. The main problems were:
1) an overall sensation (I repeat: SENSATION) that WMF was ready to compete with Google on the "search engine market", something that was never discussed within and/or with the community;
2) that this project was pushed in a very "secretive" way, i.e. it was discovered by chance through an announcement of WMF winning a grant from [I don't remember which institution, sorry], and the more questions were raised about it, the fewer answers the then-Executive Director seemed willing to give.
IMHO, having an internal engine that helps people get what they're looking for is a great idea, and the way it was conducted was indeed a crime, because (again IMHO) we lost a good opportunity to start our work several years earlier. What still makes me angry is the way the whole thing was conducted: we still lack most pieces of the story, and this may fuel non-NPOV reconstructions as well as unnecessary spin-off discussions that take us further away from the solution we were trying to achieve.
Now that there is Wikidata, the situation is much better for Google, because they can take the CC-0 data as they wish.
KE and Wikidata are two separate issues. I'm sure Wikidata would have played a role in KE, given its important role in linking concepts and items, but they're still two separate things.
As for Google picking data from Wikidata, they do the same with countless databases (regardless of their license), so all I can say is that, if I were Google, I'd do the very same thing. The difference between Google and Wikidata, and the reason why I still think Wikidata is better, is that the latter releases its data to *everybody*, while the former keeps it to itself.
And I want to stress that "everybody" part: when we synchronise with a GLAM database, we give them extremely valuable feedback in return, in terms of links to other databases they can freely access as well as hints for data clean-up, which, again, is something that Google doesn't provide at all.
- I was under the impression that Google bought Freebase and then started Wikidata as a non-threatening alternative to the data they have in their Knowledge Graph.
Could someone give me some pointers about the financial connections between Google and Wikimedia (this should be transparent, right?) and also about who brought the Wikidata movement to life in 2012?
Wikidata started as an independent project by some of the people who worked on Semantic MediaWiki (there are so many of them I fear I might miss some of them, and that would be embarrassing for me), not as a Google project.
It was originally financed *also* by Google, yes, but that was a small part compared to the aid from other institutions, such as the Allen Institute for Artificial Intelligence, the Gordon and Betty Moore Foundation, the Wikimedia Foundation itself, and others.
Google is also mentioned in https://blog.wikimedia.org/2017/10/30/wikidata-fifth-birthday/, which reads "Freebase, was discontinued because of the superiority of Wikidata’s approach and active community." However, I know the story as: Google didn't want its competitors to have the data and the service. Not much of Freebase actually ended up in Wikidata.
I remember the story as "Google couldn't make any more money out of Freebase, which was also being superseded by other internal systems *and* Wikidata, so Denny pushed Google to donate Freebase's triples to Wikidata".
This is basically what happened (with due proportions) with OpenRefine, which was originally called Google Refine and was discontinued because Google couldn't make any profit with it; it is now one of the most valuable tools we have to clean up data and reconcile it with Wikidata.
As for the integration of the data, I don't have any precise figures, but I'm sure that a fair part of Freebase did end up in Wikidata, just as many other big databases did.
As I said, I don't want to push any opinions in any direction. I am rather asking for more information about the (financial) connection of Google to Wikidata, then of Google to WMF, and also about any strategic advantages Google gains in relation to its competition.
I cannot properly answer you about this. In my view, WMF and Google are "frenemies": Google is, and will always be, a Big Tech company, and WMF is, and will always be, a champion of free knowledge. You just can't do free knowledge by forcing Big Tech companies NOT to pick up your tools and data, though, just as I think it'd be unnecessary for us to refuse any help from Google if we can work together on several objectives. This is OK with me, as long as we keep being transparent about it, which I recognise to be your point and the motivation behind your email, so don't worry about it. ;)
I hope I helped you wrap your head around the whole thing. :)
Cheers,
With my tech evangelist hat on...
Google's philanthropy is nearly boundless when it comes to the promotion of knowledge. Why? Because it is indeed in their best interest: no one can prosper without knowledge. They aggregate knowledge for the benefit of mankind, and then make a profit through advertising ... all while making that knowledge extremely easy for the world to find.
Nothing in this world is entirely free (servers must spin, cooling must be provided, bugs squashed...). Google and others understand this and help defray the substantial costs of providing free knowledge in multiple domains, especially those that contribute to tech and human goodwill (science & medicine). Sometimes this takes the form of direct cash donations to WMF: just this year, Google employees decided to give $2 million to WMF!!! <https://techcrunch.com/2019/01/22/google-org-donates-2-million-to-wikipedias...
Other times it's talent from interns they pay for during the summer, or tech knowledge exchanges to help tackle problems we have. Still other times it's just their employees' 20% time helping the world keep Open Source libraries up to date, or giving the world Open Source tools that we ourselves use across WMF every minute of the day. Then there are all the trickle-down benefits (increasing privacy, REALLY?, yes, really! https://opensource.googleblog.com/2019/09/enabling-developers-and-organizations.html, reducing security risks, better performance, etc.) from those Open Source libraries & tools, with things like ClusterFuzz https://opensource.googleblog.com/2019/02/open-sourcing-clusterfuzz.html, TensorFlow, Go, Kubernetes, and thousands of others. https://opensource.google.com/ https://github.com/google
Thad https://www.linkedin.com/in/thadguidry/
On Fri, Sep 20, 2019 at 5:25 AM Luca Martinelli martinelliluca@gmail.com wrote:
-- Luca "Sannita" Martinelli http://it.wikipedia.org/wiki/Utente:Sannita
Hi Thad,
On 20.09.19 15:28, Thad Guidry wrote:
With my tech evangelist hat on...
Google's philanthropy is nearly boundless when it comes to the promotion of knowledge. Why? Because it is indeed in their best interest: no one can prosper without knowledge. They aggregate knowledge for the benefit of mankind, and then make a profit through advertising ... all while making that knowledge extremely easy for the world to find.
I am neither pro-Google nor anti-Google per se. Maybe skeptical, and interested in what is the truth behind the truth. Google is not synonymous with philanthropy. Wikimedia is, or at least I think they are doing many things right. Google is a platform, so primarily they "aggregate knowledge for their benefit" while creating enough incentives, in the form of accessibility, for users to add their own knowledge to Google's. It is not about what Google offers, but about what it takes in return. Employees' 20% time is also an investment in the employees' skills, a Google asset called human capital, and it also leads to Denny from Google and me discussing whether https://en.wikipedia.org/wiki/Talk:Knowledge_Graph is content marketing or knowledge (@Denny: no offense, legit arguments, but no agenda to resolve the stalled discussion there). Except that I don't have 20% time to straighten the article out into what I believe would be neutral, so pushing it becomes a resource issue.
I found the other replies much more realistic, but the overall perspective is still unclear. Maybe Mozilla wasn't enough of a frenemy to Google and got removed from the browser market for it. I am also thinking about Linked Open Data. Decentralisation is quite weak, individually. I guess spreading all the Wikibases around to super-nodes is helpful, unless it prevents the formation of a stronger lobby of philanthropists or of competition to BigTech. Wikidata created some pressure on DBpedia as well (also opportunities), but we are fine since we can simply innovate. Others might not withstand it. Microsoft seems to favor OpenStreetMap, so I am just asking to what degree Open Source and Open Data are being instrumentalised by BigTech.
Hence my question: is it compromise or be removed? (Note that states are also platforms, which measure value in GDP, make laws and roads, and take VAT on transactions. Sometimes they even don't remove the opposition.)
Thank you for sharing your opinions, Sebastian.
Cheers, Thad https://www.linkedin.com/in/thadguidry/
On Fri, Sep 20, 2019 at 9:43 AM Sebastian Hellmann < hellmann@informatik.uni-leipzig.de> wrote:
-- All the best, Sebastian Hellmann
Sebastian,
"I don't want to facilitate conspiracy theories, but ..." "[I am] interested in what is the truth behind the truth"
I am sorry, I truly am, but this *is* the language I know from conspiracy theorists. And given that, I cannot imagine that there is anything I can say that could convince you otherwise. Therefore there is no real point for me in engaging with this conversation on these terms; I cannot see how it would turn constructive.
The answers to many of your questions are public and on the record. Others tried to point you to them (thanks), but you dismiss them as not fitting your narrative.
So here's a suggestion, which I think might be much more constructive and forward-looking:
I have been working on a comparison of DBpedia, Wikidata, and Freebase (and since you've read my thesis, you know that's a thing I know a bit about). Simple evaluation, coverage, correctness, nothing dramatically fancy. But I am torn about publishing it, because, d'oh, people may (with good reasons) dismiss it as being biased. And truth be told, the simple fact that I don't know DBpedia as well as I know Wikidata and Freebase might indeed have led to errors, mistakes, and stuff I missed in the evaluation. But you know what would help?
You.
My suggestion is that I publish my current draft, and then you and I work together on it, publicly, in the open, until we reach a state we both consider correct enough for publication.
What do you think?
Cheers, Denny
P.S.: I am travelling next week, so I may ask for your patience.
On Fri, Sep 20, 2019 at 8:11 AM Thad Guidry thadguidry@gmail.com wrote:
Thank you for sharing your opinions, Sebastian.
Cheers, Thad https://www.linkedin.com/in/thadguidry/
Nah, I am quite open, albeit impulsive. The information given was quite good, and some of my concerns regarding Google's involvement were also dispelled or put into perspective, mainly because there seems to be a sense of awareness.
I am just studying economic principles, which are very powerful. I also have the feeling that free and open stuff has just become a lot more commercial, and I am still struggling with myself over whether this is good or not, and also over whether DBpedia should become frenemies with BigTech. Or funny things, like many funding agencies pushing for national sustainability options but, most of the time, suggesting the GitHub platform. Wikibase could be an option here.
I have to apologize for the Knowledge Graph Talk thing. I was a bit grumpy, because I thought I had wasted a lot of time on the Talk page that could have been invested in making the article better (WP:BE_BOLD style), but now I think it might have been my own mistake. So apologies for lashing out there.
(see comments below)
On 20.09.19 17:53, Denny Vrandečić wrote:
Sebastian,
"I don't want to facilitate conspiracy theories, but ..." "[I am] interested in what is the truth behind the truth"
I am sorry, I truly am, but this *is* the language I know from conspiracy theorists. And given that, I cannot imagine that there is anything I can say that could convince you otherwise. Therefore there is no real point in my engaging with this conversation on these terms; I cannot see how it would turn constructive.
The answers to many of your questions are public and on the record. Others tried to point you to them (thanks), but you dismiss them as not fitting your narrative.
So here's a suggestion, which I think might be much more constructive and forward-looking:
I have been working on a comparison of DBpedia, Wikidata, and Freebase (and since you've read my thesis, you know that's a thing I know a bit about). Simple evaluation, coverage, correctness, nothing dramatically fancy. But I am torn about publishing it, because, d'oh, people may (with good reasons) dismiss it as being biased. And truth be told, the simple fact that I don't know DBpedia as well as I know Wikidata and Freebase might indeed have led to errors, mistakes, and stuff I missed in the evaluation. But you know what would help?
You.
My suggestion is that I publish my current draft, and then you and I work together on it, publicly, in the open, until we reach a state we both consider correct enough for publication.
What do you think?
Sure, we are doing statistics at the moment as well. It is a bit hard to define what DBpedia is nowadays, as we are rebranding the remixed datasets now that we can pick up links and other data from the Databus. It might not even be a real dataset anymore, but rather glue between datasets, focusing on the speed of integration and the ease of quality improvement. We are also still working on the concrete Sync Targets for GlobalFactSync (https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE).
One question I have is whether Wikidata is effective/efficient, or rather where it is effective and where it could use improvement, as a chance for collaboration.
So yes, any time.
-- Sebastian
Cheers, Denny
P.S.: I am travelling next week, so I may ask for your patience.
On Fri, Sep 20, 2019 at 8:11 AM Thad Guidry <thadguidry@gmail.com> wrote:
Thank you for sharing your opinions, Sebastian.
Cheers, Thad https://www.linkedin.com/in/thadguidry/
On Fri, Sep 20, 2019 at 9:43 AM Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:
Hi Thad,
On 20.09.19 15:28, Thad Guidry wrote:
With my tech evangelist hat on...
Google's philanthropy is nearly boundless when it comes to the promotion of knowledge. Why? Because it is indeed in their best interest; no one can prosper without knowledge. They aggregate knowledge for the benefit of mankind and then make a profit through advertising ... all while making that knowledge extremely easy for the world to find.
I am neither pro-Google nor anti-Google per se. Maybe skeptical, and interested in what is the truth behind the truth. Google is not synonymous with philanthropy. Wikimedia is, or at least I think it is doing many things right. Google is a platform, so primarily they "aggregate knowledge for their benefit" while creating enough incentives, in the form of accessibility, for users to add their knowledge to Google's. It is not about what Google offers, but what it takes in return. Giving employees 20% of their time is also an investment in their skills, a Google asset called human capital, and it also leads to me and Denny from Google discussing whether https://en.wikipedia.org/wiki/Talk:Knowledge_Graph is content marketing or knowledge (@Denny: no offense, legit arguments, but no agenda to resolve the stalled discussion there). Except that I don't have 20% time to move the article toward what I believe would be a neutral view, so pushing it becomes a resource issue.
I found the other replies much more realistic, and the outlook is still unclear. Maybe Mozilla wasn't enough of a frenemy to Google and got pushed out of the browser market for it. I am also thinking about Linked Open Data: decentralisation is quite weak, individually. I guess spreading all the Wikibases around to super-nodes is helpful, unless it prevents the formation of a stronger lobby of philanthropists or of competition to BigTech. Wikidata created some pressure on DBpedia as well (and also opportunities), but we are fine since we can simply innovate. Others might not withstand it. Microsoft seems to favor OpenStreetMap, so I am just asking to what degree Open Source and Open Data are being instrumentalised by BigTech.
Hence my question: is it compromise or be removed? (Note that states are also platforms, which measure value in GDP, make laws, build roads and take VAT on transactions. Sometimes they don't even remove the opposition.)
Yes, you're touching exactly on the problems I had during the evaluation - I couldn't even figure out what DBpedia is. Thanks, your help will be very much appreciated.
OK, I will send a link the week after the next, and then we can start working on it :) I am very much looking forward to it.
On Fri, Sep 20, 2019 at 10:11 AM Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:
Sure, we are doing statistics at the moment as well. It is a bit hard to define what DBpedia is nowadays, as we are rebranding the remixed datasets now that we can pick up links and other data from the Databus. It might not even be a real dataset anymore, but rather glue between datasets, focusing on the speed of integration and the ease of quality improvement. We are also still working on the concrete Sync Targets for GlobalFactSync (https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE).
Just an ominous note here. It has to do with the property of the Semantic Web of having only one schema and several IDs for the same things; then it is just a matter of how to partition it again, distribute it to where people need the information, and establish feedback in the opposite direction. Basically, an implemented variation of what Kingsley has been saying for years.
Waiting for your message.
Best wishes, Sebastian
On September 20, 2019 7:31:36 PM GMT+02:00, "Denny Vrandečić" vrandecic@gmail.com wrote:
Yes, you're touching exactly on the problems I had during the evaluation - I couldn't even figure out what DBpedia is. Thanks, your help will be very much appreciated.
OK, I will send a link the week after the next, and then we can start working on it :) I am very much looking forward to it.
I'm also interested in this comparison and intersection, and glad to share perspective + help. Warmly, SJ
On Fri, Sep 20, 2019 at 1:32 PM Denny Vrandečić vrandecic@gmail.com wrote:
Yes, you're touching exactly on the problems I had during the evaluation - I couldn't even figure out what DBpedia is. Thanks, your help will be very much appreciated.
OK, I will send a link the week after the next, and then we can start working on it :) I am very much looking forward to it.
I would love your input! I will send the link here, and any contribution will be welcome :)
Thank you!
On Fri, Sep 20, 2019 at 11:05 AM Samuel Klein meta.sj@gmail.com wrote:
I'm also interested in this comparison and intersection, and glad to share perspective + help. Warmly, SJ
On Fri, Sep 20, 2019 at 1:32 PM Denny Vrandečić vrandecic@gmail.com wrote:
Yes, you're touching exactly on the problems I had during the evaluation
- I couldn't even figure out what DBpedia is. Thanks, your help will be
very much appreciated.
OK, I will send a link the week after the next, and then we can start working on it :) I am very much looking forward to it.
On Fri, Sep 20, 2019 at 10:11 AM Sebastian Hellmann < hellmann@informatik.uni-leipzig.de> wrote:
Na, I am quite open, albeit impulsive. The information given was quite good and some of my concerns regarding the involvement of Google were also lifted or relativized. Mainly due to the fact that there seems to be a sense of awareness.
I am just studying economic principles, which are very powerful. I also have the feeling that free and open stuff just got a lot more commercial and I am still struggling with myself whether this is good or not. Also whether DBpedia should become frenemies with BigTech. Or funny things like many funding agencies try to push for national sustainability options, but most of the time, they suggest to use the GitHub Platform. Wikibase could be an option here.
I have to apologize for the Knowledge Graph Talk thing. I was a bit grumpy, because I thought I wasted a lot of time on the Talk page that could have been invested in making the article better (WP:BE_BOLD style), but now I think, it might have been my own mistake. So apologies for lashing out there.
(see comments below) On 20.09.19 17:53, Denny Vrandečić wrote:
Sebastian,
"I don't want to facilitate conspiracy theories, but ..." "[I am] interested in what is the truth behind the truth"
I am sorry, I truly am, but this *is* the language I know from conspiracy theorists. And given that, I cannot imagine that there is anything I can say that could convince you otherwise. Therefore there is no real point for me in engaging with this conversation on these terms, I cannot see how it would turn constructive.
The answers to many of your questions are public and on the record. Others tried to point you to them (thanks), but you dismiss them as not fitting your narrative.
So here's a suggestion, which I think might be much more constructive and forward-looking:
I have been working on a comparison of DBpedia, Wikidata, and Freebase (and since you've read my thesis, you know that's a thing I know a bit about). Simple evaluation, coverage, correctness, nothing dramatically fancy. But I am torn about publishing it, because, d'oh, people may (with good reasons) dismiss it as being biased. And truth be told - the simple fact that I don't know DBpedia as well as I know Wikidata and Freebase might indeed have lead to errors, mistakes, and stuff I missed in the evaluation. But you know what would help?
You.
My suggestion is that I publish my current draft, and then you and me work together on it, publically, in the open, until we reach a state we both consider correct enough for publication.
What do you think?
Sure, we are doing statistics at the moment as well. It is a bit hard to define what DBpedia is nowadays as we are rebranding the remixed datasets, now that we can pick up links and other data from the Databus. It might not even be a real dataset anymore, but glue between datasets focusing on the speed of integration and ease of quality improvement. Also still working on the concrete Sync Targets for GlobalFactSync ( https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE) as well.
One question I have is whether Wikidata is effective and efficient, or rather where it is effective and where it could use improvement, which would be a chance for collaboration.
So yes any time.
-- Sebastian
Cheers, Denny
P.S.: I am travelling the next week, so I may ask for patience
On Fri, Sep 20, 2019 at 8:11 AM Thad Guidry thadguidry@gmail.com wrote:
Thank you for sharing your opinions, Sebastian.
Cheers, Thad https://www.linkedin.com/in/thadguidry/
On Fri, Sep 20, 2019 at 9:43 AM Sebastian Hellmann < hellmann@informatik.uni-leipzig.de> wrote:
Hi Thad, On 20.09.19 15:28, Thad Guidry wrote:
With my tech evangelist hat on...
Google's philanthropy is nearly boundless when it comes to the promotion of knowledge. Why? Because it is indeed in their best interest: no one can prosper without knowledge. They aggregate knowledge for the benefit of mankind, and then make a profit through advertising ... all while making that knowledge extremely easy for the world to find.
I am neither pro-Google nor anti-Google per se. Maybe skeptical, and interested in what is the truth behind the truth. Google is not a synonym for philanthropy. Wikimedia is, or at least I think they are doing many things right. Google is a platform, so primarily they "aggregate knowledge for their benefit" while creating enough incentives in the form of accessibility for users to add their knowledge to Google's. It is not about what Google offers, but what it takes in return. The 20% of employees' time is also an investment in the skills of the employee, a Google asset called Human Capital, and it also leads to me and Denny from Google discussing whether https://en.wikipedia.org/wiki/Talk:Knowledge_Graph is content marketing or knowledge (@Denny: no offense, legit arguments, but no agenda to resolve the stalled discussion there). Except that I don't have 20% time to straighten the article into what I believe would be neutral, so pushing it becomes a resource issue.
I found the other replies much more realistic, and the perspective is still unclear. Maybe Mozilla wasn't so much of a frenemy of Google and got pushed out of the browser market for it. I am also thinking about Linked Open Data. Decentralisation is quite weak, individually. I guess spreading all the Wikibases around to super-nodes is helpful, unless it prevents the formation of a stronger lobby of philanthropists or of competition to BigTech. Wikidata created some pressure on DBpedia as well (also opportunities), but we are fine since we can simply innovate. Others might not withstand it. Microsoft seems to favor OpenStreetMap, so I am just asking to what degree Open Source and Open Data are being instrumentalised by BigTech.
Hence my question: is it "compromise or be removed"? (Note that states are also platforms, which measure value in GDP, make laws and roads, and take VAT on transactions. Sometimes they even refrain from removing the opposition.)
-- All the best, Sebastian Hellmann
Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
-- Samuel Klein @metasj w:user:sj +1 617 529 4266
One more thing I would be interested in: I don't think comparing Wikidata and Freebase to DBpedia will make sense, as these are sources for us. However, we could compare DBpedia, including the Wikidata and Freebase parts, to the Google Knowledge Graph and repeat this every three months to guide our community in integrating more sources. Can we do that?
-- Sebastian
On September 20, 2019 8:07:28 PM GMT+02:00, "Denny Vrandečić" vrandecic@google.com wrote:
I would love your input! I will send the link here, and any contribution will be welcome :)
Thank you!
On Fri, Sep 20, 2019 at 11:05 AM Samuel Klein meta.sj@gmail.com wrote:
I'm also interested in this comparison and intersection, and glad to share perspective + help. Warmly, SJ
On Fri, Sep 20, 2019 at 1:32 PM Denny Vrandečić vrandecic@gmail.com wrote:
Yes, you're touching exactly on the problems I had during the evaluation - I couldn't even figure out what DBpedia is. Thanks, your help will be very much appreciated.
OK, I will send a link the week after next, and then we can start working on it :) I am very much looking forward to it.
On Fri, Sep 20, 2019 at 10:11 AM Sebastian Hellmann < hellmann@informatik.uni-leipzig.de> wrote:
Well, I am quite open, albeit impulsive. The information given was quite good, and some of my concerns regarding the involvement of Google were lifted or put into perspective, mainly because there seems to be a sense of awareness.
I am just studying economic principles, which are very powerful. I also have the feeling that free and open stuff has just become a lot more commercial, and I am still struggling with myself over whether this is good or not, and also over whether DBpedia should become frenemies with BigTech. Or funny things, like many funding agencies pushing for national sustainability options but, most of the time, suggesting the GitHub platform. Wikibase could be an option here.
I have to apologize for the Knowledge Graph Talk thing. I was a bit grumpy, because I thought I had wasted a lot of time on the Talk page that could have been invested in making the article better (WP:BE_BOLD style), but now I think it might have been my own mistake. So apologies for lashing out there.
On 9/20/19 1:31 PM, Denny Vrandečić wrote:
Yes, you're touching exactly on the problems I had during the evaluation - I couldn't even figure out what DBpedia is.
Hi Denny and Sebastian,
To reiterate and/or clarify.
DBpedia is a community project comprising RDF datasets constructed from Wikipedia content that's deployed using Linked Data principles.
The description above implies the following breakdown of focus:
[1] Dataset creation -- this cannot be created in line with Linked Data principles without the items that follow
[2] Linked Data Deployment -- without this there is nothing to look-up re follow-your-nose exploration
[3] SPARQL Query Services -- without this there is nothing to query
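To make the lookup and query layers of this breakdown concrete, here is a minimal Python sketch using only the standard library. It assumes the English DBpedia conventions that an article title maps to `http://dbpedia.org/resource/<Title_with_underscores>` and that the public SPARQL endpoint is `https://dbpedia.org/sparql`; the helper names are hypothetical, the IRI escaping is simplified, and no request is actually sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"  # assumed EN DBpedia namespace
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"       # assumed public endpoint


def resource_iri(wikipedia_title: str) -> str:
    """Map an English Wikipedia article title to its DBpedia resource IRI.

    Simplification: spaces become underscores and other unsafe characters
    are percent-encoded; DBpedia's exact escaping rules are not modelled.
    """
    return DBPEDIA_RESOURCE + quote(wikipedia_title.replace(" ", "_"), safe="/(),")


def lookup_request(wikipedia_title: str) -> Request:
    """[2] Linked Data deployment: dereference the IRI, asking for Turtle."""
    return Request(resource_iri(wikipedia_title), headers={"Accept": "text/turtle"})


def sparql_request(query: str) -> Request:
    """[3] SPARQL query service: a SPARQL Protocol GET asking for JSON results."""
    url = DBPEDIA_SPARQL + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    iri = resource_iri("Leipzig University")
    print(iri)  # http://dbpedia.org/resource/Leipzig_University
    print(lookup_request("Leipzig University").get_header("Accept"))
    query = f"SELECT ?p ?o WHERE {{ <{iri}> ?p ?o }} LIMIT 10"
    print(sparql_request(query).full_url)
    # Passing either Request to urllib.request.urlopen() would fetch the data.
```

Following the links found in the Turtle returned by the first request is the follow-your-nose exploration from [2]; the second request is the query entry point from [3].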
Over the years I've written a number of posts addressing the key question "what is DBpedia?"
[1] https://medium.com/openlink-software-blog/what-is-dbpedia-and-why-is-it-impo... -- What is DBpedia, and why is it important?
[2] https://medium.com/virtuoso-blog/on-the-mutually-beneficial-nature-of-dbpedi... -- Mutually beneficial nature of Wikidata and DBpedia
Hi Kingsley,
that describes the core of the glue that DBpedia is. The definition leads to people downloading the EN DBpedia dataset and running statistics that will only discover what data is wrong or missing in the smallest parts of DBpedia.
What happened to "LOD is the largest knowledge graph on earth"? Querying more Freebase data from DBpedia via Linked Data has been a use case for over 10 years now, using ontologies as a GPS.
Also the definition you give limits the community to people who have edited 10 Scala Classes in the extraction framework, which is probably 10 people altogether.
So this is the most exclusionist view I can think of.
What you wrote here is adequate: https://medium.com/openlink-software-blog/what-is-dbpedia-and-why-is-it-impo...
What you wrote in your email as a summary is very narrow and misleading, see Markus Kroetzsch's email. People will continue to measure DBpedia by exactly the part of the data that is loaded in the Virtuoso SPARQL endpoint unless we make the derivatives downloadable outside of HTTP LD requests.
-- Sebastian
On September 22, 2019 12:30:24 AM GMT+02:00, Kingsley Idehen kidehen@openlinksw.com wrote:
-- Regards,
Kingsley Idehen Founder & CEO OpenLink Software Home Page: http://www.openlinksw.com Community Support: https://community.openlinksw.com Weblogs (Blogs): Company Blog: https://medium.com/openlink-software-blog Virtuoso Blog: https://medium.com/virtuoso-blog Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
Personal Weblogs (Blogs): Medium Blog: https://medium.com/@kidehen Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/ http://kidehen.blogspot.com
Profile Pages: Pinterest: https://www.pinterest.com/kidehen/ Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen Twitter: https://twitter.com/kidehen Google+: https://plus.google.com/+KingsleyIdehen/about LinkedIn: http://www.linkedin.com/in/kidehen
Web Identities (WebID): Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
On 9/22/19 1:34 AM, hellmann@informatik.uni-leipzig.de wrote:
Hi Kingsley,
that describes the core of the glue that DBpedia is. The definition leads to people downloading the EN DBpedia dataset and running statistics that will only discover what data is wrong or missing in the smallest parts of DBpedia.
The question was "What is DBpedia?". What is misleading about describing it as Wikipedia content transformed into RDF and deployed using Linked Data principles?
What happened to "LOD is the largest knowledge graph on earth" ?
The question wasn't "What is the LOD Cloud?", or am I missing something here?
Querying more Freebase data from DBpedia via Linked Data is a use case since over 10 years now using ontologies as a GPS.
Freebase is yet another derivative of Wikipedia content, isn't it?
Also the definition you give limits the community to people who have edited 10 Scala Classes in the extraction framework, which is probably 10 people altogether.
Look, can't you simply make a clear statement of what is missing from my definition of DBpedia? I sense you are talking about all the other utilities that have been developed by the project beyond dataset production e.g., services like DBpedia Spotlight etc?
So this is the most exclusionist view I can think of.
What you wrote here is adequate: https://medium.com/openlink-software-blog/what-is-dbpedia-and-why-is-it-impo...
What you wrote in your email as a summary is very narrow and misleading, see Markus Kroetzsch's email. People will continue to measure DBpedia by exactly the part of the data that is loaded in the Virtuoso SPARQL endpoint unless we make the derivatives downloadable outside of HTTP LD requests.
You really have to try using a slightly better tone when communicating.
You could simply say:
Kingsley, here are some things that could be overlooked based on the description you presented:
Item 1..N.
I'll just fix it, or worst case agree to disagree.
Kingsley
-- Sebastian
On September 22, 2019 12:30:24 AM GMT+02:00, Kingsley Idehen kidehen@openlinksw.com wrote:
On 9/20/19 1:31 PM, Denny Vrandečić wrote: Yes, you're touching exactly on the problems I had during the evaluation - I couldn't even figure out what DBpedia is. Hi Denny and Sebastian, To reiterate and/or clarify. DBpedia is a community project comprising RDF datasets constructed from Wikipedia content that's deployed using Linked Data principles. The description above implies the following re focus breakdown: [1] Dataset creation -- this cannot be created in line with Linked Data principles without the items that follow [2] Linked Data Deployment -- without this there is nothing to look-up re follow-your-nose exploration [3] SPARQL Query Services -- without this there is nothing to query Over the years I've written a number of posts addressing the key question "what is DBpedia?" [1] https://medium.com/openlink-software-blog/what-is-dbpedia-and-why-is-it-important-d306b5324f90 -- What is DBpedia, and why is it important? [2] https://medium.com/virtuoso-blog/on-the-mutually-beneficial-nature-of-dbpedia-and-wikidata-5fb2b9f22ada -- Mutually beneficial nature of Wikidata and DBpedia
-- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 9/21/19 6:30 PM, Kingsley Idehen wrote:
On 9/20/19 1:31 PM, Denny Vrandečić wrote:
Yes, you're touching exactly on the problems I had during the evaluation - I couldn't even figure out what DBpedia is.
Hi Denny and Sebastian,
To reiterate and/or clarify.
DBpedia is a community project comprising RDF datasets constructed from Wikipedia content that's deployed using Linked Data principles.
A little clearer, as the definition above was a little too concise:
DBpedia is a community project comprising a variety of data curation tools, services (Linked Data lookup and SPARQL), and RDF datasets constructed from Wikipedia that's deployed using Linked Data principles and cross-referenced with other data sources as illustrated in the Linked Open Data Cloud (the world's largest Knowledge Graph)[1][2].
This project has recently spawned a Databus effort which addresses historic challenges associated with dataset curation, publication, discovery, and monetization [3].
[2] https://medium.com/virtuoso-blog/what-is-the-linked-open-data-cloud-and-why-... -- what is the LOD Cloud and why is it important?
[3] https://databus.dbpedia.org/ -- Databus
On 9/22/19 11:55 AM, Kingsley Idehen wrote:
TypoFix:
On 20/09/2019 17:53, Denny Vrandečić wrote: ...
I have been working on a comparison of DBpedia, Wikidata, and Freebase (and since you've read my thesis, you know that's a thing I know a bit about). Simple evaluation, coverage, correctness, nothing dramatically fancy. But I am torn about publishing it, because, d'oh, people may (with good reasons) dismiss it as being biased. And truth be told - the simple fact that I don't know DBpedia as well as I know Wikidata and Freebase might indeed have led to errors, mistakes, and stuff I missed in the evaluation. But you know what would help?
I would also be very interested in seeing this. I had a closer look at DBpedia recently for a tutorial and was surprised by how different the data is in comparison to Wikidata. A methodological comparison would surely be helpful.
Of course, it has to be fair, taking into account that DBpedia editions are based on a Wikipedia in one language (hence they always miss entities that Wikidata has). For example, I recently computed the difference between the following two:
(1) The set of all pairs of ancestors that one can find by following (paths of) parent relations on EN DBpedia.
(2) The set of all pairs of ancestors that one can find by following (paths of) mother/father relations on Wikidata, but visiting only items that are present in English Wikipedia.
I am not sure if this is fair or not, but I found it an interesting setup (non-local effects of incompleteness) -- and (2) is a nice illustration of something you cannot achieve in SPARQL on principled grounds ;-).
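The comparison described above boils down to computing two transitive closures and taking their difference. Below is a minimal Python sketch on hypothetical toy data (the identifiers `A`, `B`, `C`, `X`, `Y` and both edge lists are invented, not taken from either dataset). The optional `allowed` set implements the restriction in (2): a path may only pass through items that have an English Wikipedia article, so one missing intermediate item cuts off all ancestors above it, which is the non-local effect mentioned. As far as we understand, SPARQL 1.1 property paths such as `(wdt:P22|wdt:P25)+` offer no way to constrain the intermediate nodes, so this sketch does the traversal outside SPARQL.

```python
from collections import deque


def ancestor_pairs(parents, allowed=None):
    """Return all (person, ancestor) pairs reachable via parent edges.

    parents: dict mapping each person to a list of their parents.
    allowed: optional set of items; if given, the start item and every
             item visited along a path must belong to it.
    """
    pairs = set()
    for start in parents:
        if allowed is not None and start not in allowed:
            continue
        seen = set()
        queue = deque([start])
        while queue:  # breadth-first search up the family tree
            node = queue.popleft()
            for parent in parents.get(node, []):
                if allowed is not None and parent not in allowed:
                    continue  # never step onto an item outside the allowed set
                if parent not in seen:
                    seen.add(parent)
                    pairs.add((start, parent))
                    queue.append(parent)
    return pairs


# Hypothetical toy data, not taken from DBpedia or Wikidata.
dbpedia_parent = {"C": ["B"]}                                   # (1) EN DBpedia
wikidata_parents = {"C": ["B"], "B": ["A", "X"], "X": ["Y"]}    # (2) mother/father
en_wikipedia_items = {"A", "B", "C", "Y"}                       # X has no EN article

pairs_1 = ancestor_pairs(dbpedia_parent)
pairs_2 = ancestor_pairs(wikidata_parents, allowed=en_wikipedia_items)

print(sorted(pairs_2 - pairs_1))  # [('B', 'A'), ('C', 'A')]
print(sorted(pairs_1 - pairs_2))  # []
# ('C', 'Y') links two EN items, yet it is lost in (2): the only path runs through X.
print(("C", "Y") in ancestor_pairs(wikidata_parents))  # True
print(("C", "Y") in pairs_2)                           # False
```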
Cheers,
Markus
Agreed, I am also interested in seeing this. I recently did a small comparison of the coverage of science award laureates in both DBpedia and Wikidata and came to the same conclusion. The difference was sometimes quite substantial in favour of Wikidata.
Still comparing a dataset (Wikidata) to an integration hub (DBpedia).
I would assume that popularity of content (e.g. Wikipedia page hits) directly relates to availability of data in Wikidata.
We have long fused all of this in a "best of" called FlexiFusion: https://svn.aksw.org/papers/2019/ISWC_FlexiFusion/public.pdf
Future agenda is to:
- stabilize this release variant of DBpedia (fused and enriched)
- mix in external (authoritative) datasets based on the references in WP and WD to create ultimate lists (total global coverage and correctness)
- export enriched versions using either Wikidata's P's or WP's infoboxes, so that it can be integrated back into Wikimedia (with references), and also sync it to whoever needs the data.
This is part of GlobalFactSyncRE: https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE
The formula here is quite easy: if you look at DBpedia's data in detail, or at a part of it, it will not shine so much, since it is extracted; if you look at the flexibility and scalability of integration, it will win. We are strengthening the tooling for the second part.
-- Sebastian
On 22.09.19 01:35, Andra Waagmeester wrote:
On 22/09/2019 08:48, Sebastian Hellmann wrote: ...
The formula here is quite easy: If you look at DBpedia's data in detail or a part of it, it will not shine so much since it is extracted,
Sure, but I think that this is not clear to many people who are currently using DBpedia as a dataset (even if only for testing/research purposes). Also, there would surely be value in analysing the differences more closely. I agree with you that, quantitatively, Wikidata might be orders of magnitude ahead. Yet there can still be individual bits of information that are in DBpedia but missing from Wikidata so far.
For example, DBpedia EN has 32 people educated at the University of Leipzig, whereas Wikidata has 1217. Nevertheless, there is, for example, John Henry Wright (Q6238997), who is known to DBpedia but not to Wikidata (yet). Such cases might be worth weeding out systematically, so that we can really come to the point where Wikidata is a strict superset of all (correct) data in DBpedia.
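Once DBpedia identifiers have been mapped to Wikidata ones (for example via sameAs links), the systematic weeding-out suggested above amounts to a set difference. A minimal Python sketch: all identifiers and facts are made up for illustration, except `P69` ("educated at"), which is a real Wikidata property. A fact is reported when its mapped form is absent from Wikidata, or when one of its terms has no Wikidata counterpart at all.

```python
def missing_in_target(source_facts, target_facts, to_target):
    """Return source facts that have no counterpart in the target dataset.

    source_facts: set of (subject, property, object) triples in source IDs.
    target_facts: set of triples in target IDs.
    to_target:    dict mapping source IDs to target IDs; unmapped terms
                  (e.g. an entity the target does not know yet) count as missing.
    """
    missing = set()
    for triple in source_facts:
        mapped = tuple(to_target.get(term) for term in triple)
        if None in mapped or mapped not in target_facts:
            missing.add(triple)
    return missing


# Hypothetical example data (made-up IDs, except P69 = "educated at").
dbpedia_facts = {
    ("person_a", "almaMater", "leipzig_uni"),
    ("person_b", "almaMater", "leipzig_uni"),
    ("person_c", "almaMater", "leipzig_uni"),  # person_c has no Wikidata mapping yet
}
id_map = {"person_a": "Q_A", "person_b": "Q_B", "almaMater": "P69", "leipzig_uni": "Q_UNI"}
wikidata_facts = {("Q_A", "P69", "Q_UNI")}

gaps = missing_in_target(dbpedia_facts, wikidata_facts, id_map)
print(sorted(gaps))
# Wikidata is a superset of these DBpedia facts only when the difference is empty.
print("superset" if not gaps else f"{len(gaps)} facts to review")
```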
Cheers,
Markus
DBpedia actually has no data; we provide tools to more effectively use OTHER PEOPLE'S DATA, e.g. Wikipedia.
Here is an image of the maximum size of the new scalable and actually bulk-downloadable DBpedia via Databus in, let's say, one or two years:
With Download As Wikidata Q's and P's Option.
It's there, just hard to download in bulk.
Best regards, Sebastian
On September 22, 2019 10:41:10 AM GMT+02:00, Markus Kroetzsch markus.kroetzsch@tu-dresden.de wrote:
Hi,
From my perspective, the point of a dataset is for it to be used. The extent to which it is used defines how useful an individual dataset is. I even blogged about it [1]. Thanks, GerardM
[1] https://ultimategerardm.blogspot.com/2019/09/comparing-datasets-bigger-or-be...
On Sun, 22 Sep 2019 at 11:29, hellmann@informatik.uni-leipzig.de wrote:
On 9/21/19 7:35 PM, Andra Waagmeester wrote:
Agreed, I am also interested in seeing this. I recently did a small comparison of the coverage of science-award laureates in DBpedia and Wikidata and came to the same conclusion. The difference was sometimes quite substantial, in favour of Wikidata.
Are you not able to share SPARQL Query Results page links for this?
Hey Sebastian,
On 9/20/19 10:22 AM, Sebastian Hellmann wrote:
Not much of Freebase did end up in Wikidata.
Dropping here some pointers to shed light on the migration of Freebase to Wikidata, since I was partially involved in the process: 1. WikiProject [1]; 2. the paper behind [2]; 3. datasets to be migrated [3].
I can confirm that the migration has stalled: as of today, *528 thousand* Freebase statements have been curated by the community, out of *10 million*. By 'curated', I mean approved or rejected. These numbers come from two queries against the primary sources tool database.
The stall has several causes: in my opinion, the most important one was the poor quality of the sources [4,5] coming from the Knowledge Vault project [6].
Cheers,
Marco
[1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase
[2] http://static.googleusercontent.com/media/research.google.com/en//pubs/archi...
[3] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool/Version_1#Data
[4] https://www.wikidata.org/wiki/Wikidata_talk:Primary_sources_tool/Archive/201...
[5] https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_A...
[6] https://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf
Hi Marco,
I think I looked at it some years ago, and it still sounds like less than 5% made it, which matches what I remember.
-- Sebastian
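Marco's figures put an upper bound on the share Sebastian remembers. A small worked check in Python, using only the two numbers quoted in the thread:

```python
# Figures quoted by Marco: statements curated (approved OR rejected) vs. total offered.
curated = 528_000
total = 10_000_000

curated_share = curated / total
print(f"{curated_share:.2%}")  # 5.28%
```

Since "curated" counts rejected as well as approved statements, the share of Freebase statements actually accepted through the primary sources tool can be at most 5.28%, which is consistent with "less than 5%" as long as some of the curated statements were rejected.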
Hoi, I totally reject the assertion that it was so bad. I have always been of the opinion that the main issue was an atrocious user interface. Add to this the people who have Wikipedia notions about quality. They have had, and still have, a detrimental effect on both the quantity and quality of Wikidata.
When you add the functionality that is being built by the data wranglers at DBpedia, it becomes easier to compare the data from the Wikipedias with Wikidata (and why not Freebase), add what has consensus, and curate the differences. This will enable a true, data-driven sense of quality and allow us to provide a much improved service. Thanks, GerardM
Hi Gerard,
I was not trying to judge here. I was just saying that it wasn't much data in the end. For me Freebase was basically cherry-picked.
Meanwhile, the data we extract is more pertinent to the goal of having Wikidata cover the infoboxes. We still have ~500 million statements left, but none of it is used yet. Hopefully we can change that.
Meanwhile, Google crawls all the references and extracts facts from there. We don't have that available, but there is Linked Open Data.
-- Sebastian
Hi all,
as promised, now that I am back from my trip, here's my draft of the comparison of Wikidata, DBpedia, and Freebase.
It is a draft, and it is obviously potentially biased given my background, etc., but I hope that we can work on it together to get it into good shape.
Markus, amusingly I took pretty much the same example that you went for, the parent predicate. So yes, I was also surprised by the results, and would love to have Sebastian or Kingsley look into it and see if I conducted it fairly.
SJ, Andra, thanks for offering to take a look. I am sure you can all contribute your own unique backgrounds and make suggestions on how to improve things and whether the results ring true.
Marco, I totally agree with what you said: the project has stalled, there is plenty of opportunity to harvest more data from Freebase and bring it to Wikidata, and this should be reignited. Sebastian, I also agree with you, and so do the numbers; the same is true of the extraction results from DBpedia.
Sebastian, Kingsley, I tried to describe how I understand DBpedia, and all steps should be reproducible. As it seems that the two of you also have a thing or two to discuss about DBpedia's identity, I am relieved that my confusion is not entirely unjustified. So I tried to use both the last stable DBpedia release and a new-style DBpedia fusion dataset for the comparison. But I might have gotten the whole procedure wrong. I am happy to be corrected.
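To make a "coverage and accuracy" comparison concrete, here is a minimal Python sketch for a single predicate such as parent, measured against a hand-checked reference sample. It is an illustration under stated assumptions, not the method used in the notebook: the function names, the placeholder identifiers, and the choice to treat the reference as complete are all hypothetical.

```python
# Hypothetical sketch: how well does one dataset cover a "parent" relation,
# measured against a hand-checked reference sample? Identifiers are placeholders.

Pair = tuple[str, str]  # (child, parent)


def coverage(dataset: set[Pair], reference: set[Pair]) -> float:
    """Share of reference children for which the dataset states at least one parent."""
    ref_children = {child for child, _ in reference}
    covered = {child for child, _ in dataset if child in ref_children}
    return len(covered) / len(ref_children) if ref_children else 0.0


def accuracy(dataset: set[Pair], reference: set[Pair]) -> float:
    """Share of the dataset's statements about reference children that the reference confirms.

    Treats the reference as complete: a true parent missing from it counts as an error.
    """
    ref_children = {child for child, _ in reference}
    relevant = {pair for pair in dataset if pair[0] in ref_children}
    return len(relevant & reference) / len(relevant) if relevant else 0.0


reference = {("child_1", "parent_1"), ("child_1", "parent_2"),
             ("child_2", "parent_3"), ("child_3", "parent_4")}
dataset = {("child_1", "parent_1"), ("child_2", "parent_9"), ("other", "parent_5")}

print(round(coverage(dataset, reference), 3))  # 2 of 3 reference children covered
print(accuracy(dataset, reference))            # 1 of 2 relevant statements confirmed
```

Running the same two functions over each dataset (Wikidata, a DBpedia release, Freebase) against one shared reference sample keeps the numbers comparable.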
On Sat, Sep 28, 2019 at 12:28 AM hellmann@informatik.uni-leipzig.de wrote:
Meanwhile, Google crawls all the references and extracts facts from there. We don't have that available, but there is Linked Open Data.
Potentially, not a bad idea, but we don't do that.
Everyone, this is the first time I share a Colab notebook, and I have no idea if I did it right. So any feedback of the form "oh you didn't switch on that bit over here" or "yes, this works, thank you" is very welcome, because I have no clue what I am doing :) Also, I never did this kind of analysis so transparently, which is kinda both totally cool and rather scary, because now you can all see how dumb I am :)
So everyone is invited to send Pull Requests (I guess that's how this works?), and I would love for us to create a result together that we agree on. I see the result of this exercise to be potentially twofold:
1) a publication we can point to when people ask about the differences between Wikidata, DBpedia, and Freebase
2) to reignite or start projects and processes to reduce these differences
So, here is the link to my Colab notebook:
https://github.com/vrandezo/colabs/blob/master/Comparing_coverage_and_accura...
Ideally, a third goal could be to get a deeper understanding of how these three projects relate to each other. In my view, Freebase is dead and outdated, Wikidata is the core knowledge base that anyone can edit, and DBpedia is the core project for weaving together value-adding workflows on top of Wikidata or other datasets from the Linked Open Data cloud. But that's just a proposal.
Cheers, Denny
Hi Denny,
Thanks for publishing your Colab notebook! I went through it and would like to share my first thoughts here. We can then move further discussion somewhere else.
1. In general, how can we compare datasets with totally different timestamps? Wikidata is alive, Freebase is dead, and the latest DBpedia dump is old.
2. Given that all datasets contain Wikipedia links, perhaps we could use them as a bridge for the comparison, instead of Wikidata mappings. I'm assuming that the Freebase and DBpedia entities with Wikidata mappings are subsets of the whole datasets (but this should be verified).
3. We could use record linkage techniques to connect Wikidata entities with Freebase and DBpedia ones, then assess the agreement in terms of statements per entity. There has been some experimental work (with a different use case and goal) in the soweego project: https://soweego.readthedocs.io/en/latest/validator.html
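The second point, using Wikipedia links as the bridge, can be sketched as a join on normalized article titles. A minimal Python sketch, assuming each dataset has been reduced to an `{entity_id: English Wikipedia URL}` map; the function names and entity IDs are hypothetical placeholders, and redirects, first-letter case normalization, and mixing language editions are deliberately not handled:

```python
from urllib.parse import unquote


def title_key(article_url: str) -> str:
    """Normalize a Wikipedia article URL to a comparable title key."""
    # Everything after "/wiki/" is the percent-encoded title; titles may contain "/".
    _, sep, title = article_url.partition("/wiki/")
    if not sep:
        raise ValueError(f"not a Wikipedia article URL: {article_url}")
    return unquote(title).replace("_", " ")


def bridge(left: dict[str, str], right: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Join two {entity_id: article_url} maps on the linked article.

    Returns {title: (left_id, right_id)} for articles linked from both sides.
    If several entities link the same article, the last one seen wins.
    """
    left_by_title = {title_key(url): ent for ent, url in left.items()}
    right_by_title = {title_key(url): ent for ent, url in right.items()}
    shared = left_by_title.keys() & right_by_title.keys()
    return {t: (left_by_title[t], right_by_title[t]) for t in shared}


wikidata_links = {
    "wd_item_1": "https://en.wikipedia.org/wiki/Leipzig_University",
    "wd_item_2": "https://en.wikipedia.org/wiki/Some_other_article",
}
freebase_links = {"fb_topic_1": "https://en.wikipedia.org/wiki/Leipzig%20University"}

print(bridge(wikidata_links, freebase_links))
```

The joined pairs are then the units on which statement-level agreement per entity can be measured.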
On 10/1/19 1:13 AM, Denny Vrandečić wrote:
Marco, I totally agree with what you said - the project has stalled, and there is plenty of opportunity to harvest more data from Freebase and bring it to Wikidata, and this should be reignited.
Yeah, that would be great. There is known work to do, but it's hard to sustain such a big project without allocated resources: https://phabricator.wikimedia.org/maniphest/query/CPiqkafGs5G./#R
BTW, there is also version 2 of the Wikidata primary sources tool that needs love, although I'm now skeptical that it will be an effective way to achieve the Freebase harvesting. We should probably rethink the whole thing, and restart small with very simple use cases, pretty much like the Harvest templates tool you mentioned: https://tools.wmflabs.org/pltools/harvesttemplates/
Cheers,
Marco
P.S.: I *might* have found the freshest relevant DBpedia datasets: https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects I said *might* because it was really painful to find a download button and to guess among multiple versions of the same dataset: https://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09... @Sebastian may know if it's the right one :-)
Hi Marco,
On October 1, 2019 11:48:02 PM GMT+02:00, Marco Fossati fossati@spaziodati.eu wrote:
- in general, how can we compare datasets with totally different time stamps? Wikidata is alive, Freebase is dead, and the latest DBpedia dump is old;
DBpedia has made monthly releases for the past three months, and these will continue to improve and grow in an agile manner; so far we have focused on debugging and integration. The maximum age of the data would therefore be 30 days, which I think is OK. Denny validated against the live endpoint. That is fine for driving growth, but, compared to dumps, it is not scientifically reproducible.
Hoi, As indicated by the DBpedia people, there are two ways in which data gets into their latest Fusion offering: there is consensus, where all the available sources agree, and there is the notion that one source is deemed authoritative. Remember, DBpedia uses sources outside of the Wikimedia movement, like national libraries!
What I miss in your paper is a purpose: what is the way forward, and how does it compare with and improve on current practice? Current practice is that people import data from anywhere; typically it is single-sourced, if sourced at all, and it includes the human error that is inherent in a manual process. The DBpedia folks have a WMF-sponsored project in which they facilitate the inclusion of data into Wikidata. Particularly where there is consensus (no opposing sources), it is an improvement on current practice, and it nicely complements the existing Wikidata content. The content where there is NO consensus is useful because it highlights where these errors occur. It will really help in finding false friends.
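The two routes described here, plus the "no consensus" case worth highlighting, fit a small decision rule per (subject, property) slot. A minimal Python sketch, assuming values have already been normalized to comparable strings; the source names, the priority list, and the `min_sources` threshold (so that a single source never counts as consensus) are hypothetical, and this is not DBpedia's actual Fusion code:

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CONSENSUS = "consensus"          # at least `min_sources` sources, all agreeing
    AUTHORITATIVE = "authoritative"  # no consensus; a trusted source decides
    CURATE = "curate"                # no consensus and no trusted source: flag for humans


def fuse(
    values: dict[str, str],
    authoritative: list[str],
    min_sources: int = 2,
) -> tuple[Outcome, Optional[str]]:
    """Fuse one (subject, property) slot from {source_name: value}.

    `authoritative` lists trusted sources in priority order.
    """
    distinct = set(values.values())
    if len(values) >= min_sources and len(distinct) == 1:
        return Outcome.CONSENSUS, next(iter(distinct))
    for source in authoritative:
        if source in values:
            return Outcome.AUTHORITATIVE, values[source]
    return Outcome.CURATE, None


print(fuse({"enwiki": "1950", "dewiki": "1950", "wikidata": "1950"}, ["national_library"]))
print(fuse({"enwiki": "1950", "national_library": "1951"}, ["national_library"]))
print(fuse({"enwiki": "1950", "dewiki": "1951"}, ["national_library"]))
```

Slots that end up as `CURATE` are exactly the disagreements worth surfacing to editors, while `CONSENSUS` slots are the uncontroversial candidates for import.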
The Freebase data has been abandoned. It did not get the respect it deserved, and, particularly at the time, its quality was better than Wikidata's. The fact that it is dated IS a saving grace, because Wikidata/Wikipedia is particularly strong on content related to the period of Wikipedia activity. My preferred way of treating the Freebase data is to fuse it in the Fusion project. All the data that is new or expands on what is known in Fusion is relevant. Given that no maintenance is done on the Freebase data, the dissenting data can at best be used for curating what is in the WMF projects.
In your paper you support the notion of harvesting based on single sources. Maybe at a later date. First we need to integrate the uncontroversial data, the data on which there is consensus across multiple projects. The biggest benefit will be that a lot of make-work is prevented: work done only because the data just did not get into Wikidata. Thanks, GerardM
Hi Denny,
here are some initial points:
1. There is also the generic dataset from last month: https://databus.dbpedia.org/dbpedia/generic/infobox-properties/2019.08.30 (we still need to copy the documentation to the Databus). This has the highest coverage, but the lowest consistency. English has around 50k parent properties, maybe more if you count the inverse (child) and other variants. We would need to check the mappings at http://mappings.dbpedia.org, which we are doing at the moment anyhow. It might take only an hour to map some healthy chunks into the mappings dataset.
curl https://downloads.dbpedia.org/repo/lts/generic/infobox-properties/2019.08.30... | bzcat | grep "/parent"
http://temporary.dbpedia.org/temporary/parentrel.nt.bz2
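The `curl | bzcat | grep "/parent"` pipeline above can also be done with Python's standard `bz2` module, which additionally keeps the matching predicates apart. A minimal sketch, assuming the input is bzip2-compressed N-Triples with one triple per line; the function name and the sample triples are made up, and since the download URL above is truncated, reading the real file is left to the reader:

```python
import bz2
import io
from collections import Counter
from typing import BinaryIO


def count_predicates(stream: BinaryIO, needle: str = "/parent") -> Counter:
    """Count, per predicate, the triples whose predicate IRI contains `needle`.

    `stream` is a binary file object holding bzip2-compressed N-Triples.
    Unlike `grep "/parent"`, only the predicate position is inspected, and variants
    such as .../parentCompany are counted separately instead of being lumped together.
    """
    counts: Counter = Counter()
    with bz2.open(stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.startswith("#"):
                continue  # comment line
            parts = line.split(None, 2)  # subject, predicate, rest of the triple
            if len(parts) == 3 and needle in parts[1]:
                counts[parts[1]] += 1
    return counts


# Tiny in-memory sample instead of the real download.
sample = (
    "<http://dbpedia.org/resource/A> <http://dbpedia.org/property/parent> <http://dbpedia.org/resource/B> .\n"
    "<http://dbpedia.org/resource/C> <http://dbpedia.org/property/parentCompany> <http://dbpedia.org/resource/D> .\n"
    "<http://dbpedia.org/resource/E> <http://dbpedia.org/property/birthPlace> <http://dbpedia.org/resource/F> .\n"
)
counts = count_predicates(io.BytesIO(bz2.compress(sample.encode("utf-8"))))
print(counts.most_common())
```

To run it on the downloaded dump, pass a file opened in binary mode, e.g. `count_predicates(open(path, "rb"))`, where `path` is wherever the file was saved.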
Normally this dataset is messy, but still quite useful, because you can write the queries with alternatives (see dbo:position|dbp:position) in a way that makes them usable, like this query, which has worked for 13 years:
Soccer players who were born in a country with more than 10 million inhabitants and who played as goalkeeper for a club that has a stadium with more than 30,000 seats, where the club's country is different from the birth country: http://dbpedia.org/snorql/?query=SELECT+distinct+%3Fsoccerplayer+%3FcountryOfBirth+%3Fteam+%3FcountryOfTeam+%3Fstadiumcapacity%0D%0A{+%0D%0A%3Fsoccerplayer+a+dbo%3ASoccerPlayer+%3B%0D%0A+++dbo%3Aposition|dbp%3Aposition+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FGoalkeeper_%28association_football%29%3E+%3B%0D%0A+++dbo%3AbirthPlace%2Fdbo%3Acountry*+%3FcountryOfBirth+%3B%0D%0A+++%23dbo%3Anumber+13+%3B%0D%0A+++dbo%3Ateam+%3Fteam+.%0D%0A+++%3Fteam+dbo%3Acapacity+%3Fstadiumcapacity+%3B+dbo%3Aground+%3FcountryOfTeam+.+%0D%0A+++%3FcountryOfBirth+a+dbo%3ACountry+%3B+dbo%3ApopulationTotal+%3Fpopulation+.%0D%0A+++%3FcountryOfTeam+a+dbo%3ACountry+.%0D%0AFILTER+%28%3FcountryOfTeam+!%3D+%3FcountryOfBirth%29%0D%0AFILTER+%28%3Fstadiumcapacity+%3E+30000%29%0D%0AFILTER+%28%3Fpopulation+%3E+10000000%29%0D%0A}+order+by+%3Fsoccerplayer
Maybe we could also evaluate some queries that can be answered by one or the other? Can you do the query above in Wikidata?
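For readability, here is the query behind the snorql link above, decoded from the URL with whitespace normalized, wrapped in a small Python helper that rebuilds a shareable link. The `dbo:`/`dbp:` prefixes are not declared in the query and are assumed to be predefined by DBpedia's endpoint; the helper name is hypothetical, and `quote_plus` escapes a few characters that the original link left as-is, so the rebuilt link is equivalent rather than byte-identical:

```python
from urllib.parse import quote_plus, unquote_plus

# The goalkeeper query from the snorql link, decoded (whitespace normalized).
QUERY = """SELECT distinct ?soccerplayer ?countryOfBirth ?team ?countryOfTeam ?stadiumcapacity
{
?soccerplayer a dbo:SoccerPlayer ;
   dbo:position|dbp:position <http://dbpedia.org/resource/Goalkeeper_(association_football)> ;
   dbo:birthPlace/dbo:country* ?countryOfBirth ;
   #dbo:number 13 ;
   dbo:team ?team .
   ?team dbo:capacity ?stadiumcapacity ; dbo:ground ?countryOfTeam .
   ?countryOfBirth a dbo:Country ; dbo:populationTotal ?population .
   ?countryOfTeam a dbo:Country .
FILTER (?countryOfTeam != ?countryOfBirth)
FILTER (?stadiumcapacity > 30000)
FILTER (?population > 10000000)
} order by ?soccerplayer"""


def snorql_link(query: str) -> str:
    """Build a shareable snorql link (same base URL as the link quoted above)."""
    return "http://dbpedia.org/snorql/?query=" + quote_plus(query)


link = snorql_link(QUERY)
# Round-trip check: decoding the link's query parameter gives back the query text.
assert unquote_plus(link.split("?query=", 1)[1]) == QUERY
print(link)
```

The `dbo:position|dbp:position` line is the SPARQL property-path alternative mentioned above: it matches a triple that uses either the mapped ontology property or the raw infobox property.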
2. We now also have an API to get all references from infoboxes, as a partial result of the GFS project. See point 5 here: https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE
3. This particular dataset (generic/infobox-properties) is also a good measure of the non-adoption of Wikidata in Wikipedia. In total, it has over 500 million statements for all languages. A statement here means that the value comes from an infobox template parameter and that no Wikidata is used. The dataset is still extracted in the same way, with the same algorithm, so we can check whether it got bigger or smaller. The fact that this still works and has a decent size indicates that Wikidata adoption by Wikipedians is low.
4. I need to look at the parent example in detail. However, I have to say that the property lends itself well to the Wikidata approach, since it is easily understood, has a sort of truthiness, and is easy to research and add.
I am not sure it is representative, as, e.g., "employer" is more difficult to model (it is time-scoped). My own data here, for instance, is outdated: https://www.wikidata.org/wiki/Q39429171
Also, I don't yet see how this will become a more systematic approach that shows where to optimize, but I still need to read it fully.
We can start with this one, however.
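The "employer" point can be illustrated with a small time-scoped model. A minimal Python sketch, loosely following Wikidata's employer (P108) statements with start time (P580) and end time (P582) qualifiers; the class, function, and organization names are hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class EmployerClaim:
    """A time-scoped "employer" statement, loosely modelled on Wikidata's
    employer (P108) property with start time (P580) / end time (P582) qualifiers."""
    employer: str
    start: Optional[date] = None  # None: start unknown, treated as open-ended
    end: Optional[date] = None    # None: ongoing, or simply never updated


def employers_on(claims: list[EmployerClaim], day: date) -> list[str]:
    """Employers whose validity interval contains `day`; missing bounds count as open."""
    return [
        c.employer
        for c in claims
        if (c.start is None or c.start <= day) and (c.end is None or day <= c.end)
    ]


claims = [
    EmployerClaim("org_a", start=date(2010, 1, 1), end=date(2015, 12, 31)),
    EmployerClaim("org_b", start=date(2016, 1, 1)),  # no end time recorded
]
print(employers_on(claims, date(2012, 6, 1)))
print(employers_on(claims, date(2019, 10, 1)))
```

The second claim shows why such data goes stale: without an end time, a job that ended years ago looks exactly like a current one, and nothing in the data signals that it needs updating.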
-- Sebastian
On 01.10.19 01:13, Denny Vrandečić wrote:
Hi all,
as promised, now that I am back from my trip, here's my draft of the comparison of Wikidata, DBpedia, and Freebase.
It is a draft, it is obviously potentially biased given my background, etc., but I hope that we can work on it together to get it into a good shape.
Markus, amusingly I took pretty much the same example that you went for, the parent predicate. So yes, I was also surprised by the results, and would love to have Sebastian or Kingsley look into it and see if I conducted it fairly.
SJ, Andra, thanks for offering to take a look. I am sure you all can contribute your own unique background and make suggestions on how to improve things and whether the results ring true.
Marco, I totally agree with what you said - the project has stalled, and there is plenty of opportunity to harvest more data from Freebase and bring it to Wikidata, and this should be reignited. Sebastian, I also agree with you, and the numbers do so too, the same is true with the extraction results from DBpedia.
Sebastian, Kingsley, I tried to describe how I understand DBpedia, and all steps should be reproducible. As it seems that the two of you also have to discuss one or the other thing about DBpedia's identity, I am relieved that my confusion is not entirely unjustified. So I tried to use both the last stable DBpedia release as well as a new-style DBpedia fusion dataset for the comparison. But I might have gotten the whole procedure wrong. I am happy to be corrected.
On Sat, Sep 28, 2019 at 12:28 AM <hellmann@informatik.uni-leipzig.de> wrote:
> Meanwhile, Google crawls all the references and extracts facts from there. We don't have that available, but there is Linked Open Data.
Potentially, not a bad idea, but we don't do that.
Everyone, this is the first time I share a Colab notebook, and I have no idea if I did it right. So any feedback of the form "oh you didn't switch on that bit over here" or "yes, this works, thank you" is very welcome, because I have no clue what I am doing :) Also, I never did this kind of analysis so transparently, which is kinda both totally cool and rather scary, because now you can all see how dumb I am :)
So everyone is invited to send Pull Requests (I guess that's how this works?), and I would love for us to create a result together that we agree on. I see the result of this exercise as potentially twofold:
- a publication we can point people to who ask about the differences between Wikidata, DBpedia, and Freebase
- to reignite or start projects and processes to reduce these differences
So, here is the link to my Colab notebook:
https://github.com/vrandezo/colabs/blob/master/Comparing_coverage_and_accura...
Ideally, the third goal could be to get to a deeper understanding of how these three projects relate to each other. From my point of view, Freebase is dead and outdated, Wikidata is the core knowledge base that anyone can edit, and DBpedia is the core project to weave together value-adding workflows on top of Wikidata or other datasets from the linked open data cloud. But that's just a proposal.
Cheers, Denny
On Sat, Sep 28, 2019 at 12:28 AM <hellmann@informatik.uni-leipzig.de> wrote:
Hi Gerard,
I was not trying to judge here. I was just saying that it wasn't much data in the end. For me, Freebase was basically cherry-picked. Meanwhile, the data we extract is more pertinent to the goal of having Wikidata cover the infoboxes. We still have ~500 million statements left, but none of it is used yet. Hopefully we can change that. Meanwhile, Google crawls all the references and extracts facts from there. We don't have that available, but there is Linked Open Data.
-- Sebastian
On September 27, 2019 5:26:43 PM GMT+02:00, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi,
I totally reject the assertion that it was so bad. I have always been of the opinion that the main issue was an atrocious user interface. Add to this the people who have Wikipedia notions about quality. They have had, and still have, a detrimental effect on both the quantity and quality of Wikidata. When you add the functionality that is being built by the data wranglers at DBpedia, it becomes easier to compare the data from the Wikipedias with Wikidata (and why not Freebase), add what has consensus, and curate the differences. This will enable a true sense of data quality and allow us to provide a much improved service.
Thanks,
GerardM
On Fri, 27 Sep 2019 at 15:54, Marco Fossati <fossati@spaziodati.eu> wrote:
Hey Sebastian,
On 9/20/19 10:22 AM, Sebastian Hellmann wrote:
> Not much of Freebase did end up in Wikidata.
Dropping some pointers here to shed light on the migration of Freebase to Wikidata, since I was partially involved in the process:
1. WikiProject [1];
2. the paper behind it [2];
3. datasets to be migrated [3].
I can confirm that the migration has stalled: as of today, *528 thousand* Freebase statements have been curated by the community, out of *10 million*. By "curated", I mean approved or rejected. These numbers come from two queries against the primary sources tool database.
The stall is due to several causes: in my opinion, the most important one was the bad quality of sources [4,5] coming from the Knowledge Vault project [6].
Cheers,
Marco
[1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase
[2] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44818.pdf
[3] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool/Version_1#Data
[4] https://www.wikidata.org/wiki/Wikidata_talk:Primary_sources_tool/Archive/2017#Quality_of_sources
[5] https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_Addition_of_References_to_Wikidata_Statements#A_whitelist_for_sources
[6] https://www.cs.ubc.ca/~murphyk/Papers/kv-kdd14.pdf
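For scale, a back-of-the-envelope check using only Marco's two figures: the curated share of the Freebase candidate statements is roughly 5%.

```latex
\frac{528\,000}{10\,000\,000} = 0.0528 \approx 5.3\,\%
```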
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hi Denny, all,
here is the second prototype of the new overarching DBpedia approach:
https://databus.dbpedia.org/vehnem/flexifusion/prefusion/2019.11.01
Datasets are grouped by property; the DBpedia ontology is used where it exists. The data contains all Wikipedia languages mapped via DBpedia, Wikidata where mapped, and some properties from DNB, MusicBrainz, and GeoNames.
We normalized the subjects based on the sameAs links, with some quality control. Datatypes will be normalized by rules plus machine learning in the future.
Once we have made some adjustments, we can load it into the GFS GUI.
We are also working on an export using Wikidata Q's and P's so it is easier to ingest into Wikidata. More datasets from LOD will follow.
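The two prefusion steps described above, normalizing subjects via sameAs links and grouping values by property across sources, can be illustrated with a small Python sketch. This is not the FlexiFusion implementation, which the thread does not describe: union-find clustering is simply one common way to merge sameAs links, and the choice of representative (preferring a Wikidata IRI), all function names, and the example IRIs are assumptions for illustration only. The quality control on sameAs links that the message mentions is omitted.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to cluster IRIs connected by sameAs links."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def canonical_ids(same_as: list[tuple[str, str]]) -> dict[str, str]:
    """Map every IRI to one representative of its sameAs cluster.

    Hypothetical policy: prefer a Wikidata entity IRI, otherwise the
    lexicographically smallest IRI, so the result is deterministic.
    """
    uf = UnionFind()
    for a, b in same_as:
        uf.union(a, b)
    clusters: dict[str, list[str]] = defaultdict(list)
    for iri in list(uf.parent):
        clusters[uf.find(iri)].append(iri)
    mapping = {}
    for members in clusters.values():
        wikidata = sorted(m for m in members if "wikidata.org/entity/" in m)
        rep = wikidata[0] if wikidata else min(members)
        for m in members:
            mapping[m] = rep
    return mapping


def group_by_property(rows, mapping):
    """Group (source, subject, property, value) rows by canonical subject and
    property, recording which sources assert which value."""
    grouped = defaultdict(lambda: defaultdict(set))
    for source, subj, prop, value in rows:
        grouped[(mapping.get(subj, subj), prop)][value].add(source)
    return grouped


if __name__ == "__main__":
    # Placeholder IRIs, not real DBpedia/Wikidata identifiers.
    links = [
        ("http://dbpedia.org/resource/Example", "http://www.wikidata.org/entity/Q_EXAMPLE"),
        ("http://de.dbpedia.org/resource/Beispiel", "http://dbpedia.org/resource/Example"),
    ]
    rows = [
        ("en", "http://dbpedia.org/resource/Example", "populationTotal", "100"),
        ("de", "http://de.dbpedia.org/resource/Beispiel", "populationTotal", "100"),
        ("wikidata", "http://www.wikidata.org/entity/Q_EXAMPLE", "populationTotal", "90"),
    ]
    for (subj, prop), values in group_by_property(rows, canonical_ids(links)).items():
        print(subj, prop, {v: sorted(s) for v, s in values.items()})
```

Once values are grouped this way, agreement between sources and the differences to curate are directly visible per subject and property.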
All the best,
Sebastian
On 04.10.19 01:23, Sebastian Hellmann wrote:
Hi Denny,
here are some initial points:
- there is also the generic dataset from last month:
https://databus.dbpedia.org/dbpedia/generic/infobox-properties/2019.08.30 (we still need to copy the documentation to the Databus). This has the highest coverage, but the lowest consistency. English has around 50k parent properties, maybe more if you count the inverse child property and other variants. We would need to check the mappings at http://mappings.dbpedia.org, which we are doing at the moment anyhow. It could take only an hour to map some healthy chunks into the mappings dataset.
curl https://downloads.dbpedia.org/repo/lts/generic/infobox-properties/2019.08.30... | bzcat | grep "/parent"
http://temporary.dbpedia.org/temporary/parentrel.nt.bz2
Normally this dataset is messy, but still quite useful, because you can write queries with alternatives (see dbo:position|dbp:position) in a way that makes them usable, like this query, which has worked for 13 years:
Soccer players who were born in a country with more than 10 million inhabitants, who played as goalkeeper for a club that has a stadium with more than 30,000 seats, and whose club country is different from their birth country: http://dbpedia.org/snorql/?query=SELECT+distinct+%3Fsoccerplayer+%3FcountryOfBirth+%3Fteam+%3FcountryOfTeam+%3Fstadiumcapacity%0D%0A{+%0D%0A%3Fsoccerplayer+a+dbo%3ASoccerPlayer+%3B%0D%0A+++dbo%3Aposition|dbp%3Aposition+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FGoalkeeper_%28association_football%29%3E+%3B%0D%0A+++dbo%3AbirthPlace%2Fdbo%3Acountry*+%3FcountryOfBirth+%3B%0D%0A+++%23dbo%3Anumber+13+%3B%0D%0A+++dbo%3Ateam+%3Fteam+.%0D%0A+++%3Fteam+dbo%3Acapacity+%3Fstadiumcapacity+%3B+dbo%3Aground+%3FcountryOfTeam+.+%0D%0A+++%3FcountryOfBirth+a+dbo%3ACountry+%3B+dbo%3ApopulationTotal+%3Fpopulation+.%0D%0A+++%3FcountryOfTeam+a+dbo%3ACountry+.%0D%0AFILTER+%28%3FcountryOfTeam+!%3D+%3FcountryOfBirth%29%0D%0AFILTER+%28%3Fstadiumcapacity+%3E+30000%29%0D%0AFILTER+%28%3Fpopulation+%3E+10000000%29%0D%0A}+order+by+%3Fsoccerplayer
Decoded, the query reads:
    SELECT distinct ?soccerplayer ?countryOfBirth ?team ?countryOfTeam ?stadiumcapacity
    {
    ?soccerplayer a dbo:SoccerPlayer ;
       dbo:position|dbp:position <http://dbpedia.org/resource/Goalkeeper_(association_football)> ;
       dbo:birthPlace/dbo:country* ?countryOfBirth ;
       #dbo:number 13 ;
       dbo:team ?team .
       ?team dbo:capacity ?stadiumcapacity ; dbo:ground ?countryOfTeam .
       ?countryOfBirth a dbo:Country ; dbo:populationTotal ?population .
       ?countryOfTeam a dbo:Country .
    FILTER (?countryOfTeam != ?countryOfBirth)
    FILTER (?stadiumcapacity > 30000)
    FILTER (?population > 10000000)
    } order by ?soccerplayer
Maybe we could also evaluate some queries which can be answered by one or the other? Can you do the query above in Wikidata?
- We also have an API to get all references from infoboxes now, as a partial result of the GFS project. See point 5 here: https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE
- This particular dataset (generic/infobox-properties) above is also a good measure of the non-adoption of Wikidata in Wikipedia. In total, it has over 500 million statements for all languages. Having a statement here means that the data comes from an infobox template parameter and no Wikidata is used. The dataset is still extracted with the same algorithm, so we can check whether it got bigger or smaller. But the fact that this still works and has a decent size indicates that Wikidata adoption by Wikipedians is low.
- I need to look at the parent example in detail. However, I have to say that the property lends itself well to the Wikidata approach, since it is easily understood, has a kind of truthiness, and is easy to research and add.
I am not sure if it is representative, as e.g. "employer" is more difficult to model (it is time-scoped). For example, my own data here is outdated: https://www.wikidata.org/wiki/Q39429171
Also, I don't yet see how this will become a more systematic approach that shows where to optimize, but I still need to read it fully.
We can start with this one, however.
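For readers without bzcat at hand, the `curl ... | bzcat | grep "/parent"` pipeline from the first point can be approximated in Python once the dump has been downloaded. The download URL above is truncated, so the local file name used here is a hypothetical placeholder. The same pass can also count all statements, which is the kind of number needed to check whether the infobox-properties dataset grew or shrank between releases. Like grep, this is a plain substring match per line, so it also matches "/parent" in subjects or objects, not only in predicates.

```python
import bz2
from collections.abc import Iterator


def iter_matching_lines(path: str, needle: str = "/parent") -> Iterator[str]:
    """Stream a bzip2-compressed N-Triples file and yield the lines that
    contain `needle`, similar to `bzcat FILE | grep "/parent"`."""
    with bz2.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            if needle in line:
                yield line.rstrip("\n")


def count_statements(path: str) -> int:
    """Count non-empty, non-comment lines, i.e. the triples in an N-Triples dump."""
    count = 0
    with bz2.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

Hypothetical usage: `for line in iter_matching_lines("infobox-properties.nt.bz2"): print(line)`. Streaming keeps memory flat, which matters for dumps that do not fit in RAM once decompressed.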
-- Sebastian
-- All the best, Sebastian Hellmann
Director of Knowledge Integration and Linked Data Technologies (KILT) Competence Center at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org