Hi. I recently started following mediawiki/extensions/Wikibase on Gerrit, and quite astonishingly found that nearly all of the 100 most recently updated changes appear to be owned by WMDE employees (the exceptions being one change by Legoktm and some from L10n-bot). This is not the case, for example, with mediawiki/core. While this may be desired by the Wikidata team for corporate reasons, I feel that encouraging code review by volunteers would empower both Wikidata and third-party communities with new ways of contributing to the project and raise awareness of the development team's goals in the long term.
The messy naming conventions play a role too, e.g. Extension:Wikibase https://www.mediawiki.org/w/index.php?title=Extension:Wikibase&redirect=no is supposed to host technical documentation but instead redirects to the Wikibase https://www.mediawiki.org/wiki/Wikibase portal, with the actual documentation split into Extension:Wikibase Repository https://www.mediawiki.org/wiki/Extension:Wikibase_Repository and Extension:Wikibase Client https://www.mediawiki.org/wiki/Extension:Wikibase_Client, apparently ignoring the fact that the code is actually developed in a single repository (correct me if I'm wrong). Just to add some more confusion, there's also Extension:Wikidata build https://www.mediawiki.org/wiki/Extension:Wikidata_build with no documentation.
And what about wmde on GitHub https://github.com/wmde with countless creatively-named repos? They make life even harder for potential contributors.
Finally, the ever-changing client-side APIs make gadgets development a pain in the ass. Sorry if this sounds like a slap in the face, but it had to be said.
On Tue, Feb 17, 2015 at 12:43 PM, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Hi. I recently started following mediawiki/extensions/Wikibase on Gerrit, and quite astonishingly found that nearly all of the 100 most recently updated changes appear to be owned by WMDE employees (exceptions being one change by Legoktm and some from L10n-bot). This is not the case, for example, with mediawiki/core. While this may be desired by the Wikidata team for corporate reasons, I feel that encouraging code review by volunteers would empower both Wikidata and third-party communities with new ways of contributing to the project and raise awareness of the development team's goals in the long term.
How would you like to see us encourage this more? It is certainly not something we actively discourage, of course.
The messy naming conventions play a role too, i.e. Extension:Wikibase is supposed to host technical documentation but instead redirects to the Wikibase portal, with actual documentation split into Extension:Wikibase Repository and Extension:Wikibase Client, apparently ignoring the fact that the code is actually developed in a single repository (correct me if I'm wrong). Just to add some more confusion, there's also Extension:Wikidata build with no documentation.
There are different repositories. They just get merged into one for deployment.
And what about wmde on GitHub with countless creatively-named repos? They make life even harder for potential contributors.
Agreed. Something we want to tackle.
Finally, the ever-changing client-side APIs make gadgets development a pain in the ass.
Agreed, but as I said, this is going to be painful for a little longer, until we have finished the UI redesign. After that I obviously want it to be more stable again.
Cheers Lydia
Il 17/02/2015 12:53, Lydia Pintscher ha scritto:
On Tue, Feb 17, 2015 at 12:43 PM, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Hi. I recently started following mediawiki/extensions/Wikibase on Gerrit, and quite astonishingly found that nearly all of the 100 most recently updated changes appear to be owned by WMDE employees (exceptions being one change by Legoktm and some from L10n-bot). This is not the case, for example, with mediawiki/core. While this may be desired by the Wikidata team for corporate reasons, I feel that encouraging code review by volunteers would empower both Wikidata and third-party communities with new ways of contributing to the project and raise awareness of the development team's goals in the long term.
How would you like to see us encourage this more? It is nothing we actively do not want of course.
Using a single code review system and a simpler repository structure will indirectly encourage them. I'm now seeing Wikibase/Programmer's guide to Wikibase https://www.mediawiki.org/wiki/Wikibase/Programmer%27s_guide_to_Wikibase, which seems fairly detailed but partly duplicates the Gerrit help pages.
The messy naming conventions play a role too, i.e. Extension:Wikibase is supposed to host technical documentation but instead redirects to the Wikibase portal, with actual documentation split into Extension:Wikibase Repository and Extension:Wikibase Client, apparently ignoring the fact that the code is actually developed in a single repository (correct me if I'm wrong). Just to add some more confusion, there's also Extension:Wikidata build with no documentation.
There are different repositories. They just get merged into one for deployment.
Really? AFAICS development occurs on mediawiki/extensions/Wikibase https://git.wikimedia.org/summary/?r=mediawiki/extensions/Wikibase.git and on GitHub. mediawiki/extensions/WikibaseRepository https://git.wikimedia.org/summary/?r=mediawiki/extensions/WikibaseRepository.git and mediawiki/extensions/WikibaseClient https://git.wikimedia.org/summary/?r=mediawiki/extensions/WikibaseClient.git also exist but have always been empty. Even mediawiki/extensions/WikibaseRepo https://git.wikimedia.org/summary/?r=mediawiki/extensions/WikibaseRepo.git appears to exist according to Gitblit, but not according to Gerrit nor GitHub...
And what about wmde on GitHub with countless creatively-named repos? They make life even harder for potential contributors.
Agreed. Something we want to tackle.
Out of curiosity, was GitHub chosen because it fitted with your workflow? Will you embrace Differential when it comes?
Finally, the ever-changing client-side APIs make gadgets development a pain in the ass.
Agreed but as I said this is going to be painful for a little longer until we have done the UI redesign. After that I want it to be more stable again obviously.
Thanks. Is there a task/page where progress is tracked?
Cheers Lydia
Il 17/02/2015 13:33, Ricordisamoa ha scritto:
Il 17/02/2015 12:53, Lydia Pintscher ha scritto:
On Tue, Feb 17, 2015 at 12:43 PM, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Hi. I recently started following mediawiki/extensions/Wikibase on Gerrit, and quite astonishingly found that nearly all of the 100 most recently updated changes appear to be owned by WMDE employees (exceptions being one change by Legoktm and some from L10n-bot). This is not the case, for example, with mediawiki/core. While this may be desired by the Wikidata team for corporate reasons, I feel that encouraging code review by volunteers would empower both Wikidata and third-party communities with new ways of contributing to the project and raise awareness of the development team's goals in the long term.
How would you like to see us encourage this more? It is nothing we actively do not want of course.
Using a single code review system and a simpler repository structure will indirectly encourage them. I'm now seeing Wikibase/Programmer's guide to Wikibase https://www.mediawiki.org/wiki/Wikibase/Programmer%27s_guide_to_Wikibase, which seems fairly detailed but partly duplicates the Gerrit help pages.
The messy naming conventions play a role too, i.e. Extension:Wikibase is supposed to host technical documentation but instead redirects to the Wikibase portal, with actual documentation split into Extension:Wikibase Repository and Extension:Wikibase Client, apparently ignoring the fact that the code is actually developed in a single repository (correct me if I'm wrong). Just to add some more confusion, there's also Extension:Wikidata build with no documentation.
There are different repositories. They just get merged into one for deployment.
Really? AFAICS development occurs on mediawiki/extensions/Wikibase https://git.wikimedia.org/summary/?r=mediawiki/extensions/Wikibase.git and on GitHub. mediawiki/extensions/WikibaseRepository https://git.wikimedia.org/summary/?r=mediawiki/extensions/WikibaseRepository.git and mediawiki/extensions/WikibaseClient https://git.wikimedia.org/summary/?r=mediawiki/extensions/WikibaseClient.git also exist but have always been empty. Even mediawiki/extensions/WikibaseRepo https://git.wikimedia.org/summary/?r=mediawiki/extensions/WikibaseRepo.git appears to exist according to Gitblit, but not according to Gerrit nor GitHub...
Upd: I found https://github.com/wmde/WikibaseRepository, https://github.com/wmde/WikibaseClient and https://github.com/wmde/WikibaseLib, but they're marked as "experimental splits" and have no commits since Oct 2014, so I suppose they're dead.
And what about wmde on GitHub with countless creatively-named repos? They make life even harder for potential contributors.
Agreed. Something we want to tackle.
Out of curiosity, was GitHub chosen because it fitted with your workflow? Will you embrace Differential when it comes?
Finally, the ever-changing client-side APIs make gadgets development a pain in the ass.
Agreed but as I said this is going to be painful for a little longer until we have done the UI redesign. After that I want it to be more stable again obviously.
Thanks. Is there a task/page where progress is tracked?
Cheers Lydia
On Tue, Feb 17, 2015 at 1:40 PM, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Upd: I found https://github.com/wmde/WikibaseRepository, https://github.com/wmde/WikibaseClient and https://github.com/wmde/WikibaseLib, but they're marked as "experimental splits" and have no commits since Oct 2014, so I suppose they're dead.
Yeah ignore those for now. They are scratchpads basically.
Cheers Lydia
On Tue, Feb 17, 2015 at 1:33 PM, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Out of curiosity, was GitHub chosen because it fitted with your workflow?
Several reasons, but mostly workflow and exposure to other users of libraries that are not tied to MediaWiki.
Will you embrace Differential when it comes?
That's a discussion we need to have when it is available. In general though yes.
Finally, the ever-changing client-side APIs make gadgets development a pain in the ass.
Agreed but as I said this is going to be painful for a little longer until we have done the UI redesign. After that I want it to be more stable again obviously.
Thanks. Is there a task/page where progress is tracked?
https://phabricator.wikimedia.org/T54136 and its subtasks are what I use for tracking the whole redesign. We do this in steps. The sitelinks have been revamped but still need a few more fixes. Header area is getting close. Next up is the statement section. And then polishing on the whole thing.
Cheers Lydia
Il 17/02/2015 13:33, Ricordisamoa ha scritto:
Il 17/02/2015 12:53, Lydia Pintscher ha scritto:
On Tue, Feb 17, 2015 at 12:43 PM, Ricordisamoa ricordisamoa@openmailbox.org wrote:
Hi. I recently started following mediawiki/extensions/Wikibase on Gerrit, and quite astonishingly found that nearly all of the 100 most recently updated changes appear to be owned by WMDE employees (exceptions being one change by Legoktm and some from L10n-bot). This is not the case, for example, with mediawiki/core. While this may be desired by the Wikidata team for corporate reasons, I feel that encouraging code review by volunteers would empower both Wikidata and third-party communities with new ways of contributing to the project and raise awareness of the development team's goals in the long term.
How would you like to see us encourage this more? It is nothing we actively do not want of course.
Using a single code review system and a simpler repository structure will indirectly encourage them. I'm now seeing Wikibase/Programmer's guide to Wikibase https://www.mediawiki.org/wiki/Wikibase/Programmer%27s_guide_to_Wikibase, which seems fairly detailed but partly duplicates the Gerrit help pages.
The messy naming conventions play a role too, i.e. Extension:Wikibase is supposed to host technical documentation but instead redirects to the Wikibase portal, with actual documentation split into Extension:Wikibase Repository and Extension:Wikibase Client, apparently ignoring the fact that the code is actually developed in a single repository (correct me if I'm wrong). Just to add some more confusion, there's also Extension:Wikidata build with no documentation.
There are different repositories. They just get merged into one for deployment.
Really? AFAICS development occurs on mediawiki/extensions/Wikibase https://git.wikimedia.org/summary/?r=mediawiki/extensions/Wikibase.git and on GitHub. mediawiki/extensions/WikibaseRepository https://git.wikimedia.org/summary/?r=mediawiki/extensions/WikibaseRepository.git and mediawiki/extensions/WikibaseClient https://git.wikimedia.org/summary/?r=mediawiki/extensions/WikibaseClient.git also exist but have always been empty.
On Thu, Jul 16, 2015 at 4:54 PM, Ricordisamoa ricordisamoa@openmailbox.org wrote:
That's not going to be done. We're moving more repos to gerrit now. The next ones will be the WikimediaBadges and Wikidata.org extensions. Tracking at https://phabricator.wikimedia.org/T74907
Cheers Lydia
Hey,
As Lydia mentioned, we obviously do not actively discourage outside contributions, and will gladly listen to suggestions on how we can do better. That being said, we are actively taking steps to make it easier for developers not already part of the community to start contributing.
For instance, we created a website about our software itself [0], which lists the MediaWiki extensions and the different libraries [1] we created. For most of our libraries, you can just clone the code and run composer install, and then you're all set. You can make changes, run the tests and submit them back. It is perhaps a different workflow than what you as a MediaWiki developer are used to, though quite a bit simpler. Furthermore, we've been quite progressive in adopting practices and tools from the wider PHP community.
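To make that workflow concrete, here is a minimal sketch of the clone / composer install / test loop, scripted with Python's subprocess module. It is an illustration only: it assumes git, PHP and Composer are installed, picks the DataValues library as an example, and the repository URL and the vendor/bin/phpunit path are assumptions rather than instructions from the team.

# A minimal sketch of the clone / install / test loop described above.
# Assumptions: git, PHP and Composer are on PATH; DataValues is used as
# an example library whose tests run via the bundled phpunit.
import subprocess

REPO = "https://github.com/DataValues/DataValues.git"  # example component repo

subprocess.run(["git", "clone", REPO, "DataValues"], check=True)
subprocess.run(["composer", "install"], cwd="DataValues", check=True)

# After making changes, run the test suite before submitting them back.
subprocess.run(["php", "vendor/bin/phpunit"], cwd="DataValues", check=True)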
I definitely do not disagree with you that some things could, and should, be improved. Like you, I'd like to see the Wikibase git repository and the naming of the extensions aligned more, since it is indeed confusing. Increased API stability, especially for the JavaScript one, is something else on my wish-list, amongst a lot of other things. There are always reasons why things are the way they are now and why they have not improved yet. So I suggest looking at specific pain points and seeing how things can be improved there. This will get us much further than looking at the general state, concluding people do not want third-party contributions, and then protesting against that.
[0] http://wikiba.se/ [1] http://wikiba.se/components/
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
What bugs me about it is that Wikidata has gone down the same road as Freebase and Neo4J in the sense of developing something ad-hoc that is not well understood.
I understand the motivations that lead there, because there are requirements to meet that standards don't necessarily satisfy, plus Wikidata really is doing ambitious things in the sense of capturing provenance information.
Perhaps it has come a little too late to help with Wikidata but it seems to me that RDF* and SPARQL* have a lot to offer for "data wikis" in that you can view data as plain ordinary RDF and query with SPARQL but you can also attach provenance and other metadata in a sane way with sweet syntax for writing it in Turtle or querying it in other ways.
Another way of thinking about it is that RDF* is formalizing the property graph model, which has always been ad hoc in products like Neo4J. I can say that knowing what the algebra is that you are implementing helps a lot in getting the tools to work right. So you not only have SPARQL queries as a possibility but also languages like Gremlin and Cypher, and this is all pretty exciting. It is also exciting that vendors are getting on board with this and we are going to be seeing some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon.
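For readers unfamiliar with the RDF*/SPARQL* proposal Paul mentions, a small sketch of the syntax may help. The snippets below are only printed, not executed against any store; the ex: vocabulary, the reference URL and the use of wd:Q42 (Douglas Adams) are illustrative placeholders, not the actual Wikidata RDF mapping.

# Illustrative only: Turtle* annotates a whole triple (here with a source),
# and SPARQL* queries those annotations directly via << ... >> patterns.
TURTLE_STAR = """
@prefix wd: <http://www.wikidata.org/entity/> .
@prefix ex: <http://example.org/> .

wd:Q42 ex:dateOfBirth "1952-03-11" .
<< wd:Q42 ex:dateOfBirth "1952-03-11" >> ex:referenceURL <http://example.org/source> .
"""

SPARQL_STAR = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX ex: <http://example.org/>

SELECT ?value ?source WHERE {
  << wd:Q42 ex:dateOfBirth ?value >> ex:referenceURL ?source .
}
"""

print(TURTLE_STAR)
print(SPARQL_STAR)

The quoted-triple syntax is what lets provenance hang off a statement without falling back on standard RDF reification.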
On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw jeroendedauw@gmail.com wrote:
Hey,
As Lydia mentioned, we obviously do not actively discourage outside contributions, and will gladly listen to suggestions on how we can do better. That being said, we are actively taking steps to make it easier for developers not already part of the community to start contributing.
For instance, we created a website about our software itself [0], which lists the MediaWiki extensions and the different libraries [1] we created. For most of our libraries, you can just clone the code and run composer install. And then you're all set. You can make changes, run the tests and submit them back. Different workflow than what you as MediaWiki developer are used to perhaps, though quite a bit simpler. Furthermore, we've been quite progressive in adopting practices and tools from the wider PHP community.
I definitely do not disagree with you that some things could, and should, be improved. Like you I'd like to see the Wikibase git repository and naming of the extensions be aligned more, since it indeed is confusing. Increased API stability, especially the JavaScript one, is something else on my wish-list, amongst a lot of other things. There are always reasons of why things are the way they are now and why they did not improve yet. So I suggest to look at specific pain points and see how things can be improved there. This will get us much further than looking at the general state, concluding people do not want third party contributions, and then protesting against that.
[0] http://wikiba.se/ [1] http://wikiba.se/components/
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
Hi Paul,
Re RDF*/SPARQL*: could you send a link? Someone has really made an effort to find the least googleable terminology here ;-)
Re relying on standards: I think this argument is missing the point. If you look at what developers in Wikidata are concerned with, it is +90% interface and internal data workflow. This would be exactly the same no matter which data standard you would use. All the challenges of providing a usable UI and a stable API would remain the same, since a data encoding standard does not help with any of this. If you have followed some of the recent discussion on the DBpedia mailing list about the UIs they have there, you can see that Wikidata is already in a very good position in comparison when it comes to exposing data to humans (thanks to Magnus, of course ;-). RDF is great but there are many problems that it does not even try to solve (rightly so). These problems seem to be dominant in the Wikidata world right now.
This said, we are in a great position to adopt new standards as they come along. I agree with you on the obvious relationships between Wikidata statements and the property graph model. We are well aware of this. Graph databases are being considered for providing query solutions to Wikidata, and we are considering setting up a SPARQL endpoint for our existing RDF as well. Overall, I don't see a reason why we should not embrace all of these technologies as they suit our purpose, even if they were not available yet when Wikidata was first conceived.
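As a rough illustration of what plain SPARQL over the existing RDF exports could look like, here is a minimal sketch using rdflib against a small local Turtle file. The file name, the ex: property and the query are placeholders, not the actual layout of the Wikidata exports.

# A minimal sketch: load a (hypothetical) local excerpt of the Wikidata RDF
# export and run a plain SPARQL query over it with rdflib.
from rdflib import Graph

g = Graph()
g.parse("wikidata-excerpt.ttl", format="turtle")  # placeholder file name

QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX ex: <http://example.org/>

SELECT ?item WHERE {
  ?item ex:instanceOf wd:Q5 .   # ex: is a placeholder property; Q5 = human
}
LIMIT 5
"""

for row in g.query(QUERY):
    print(row.item)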
Re "It is also exciting that vendors are getting on board with this and we are going to seeing some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon." [which vendors?] [citation needed] ;-) We would be very interested in learning about such technologies. After the recent end of Titan, the discussion of query answering backends is still ongoing.
Cheers,
Markus
On 18.02.2015 21:25, Paul Houle wrote:
What bugs me about it is that Wikidata has gone down the same road as Freebase and Neo4J in the sense of developing something ad-hoc that is not well understood.
I understand the motivations that lead there, because there are requirements to meet that standards don't necessarily satisfy, plus Wikidata really is doing ambitious things in the sense of capturing provenance information.
Perhaps it has come a little too late to help with Wikidata but it seems to me that RDF* and SPARQL* have a lot to offer for "data wikis" in that you can view data as plain ordinary RDF and query with SPARQL but you can also attach provenance and other metadata in a sane way with sweet syntax for writing it in Turtle or querying it in other ways.
Another way of thinking about it is that RDF* is formalizing the property graph model which has always been ad hoc in products like Neo4J. I can say that knowing what the algebra is you are implementing helps a lot in getting the tools to work right. So you not only have SPARQL queries as a possibility but also languages like Gremlin and Cypher and this is all pretty exciting. It is also exciting that vendors are getting on board with this and we are going to seeing some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon.
On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw <jeroendedauw@gmail.com mailto:jeroendedauw@gmail.com> wrote:
Hey,
As Lydia mentioned, we obviously do not actively discourage outside contributions, and will gladly listen to suggestions on how we can do better. That being said, we are actively taking steps to make it easier for developers not already part of the community to start contributing.
For instance, we created a website about our software itself [0], which lists the MediaWiki extensions and the different libraries [1] we created. For most of our libraries, you can just clone the code and run composer install. And then you're all set. You can make changes, run the tests and submit them back. Different workflow than what you as MediaWiki developer are used to perhaps, though quite a bit simpler. Furthermore, we've been quite progressive in adopting practices and tools from the wider PHP community.
I definitely do not disagree with you that some things could, and should, be improved. Like you I'd like to see the Wikibase git repository and naming of the extensions be aligned more, since it indeed is confusing. Increased API stability, especially the JavaScript one, is something else on my wish-list, amongst a lot of other things. There are always reasons of why things are the way they are now and why they did not improve yet. So I suggest to look at specific pain points and see how things can be improved there. This will get us much further than looking at the general state, concluding people do not want third party contributions, and then protesting against that.
[0] http://wikiba.se/ [1] http://wikiba.se/components/
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
-- Paul Houle Expert on Freebase, DBpedia, Hadoop and RDF (607) 539 6254 paul.houle on Skype ontology2@gmail.com mailto:ontology2@gmail.com http://legalentityidentifier.info/lei/lookup
Also, the problem most SPARQL backend developers worried about was not Wikidata's size, but its dynamicity. Not the number of triples, but the frequency of edits. And we did talk to many of those people.
On Thu, Feb 19, 2015, 07:05 Markus Krötzsch markus@semantic-mediawiki.org wrote:
Hi Paul,
Re RDF*/SPARQL*: could you send a link? Someone has really made an effort to find the least googleable terminology here ;-)
Re relying on standards: I think this argument is missing the point. If you look at what developers in Wikidata are concerned with, it is +90% interface and internal data workflow. This would be exaclty the same no matter which data standard you would use. All the challenges of providing a usable UI and a stable API would remain the same, since a data encoding standard does not help with any of this. If you have followed some of the recent discussion on the DBpedia mailing list about the UIs they have there, you can see that Wikidata is already in a very good position in comparison when it comes to exposing data to humans (thanks to Magnus, of course ;-). RDF is great but there are many problems that it does not even try to solve (rightly so). These problems seem to be dominant in the Wikidata world right now.
This said, we are in a great position to adopt new standards as they come along. I agree with you on the obvious relationships between Wikidata statements and the property graph model. We are well aware of this. Graph databases are considered for providing query solutions to Wikidata, and we are considering to set up a SPARQL endpoint for our existing RDF as well. Overall, I don't see a reason why we should not embrace all of these technologies as they suit our purpose, even if they were not available yet when Wikidata was first conceived.
Re "It is also exciting that vendors are getting on board with this and we are going to seeing some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon." [which vendors?] [citation needed] ;-) We would be very interested in learning about such technologies. After the recent end of Titan, the discussion of query answering backends is still ongoing.
Cheers,
Markus
On 18.02.2015 21:25, Paul Houle wrote:
What bugs me about it is that Wikidata has gone down the same road as Freebase and Neo4J in the sense of developing something ad-hoc that is not well understood.
I understand the motivations that lead there, because there are requirements to meet that standards don't necessarily satisfy, plus Wikidata really is doing ambitious things in the sense of capturing provenance information.
Perhaps it has come a little too late to help with Wikidata but it seems to me that RDF* and SPARQL* have a lot to offer for "data wikis" in that you can view data as plain ordinary RDF and query with SPARQL but you can also attach provenance and other metadata in a sane way with sweet syntax for writing it in Turtle or querying it in other ways.
Another way of thinking about it is that RDF* is formalizing the property graph model which has always been ad hoc in products like Neo4J. I can say that knowing what the algebra is you are implementing helps a lot in getting the tools to work right. So you not only have SPARQL queries as a possibility but also languages like Gremlin and Cypher and this is all pretty exciting. It is also exciting that vendors are getting on board with this and we are going to seeing some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon.
On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw <jeroendedauw@gmail.com mailto:jeroendedauw@gmail.com> wrote:
Hey,
As Lydia mentioned, we obviously do not actively discourage outside contributions, and will gladly listen to suggestions on how we can do better. That being said, we are actively taking steps to make it easier for developers not already part of the community to start contributing.
For instance, we created a website about our software itself [0], which lists the MediaWiki extensions and the different libraries [1] we created. For most of our libraries, you can just clone the code and run composer install. And then you're all set. You can make changes, run the tests and submit them back. Different workflow than what you as MediaWiki developer are used to perhaps, though quite a bit simpler. Furthermore, we've been quite progressive in adopting practices and tools from the wider PHP community.
I definitely do not disagree with you that some things could, and should, be improved. Like you I'd like to see the Wikibase git repository and naming of the extensions be aligned more, since it indeed is confusing. Increased API stability, especially the JavaScript one, is something else on my wish-list, amongst a lot of other things. There are always reasons of why things are the way they are now and why they did not improve yet. So I suggest to look at specific pain points and see how things can be improved there. This will get us much further than looking at the general state, concluding people do not want third party contributions, and then protesting against that.
[0] http://wikiba.se/ [1] http://wikiba.se/components/
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
-- Paul Houle Expert on Freebase, DBpedia, Hadoop and RDF (607) 539 6254 paul.houle on Skype ontology2@gmail.com mailto:ontology2@gmail.com http://legalentityidentifier.info/lei/lookup
Hoi, I have waited for some time to reply. First of all: Wikidata is not your average data repository. It would not be as relevant as it is if it were not for the fact that it links Wikipedia articles of any language to statements on items.
This is the essence of Wikidata. After that we can all complain about the fallacies of Wikidata. I have my pet peeves, and it is not your RDF, SPARQL and stuff. That is mostly stuff for academics, and its use is largely academic and not useful on the level where I want progress. Exposing this information to PEOPLE is what I am after, and by and large they do not live in the ivory towers where RDF and SPARQL live.
I am delighted to learn that a production grade replacement of WDQ is being worked on. I am delighted that a front-end (JavaScript) developer is being sought. That is what it takes to bring the sum of all knowledge to all people. It is in enriching the data in Wikidata, not in yet another pet project, that we can make a difference, because that is what people will see. When SPARQL is available with Wikidata data, I do wonder how you would serve all the readers of Wikipedia. Does SPARQL sparkle enough when it is challenged in this way? Thanks, GerardM
On 18 February 2015 at 21:25, Paul Houle ontology2@gmail.com wrote:
What bugs me about it is that Wikidata has gone down the same road as Freebase and Neo4J in the sense of developing something ad-hoc that is not well understood.
I understand the motivations that lead there, because there are requirements to meet that standards don't necessarily satisfy, plus Wikidata really is doing ambitious things in the sense of capturing provenance information.
Perhaps it has come a little too late to help with Wikidata but it seems to me that RDF* and SPARQL* have a lot to offer for "data wikis" in that you can view data as plain ordinary RDF and query with SPARQL but you can also attach provenance and other metadata in a sane way with sweet syntax for writing it in Turtle or querying it in other ways.
Another way of thinking about it is that RDF* is formalizing the property graph model which has always been ad hoc in products like Neo4J. I can say that knowing what the algebra is you are implementing helps a lot in getting the tools to work right. So you not only have SPARQL queries as a possibility but also languages like Gremlin and Cypher and this is all pretty exciting. It is also exciting that vendors are getting on board with this and we are going to seeing some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon.
On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw jeroendedauw@gmail.com wrote:
Hey,
As Lydia mentioned, we obviously do not actively discourage outside contributions, and will gladly listen to suggestions on how we can do better. That being said, we are actively taking steps to make it easier for developers not already part of the community to start contributing.
For instance, we created a website about our software itself [0], which lists the MediaWiki extensions and the different libraries [1] we created. For most of our libraries, you can just clone the code and run composer install. And then you're all set. You can make changes, run the tests and submit them back. Different workflow than what you as MediaWiki developer are used to perhaps, though quite a bit simpler. Furthermore, we've been quite progressive in adopting practices and tools from the wider PHP community.
I definitely do not disagree with you that some things could, and should, be improved. Like you I'd like to see the Wikibase git repository and naming of the extensions be aligned more, since it indeed is confusing. Increased API stability, especially the JavaScript one, is something else on my wish-list, amongst a lot of other things. There are always reasons of why things are the way they are now and why they did not improve yet. So I suggest to look at specific pain points and see how things can be improved there. This will get us much further than looking at the general state, concluding people do not want third party contributions, and then protesting against that.
[0] http://wikiba.se/ [1] http://wikiba.se/components/
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
-- Paul Houle Expert on Freebase, DBpedia, Hadoop and RDF (607) 539 6254 paul.houle on Skype ontology2@gmail.com http://legalentityidentifier.info/lei/lookup
Dear Gerard:
...
This is the essence of Wikidata. After that we can all complain about the fallacies of Wikidata.. I have my pet pieves and it is not your RDF SPARQL and stuff. That is mostly stuff for academics and it its use is largely academic and not useful on the level where I want progress. Exposing this information to PEOPLE is what I am after and by and large they do not live in the ivory towers where RDF and SPARQL live.
It seems that, in this bright future, you are forgetting our past. Those very "ivory towers" that you so scold are where Wikidata was conceived. RDF is the reason why we have properties as first-class objects in Wikidata. Even before such technical "details", the vision of a Semantic Web that enables the free exchange of information beyond system boundaries has, for Denny and myself, been the first and foremost inspiration for much of the work that went into preparing and realizing Wikidata. Surely not all of Wikidata came from this one source of inspiration -- e.g., the crucial insight that all of this should be in a single multilingual site is due to Erik Moeller -- but without all of the work in semantic technologies we would not have Wikidata today.
People from different backgrounds are working together here. If you want to be part of such a community, you should abandon outdated stereotypes, and in particular stop using "academic" as a pejorative.
Markus
Hoi, Obviously you forgot about OmegaWiki. It can still do things Wikidata is incapable of. Thanks, GerardM
On 20 February 2015 at 17:21, Markus Kroetzsch < markus.kroetzsch@tu-dresden.de> wrote:
Dear Gerard:
...
This is the essence of Wikidata. After that we can all complain about the fallacies of Wikidata.. I have my pet pieves and it is not your RDF SPARQL and stuff. That is mostly stuff for academics and it its use is largely academic and not useful on the level where I want progress. Exposing this information to PEOPLE is what I am after and by and large they do not live in the ivory towers where RDF and SPARQL live.
It seems that, in this bright future, you are forgetting our past. Those very "ivory towers" that you so scold are where Wikidata has been conceived. RDF is the reason why we have properties as first-class objects in Wikidata. Even before such technical "details", the vision of a Semantic Web that enables the free exchange of information beyond system boundaries for Denny and myself has been the first and foremost inspiration for much of the work that went into preparing and realizing Wikidata. Surely not all of Wikidata came from this one source of inspiration -- e.g., the crucial insight that all of this should be in a single multilingual site is due to Erik Moeller -- but without all of the work in semantic technologies we would not have Wikidata today.
People from different backgrounds are working together here. If you want to be part of such a community, you should abandon outdated stereotypes, and in particular stop using "academic" as a pejorative.
Markus
On 20.02.2015 17:58, Gerard Meijssen wrote:
Hoi, Obviously you forgot about OmegaWiki. It can still do things Wikidata is incapable of.
I will never forget OmegaWiki. It has a firm place in the history of Wikidata. Experiences with OmegaWiki have directly influenced Wikidata through our conversations with Erik. At the very least, OmegaWiki was the first project using the name "Wikidata" for a Wikimedia-related project/software (this part of the history is somewhat hard to discover right now since it was mainly discussed in late 2004-2006; the historic page is at https://meta.wikimedia.org/wiki/Wikidata/Archive/Wikidata/historical).
Anyway, our discussion was about the role of RDF, not about a comprehensive history of Wikidata. Nevertheless, be assured that if anybody would contest the contributions of OmegaWiki, I will react in a similar fashion.
Markus
Gerard,
I should probably keep my mouth shut about this, but I am so offended by what you say that I am not.
I am not an academic. The people behind Wikidata are.
I am a professional programmer who has spent a lot of time being the guy who finishes what other people started; I typically come on when a project has been two years late for two years and I do whatever it takes to get the product in front of the customer.
I know that building a system in PHP to do what Wikidata does is the road to hell, not because I hate PHP but because I have done it myself and learned it through experience.
I first heard about Wikidata at SemTech in San Francisco, and I was told very directly that they were not interested in working with anybody who was experienced with putting data from a generic database in front of users, because they had worked so hard to get academic positions and get a grant from the Allen Institute, and it is more cost-effective and more compatible with academic advancement to hire a bunch of young people who don't know anything but will follow orders.
If you hire 3x the people you need and have good management you can make that work; just the fact that the project has a heavyweight project manager is a very good sign. I mean that is how the CMM 5 shops in India do it, and perhaps they have done that because actually Wikidata has succeeded quite well from a software engineering perspective.
Now, so far as RDF and SPARQL go, if you'd seen my history you'd see I am an American in the Winston Churchill sense: I've tried everything except for the right thing and finally settled on it. I really had my conversion when I discovered I could take data from Freebase and put it through something more like a reconstruction than a transformation, convert it to RDF, and write SPARQL queries that "just worked".
RDF* and SPARQL* do not come from an academic background but from a commercial organization that expects to make money by satisfying people's needs and it is being supported by a number of other commercial organizations. See
http://wiki.bigdata.com/wiki/index.php/Reification_Done_Right
This is something that builds on everything successful about RDF and SPARQL and adds the "missing links" that it takes to implement data wikis. If somebody was starting Wikidata today or if the kind of billionaire who buys sports teams the way I might buy a game console wanted to fund an effort to keep Freebase going, RDF*/SPARQL* is the way to do it.
What happens when you build a half-baked system from scratch and don't know what algebra you are using is that you run into problems that get progressively worse, and you wind up like the woman who ate the cat because she ate the rat and so forth. If I had a dime for every time I had to fix up some application where people could not figure out how to make primary keys that are unique, or for every boss who didn't want me to take my time to understand a race condition and wished I would be like the guy who made the race conditions, just trying random things until it sorta works, I would be a billionaire and I would take Freebase over and fix all the things that are wrong with it. (Which are really not that bad, but that never happened because Google didn't have any incentive to, say, improve the book database.)
Or would I?
A structural problem with open data is that people are NOT paying for it. If you were paying for it, the publishers of the data would have an incentive to serve the PEOPLE who are using it. Wikidata is playing to whims of a few rich people and it could disappear at any time when those people get tired of it or decide they have what they want and don't want to make it any easier for competitors to follow them.
Most conventional forms of academic funding that come from governments have the same problems. I mean, you get your grant, you publish a paper, it doesn't particularly matter that what you did worked or not. There is also the perpetual "project" orientation which is not suitable for things like arXiv.org or DBpedia which are really programs or operations. You can have a very hard time finding $400k a year for something that is high impact (i.e. 50,000 scientists use it every day), while next door there is somebody who got $5 million to make something that goes nowhere (i.e. the postdoc wants to use Hadoop to process the server logs but the number of hits is small enough you could do it by hand.)
In terms of producing a usable product Wikidata has made some very good progress in terms of having data that is clean (if not copious), but in terms of having a usable query facility or dump files that are easy to work with, it is still behind DBpedia. I mean, you can do a lot with the DBpedia files with grep and awk and tools like that and it is not that hard to load it into a triple store and you have SPARQL 1.1 which is eminently practical because you can use your whole bag of tricks that you use with relational databases.
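As a concrete, if simplistic, illustration of that point, the sketch below counts the uses of one predicate in a local DBpedia N-Triples dump; the dump file name is a placeholder, and the predicate is just an example.

# A minimal sketch of line-oriented processing of a DBpedia N-Triples dump,
# the sort of job grep and awk also handle easily. The file name is a
# placeholder for whichever DBpedia dump file you have downloaded.
PREDICATE = "<http://dbpedia.org/ontology/birthPlace>"  # example predicate

count = 0
with open("dbpedia-dump.nt", encoding="utf-8") as dump:
    for line in dump:
        # N-Triples is one "subject predicate object ." statement per line.
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[1] == PREDICATE:
            count += 1

print(count, "triples use", PREDICATE)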
In the big picture though, quality is in the eye of the end user, so it only happens if you have a closed feedback loop where the end user has a major influence on the behavior of the producer and certainly the possibility of making more money if you do a better job and (even more so) going out of business if you fail to do so is a powerful way to do it.
The trouble is that most people interested in open data seem to think their time is worth nothing and other people's time is worth nothing and aren't interested in paying even a small amount for services so the producers throw stuff that almost works over the wall. I don't think it would be all that difficult for me to do for Wikidata what I did for Freebase but I am not doing it because you aren't going to pay for it.
[... GOES BACK TO WORK ON A "SKUNK WORKS" PROJECT THAT JUST MIGHT PAY OFF]
On Fri, Feb 20, 2015 at 8:09 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, I have waited for some time to reply. FIrst of all. Wikidata is not your average data repository. It would not be as relevant as it is if it were not for the fact that it links Wikipedia articles of any language to statements on items.
This is the essence of Wikidata. After that we can all complain about the fallacies of Wikidata.. I have my pet pieves and it is not your RDF SPARQL and stuff. That is mostly stuff for academics and it its use is largely academic and not useful on the level where I want progress. Exposing this information to PEOPLE is what I am after and by and large they do not live in the ivory towers where RDF and SPARQL live.
I am delighted to learn that a production grade replacement of WDQ is being worked on. I am delighted that a front-end (javascript) ? developers is being sought. That is what it takes to bring the sum of al knowledge to all people. It is in enriching the data in Wikidata not in yet another pet project where we can make a difference because that is what the people will see. When SPARQL is available with Wikidata data.. do wonder how you would serve all the readers of Wikipedia.. Does SPARQL sparkle enough when it is challenged in this way ? Thanks, GerardM
On 18 February 2015 at 21:25, Paul Houle ontology2@gmail.com wrote:
What bugs me about it is that Wikidata has gone down the same road as Freebase and Neo4J in the sense of developing something ad-hoc that is not well understood.
I understand the motivations that lead there, because there are requirements to meet that standards don't necessarily satisfy, plus Wikidata really is doing ambitious things in the sense of capturing provenance information.
Perhaps it has come a little too late to help with Wikidata but it seems to me that RDF* and SPARQL* have a lot to offer for "data wikis" in that you can view data as plain ordinary RDF and query with SPARQL but you can also attach provenance and other metadata in a sane way with sweet syntax for writing it in Turtle or querying it in other ways.
Another way of thinking about it is that RDF* is formalizing the property graph model which has always been ad hoc in products like Neo4J. I can say that knowing what the algebra is you are implementing helps a lot in getting the tools to work right. So you not only have SPARQL queries as a possibility but also languages like Gremlin and Cypher and this is all pretty exciting. It is also exciting that vendors are getting on board with this and we are going to seeing some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon.
On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw jeroendedauw@gmail.com wrote:
Hey,
As Lydia mentioned, we obviously do not actively discourage outside contributions, and will gladly listen to suggestions on how we can do better. That being said, we are actively taking steps to make it easier for developers not already part of the community to start contributing.
For instance, we created a website about our software itself [0], which lists the MediaWiki extensions and the different libraries [1] we created. For most of our libraries, you can just clone the code and run composer install. And then you're all set. You can make changes, run the tests and submit them back. Different workflow than what you as MediaWiki developer are used to perhaps, though quite a bit simpler. Furthermore, we've been quite progressive in adopting practices and tools from the wider PHP community.
I definitely do not disagree with you that some things could, and should, be improved. Like you I'd like to see the Wikibase git repository and naming of the extensions be aligned more, since it indeed is confusing. Increased API stability, especially the JavaScript one, is something else on my wish-list, amongst a lot of other things. There are always reasons of why things are the way they are now and why they did not improve yet. So I suggest to look at specific pain points and see how things can be improved there. This will get us much further than looking at the general state, concluding people do not want third party contributions, and then protesting against that.
[0] http://wikiba.se/ [1] http://wikiba.se/components/
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
-- Paul Houle Expert on Freebase, DBpedia, Hadoop and RDF (607) 539 6254 paul.houle on Skype ontology2@gmail.com http://legalentityidentifier.info/lei/lookup
<big grin> Paul,
My background in computing is in mini mainframes. I know about huge databases. I had my own organisation, and it was involved in what started as "Ultimate Wiktionary" and became "OmegaWiki", and I am proud of it.
I understand your frustration. When I look at Wikidata and how it is presented... I use Reasonator when I look at Wikidata's items, and I use WDQ when I query the data. Wikidata is so useless without these additional tools. Magnus, who wrote those tools, said about RDF: I can do RDF on top of WDQ...
The point of WDQ is that it scales; it does load balancing, not so much for performance reasons but because it occasionally crashes.
RDF and SPARQL may be great but for me the point is very much in providing the same data in multiple languages and THAT is something we can do with Reasonator really well.
Having read your rant, I am sorry. However, I am not sorry to say that Wikidata is very much a tool that is not used and is hardly usable. The current crop of RDF tools is not linked to Wikidata; there is no way to see the effects of data entry on the results like in WDQ.
I do, however, believe in Wikidata. Currently I have over 2 million edits to my credit. YES, Wikidata is underfunded and under-resourced. Thanks, GerardM
PS I am proud of this in Wikidata... http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A4167836%5D%2...
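For readers who have not used WDQ: the Autolist link above boils down to a single WDQ claim query, roughly as sketched below. The endpoint URL and the shape of the JSON response are assumptions about Magnus's service, not documented guarantees.

# A minimal sketch of querying WDQ for CLAIM[31:4167836] ("instance of"
# Q4167836), the same query the Autolist link above uses. The endpoint URL
# and the "items" field of the response are assumptions.
import requests

WDQ_API = "https://wdq.wmflabs.org/api"  # assumed WDQ endpoint

resp = requests.get(WDQ_API, params={"q": "CLAIM[31:4167836]"}, timeout=60)
resp.raise_for_status()
items = resp.json().get("items", [])

# WDQ is assumed to return bare numeric item IDs; prefix with "Q" for display.
print(len(items), "items, e.g.", ["Q%d" % i for i in items[:5]])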
On 20 February 2015 at 19:14, Paul Houle ontology2@gmail.com wrote:
Gerard,
I should probably keep my mouth shut about this but I am so offended
but what you say that I am not.
I am not an academic. The people behind Wikidata are.
I am a professional programmer who has spent a lot of time being the guy who finishes what other people started; I typically come on when a project has been two years late for two years and I do whatever it takes to get the product in front of the customer.
I know that building a system in PHP to do what Wikidata is the road to hell not because I hate PHP but because I have done it myself and learned it through experience.
I first heard about Wikidata at SemTech in San Francisco and I was told very directly that they were not interested in working with anybody who was experienced with putting data from generic database in front of users because they had worked so hard to get academic positions and get a grant from the Allen Institute and it is more cost-effective and more compatible with academic advancement to hire a bunch of young people who don't know anything but will follow orders.
If you hire 3x the people you need and have good management you can make that work; just the fact that the project has a heavyweight project manager is a very good sign. I mean that is how the CMM 5 shops in India do it, and perhaps they have done that because actually Wikidata has succeeded quite well from a software engineering perspective.
Now so far as RDF and SPARQL go if you'd seen my history you'd see I am an American in the Winston Churchill sense that I've tried everything except for the right thing and finally settled on it. I really had my conversion when I discovered I could take data from Freebase and put it through something more like a reconstruction than a transformation, convert it to RDF and I could write SPARQL queries that "just worked".
RDF* and SPARQL* do not come from an academic background but from a commercial organization that expects to make money by satisfying people's needs and it is being supported by a number of other commercial organizations. See
http://wiki.bigdata.com/wiki/index.php/Reification_Done_Right
This is something that builds on everything successful about RDF and SPARQL and adds the "missing links" that it takes to implement data wikis. If somebody was starting Wikidata today or if the kind of billionaire who buys sports teams the way I might buy a game console wanted to fund an effort to keep Freebase going, RDF*/SPARQL* is the way to do it.
What happens when you build a half-baked system from scratch and don't what algebra is using is that you run into problems that get progressively worse and you wind up like woman who ate the cat because she ate the rat and so forth. If I had a dime for every time I had to fix up some application where people could not figure out how to make primary keys that are unique or every boss who didn't want me to take my time to understand a race condition and wished I would be like the guy who made the race conditions, just trying random things until it sorta works I would be a billionaire and I would take Freebase over and fix all the things that are wrong with it. (Which are really not that bad, but never happened because Google didn't have any incentive to say, improve the book database.)
Or would I?
A structural problem with open data is that people are NOT paying for it. If you were paying for it, the publishers of the data would have an incentive to serve the PEOPLE who are using it. Wikidata is playing to whims of a few rich people and it could disappear at any time when those people get tired of it or decide they have what they want and don't want to make it any easier for competitors to follow them.
Most conventional forms of academic funding that come from governments have the same problems. I mean, you get your grant, you publish a paper, it doesn't particularly matter that what you did worked or not. There is also the perpetual "project" orientation which is not suitable for things like arXiv.org or DBpedia which are really programs or operations. You can have a very hard time finding $400k a year for something that is high impact (i.e. 50,000 scientists use it every day), while next door there is somebody who got $5 million to make something that goes nowhere (i.e. the postdoc wants to use Hadoop to process the server logs but the number of hits is small enough you could do it by hand.)
In terms of producing a usable product Wikidata has made some very good progress in terms of having data that is clean (if not copious), but in terms of having a usable query facility or dump files that are easy to work with, it is still behind DBpedia. I mean, you can do a lot with the DBpedia files with grep and awk and tools like that and it is not that hard to load it into a triple store and you have SPARQL 1.1 which is eminently practical because you can use your whole bag of tricks that you use with relational databases.
In the big picture though, quality is in the eye of the end user, so it only happens if you have a closed feedback loop where the end user has a major influence on the behavior of the producer and certainly the possibility of making more money if you do a better job and (even more so) going out of business if you fail to do so is a powerful way to do it.
The trouble is that most people interested in open data seem to think their time is worth nothing and other people's time is worth nothing and aren't interested in paying even a small amount for services so the producers throw stuff that almost works over the wall. I don't think it would be all that difficult for me to do for Wikidata what I did for Freebase but I am not doing it because you aren't going to pay for it.
[... GOES BACK TO WORK ON A "SKUNK WORKS" PROJECT THAT JUST MIGHT PAY OFF]
On Fri, Feb 20, 2015 at 8:09 AM, Gerard Meijssen < gerard.meijssen@gmail.com> wrote:
Hoi, I have waited for some time to reply. FIrst of all. Wikidata is not your average data repository. It would not be as relevant as it is if it were not for the fact that it links Wikipedia articles of any language to statements on items.
This is the essence of Wikidata. After that we can all complain about the fallacies of Wikidata.. I have my pet pieves and it is not your RDF SPARQL and stuff. That is mostly stuff for academics and it its use is largely academic and not useful on the level where I want progress. Exposing this information to PEOPLE is what I am after and by and large they do not live in the ivory towers where RDF and SPARQL live.
I am delighted to learn that a production-grade replacement of WDQ is being worked on. I am delighted that a front-end (JavaScript) developer is being sought. That is what it takes to bring the sum of all knowledge to all people. It is in enriching the data in Wikidata, not in yet another pet project, that we can make a difference, because that is what people will see. When SPARQL is available with Wikidata data, I do wonder how you would serve all the readers of Wikipedia. Does SPARQL sparkle enough when it is challenged in this way? Thanks, GerardM
On 18 February 2015 at 21:25, Paul Houle ontology2@gmail.com wrote:
What bugs me about it is that Wikidata has gone down the same road as Freebase and Neo4J in the sense of developing something ad-hoc that is not well understood.
I understand the motivations that lead there, because there are requirements to meet that standards don't necessarily satisfy, plus Wikidata really is doing ambitious things in the sense of capturing provenance information.
Perhaps it has come a little too late to help with Wikidata, but it seems to me that RDF* and SPARQL* have a lot to offer for "data wikis": you can view the data as plain ordinary RDF and query it with SPARQL, but you can also attach provenance and other metadata in a sane way, with sweet syntax for writing it in Turtle and for querying it in other ways.
Another way of thinking about it is that RDF* is formalizing the property graph model, which has always been ad hoc in products like Neo4J. I can say that knowing what algebra you are implementing helps a lot in getting the tools to work right. So you not only have SPARQL queries as a possibility but also languages like Gremlin and Cypher, and this is all pretty exciting. It is also exciting that vendors are getting on board with this, and we are going to see some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon.
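(For readers who have not seen the RDF*/SPARQL* notation being discussed here, the following is a small sketch of how a provenance qualifier gets attached to a single statement. The vocabulary, the population figure and the source entity are invented for illustration, and the snippets are kept as plain strings rather than run against any particular store.)

# Illustrative only: RDF* ("RDF-star") syntax for attaching provenance to a
# single statement. The example data is made up; an RDF*-aware store such as
# Blazegraph (mentioned later in this thread) would be needed to actually use it.

TURTLE_STAR = """
@prefix ex:   <http://example.org/> .
@prefix prov: <http://www.w3.org/ns/prov#> .

ex:Berlin ex:population 3500000 .

# The quoted triple << ... >> is itself the subject of a further statement,
# which is how provenance gets attached without reification blank nodes.
<< ex:Berlin ex:population 3500000 >> prov:wasDerivedFrom ex:Census2011 .
"""

SPARQL_STAR = """
PREFIX ex:   <http://example.org/>
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT ?pop ?source WHERE {
    ex:Berlin ex:population ?pop .
    << ex:Berlin ex:population ?pop >> prov:wasDerivedFrom ?source .
}
"""

if __name__ == "__main__":
    print(TURTLE_STAR)
    print(SPARQL_STAR)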
On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw <jeroendedauw@gmail.com> wrote:
Hey,
As Lydia mentioned, we obviously do not actively discourage outside contributions, and will gladly listen to suggestions on how we can do better. That being said, we are actively taking steps to make it easier for developers not already part of the community to start contributing.
For instance, we created a website about our software itself [0], which lists the MediaWiki extensions and the different libraries [1] we created. For most of our libraries, you can just clone the code and run composer install, and then you're all set. You can make changes, run the tests and submit them back. That is perhaps a different workflow than the one you as a MediaWiki developer are used to, though quite a bit simpler. Furthermore, we've been quite progressive in adopting practices and tools from the wider PHP community.
I definitely do not disagree with you that some things could, and should, be improved. Like you, I'd like to see the Wikibase git repository and the naming of the extensions aligned more, since it indeed is confusing. Increased API stability, especially for the JavaScript one, is something else on my wish-list, amongst a lot of other things. There are always reasons why things are the way they are now and why they have not improved yet. So I suggest looking at specific pain points and seeing how things can be improved there. This will get us much further than looking at the general state, concluding people do not want third-party contributions, and then protesting against that.
[0] http://wikiba.se/ [1] http://wikiba.se/components/
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
-- Paul Houle Expert on Freebase, DBpedia, Hadoop and RDF (607) 539 6254 paul.houle on Skype ontology2@gmail.com http://legalentityidentifier.info/lei/lookup
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Hi Paul!
I understand your frustration, but let me put a few things into perspective.
For reference: I'm employed by WMDE and work on wikibase/wikidata. I have been working on MediaWiki since 2005, and have been paid for it since 2008.
On 20.02.2015 at 19:14, Paul Houle wrote:
I am not an academic. The people behind Wikidata are.
To the extent that most of us have some college degree. The only "full" academic involved is Markus Krötzsch, who together with Denny Vrandecic developed many of the concepts behind Wikidata. He acts as an advisor to the Wikidata project, but doesn't have any formal position.
Oh, we also have a group of students working on their bachelor project with us.
I first heard about Wikidata at SemTech in San Francisco and I was told very directly that they were not interested in working with anybody who was experienced with putting data from generic database in front of users because they had worked so hard to get academic positions and get a grant from the Allen Institute and it is more cost-effective and more compatible with academic advancement to hire a bunch of young people who don't know anything but will follow orders.
Ouch. Working with such people would be a drag. Luckily, we have an awesome team of full-blooded programmers. Not that we get everything right, or done in time...
RDF* and SPARQL* do not come from an academic background but from a commercial organization that expects to make money by satisfying people's needs and it is being supported by a number of other commercial organizations. See
You'll be happy to hear that we are working with high priority to finally provide full query functionality. We are still evaluating options (WMF's Nik and Stas have been visiting the WMDE office for this, just this week - have a safe trip home, guys!), but the current favorite is, in fact, BlazeGraph, formerly BigData, by the people who came up with RDF* and RDR. If we end up using that, chances are good that we will be exposing a SPARQL endpoint directly.
We may still find a deal breaker though, so no promises. Another option would be Neo4J, using a graph-oriented mapping. We could still expose SPARQL (building upon Gremlin, IIRC), but I suspect that we'd probably rather expose something more domain-specific, perhaps based on Magnus' WDQ syntax, that operates directly on the graph.
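(To illustrate the two directions being weighed here, a rough sketch follows. The WDQ endpoint URL and the shape of its JSON response are assumptions based on Magnus' WikidataQuery service as it ran at the time, and the SPARQL version is hypothetical, since no official Wikidata SPARQL endpoint existed yet; the prefixes and property URIs are placeholders.)

# A hedged sketch contrasting the two query styles mentioned above.
# The WDQ endpoint and response shape are assumptions about Magnus Manske's
# WikidataQuery service; the SPARQL form assumes a hypothetical RDF mapping.
import requests

# WDQ style: "items that have the claim P31 (instance of) = Q5 (human)".
wdq_query = "CLAIM[31:5]"
resp = requests.get("http://wdq.wmflabs.org/api", params={"q": wdq_query})
item_ids = resp.json().get("items", [])  # numeric item ids, e.g. 42 for Q42
print(len(item_ids), "items")

# Roughly the same question expressed as SPARQL against a hypothetical RDF
# mapping of Wikidata (prefixes and property URIs are placeholders).
sparql_equivalent = """
PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?item WHERE { ?item wdt:P31 wd:Q5 . }
"""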
This is something that builds on everything successful about RDF and SPARQL and adds the "missing links" that it takes to implement data wikis. If somebody was starting Wikidata today or if the kind of billionaire who buys sports teams the way I might buy a game console wanted to fund an effort to keep Freebase going, RDF*/SPARQL* is the way to do it.
I still stand by the decision not to use a triple store as the primary storage for wikidata, for various reasons (MediaWiki integration, especially versioning, being among the most important ones).
But I'm all for mapping our internal model to RDF, and exposing a SPARQL endpoint, if we can do that in a reliable manner with the available resources. I'd rather have limited query functionality with five nines uptime than a SPARQL endpoint that is down half the time.
Speaking of mapping to RDF: Have you read http://korrekt.org/papers/Wikidata-RDF-export-2014.pdf?
Wikidata is playing to whims of a few rich people and it could disappear at any time when those people get tired of it or decide they have what they want and don't want to make it any easier for competitors to follow them.
Wikidata development and hosting is funded by donations to Wikimedia, like all Wikimedia projects. The first year of development was indeed funded by large companies and trusts (AI2, Google, and the Moore Foundation), but to my knowledge they never tried to influence our decisions.
We have never had academic funding. I don't think we are going to say "no" if we can get any, though.
The trouble is that most people interested in open data seem to think their time is worth nothing and other people's time is worth nothing and aren't interested in paying even a small amount for services so the producers throw stuff that almost works over the wall. I don't think it would be all that difficult for me to do for Wikidata what I did for Freebase but I am not doing it because you aren't going to pay for it.
If you mail me an application/offer, I'm happy to forward it and, depending on content, champion it. Wikimedia doesn't pay as well as big tech companies (Wikimedia operates on a shoestring budget compared to other sites with upwards of 100k hits per second), but the pay isn't shoddy either. Come and visit! Let's talk!
Regarding Paul's comment:
I first heard about Wikidata at SemTech in San Francisco and I was told very directly that they were not interested in working with anybody who was experienced with putting data from generic database in front of users because they had worked so hard to get academic positions and get a grant from the Allen Institute and it is more cost-effective and more compatible with academic advancement to hire a bunch of young people who don't know anything but will follow orders.
I am, frankly, baffled by this story. It very likely was me, presenting Wikidata at SemTech in SF, so it probably was me you have been talking with, but I have no recollection of a conversation going the way you describe it.
If I remember the timing correctly, I didn't have an academic position at the time of SemTech. Actually, I gave up my academic position to move to Berlin and work on Wikidata.
The donors to Wikidata never exercised any influence on the project, beyond requiring reports on its progress.
I cannot imagine that I would ever have said that we "were not interested in working with anybody who was experienced with putting data from generic database in front of users", because, really, that would make no sense to say. I also do not remember having gotten an application from you.
Regarding the team that we wanted and eventually did hire, I would sternly disagree with the description of "a bunch of young people who don't know anything but will follow orders" - from the applications we got, we chose the most suitable team we could pull together. And considering the discussions we had in the following months, following orders was neither their strength nor the qualification they were chosen for. Nor did they consist only of young people. Instead, it turned out, they were exactly the kind of independent thinkers with dedication to the goal and to quality that we were aiming for. Fortunately for the project.
Maybe the conversation went differently than you are remembering it. E.g. I would have insisted on building Wikidata on top of MediaWiki (for operational reasons). E.g. I would have insisted that everyone working on Wikidata move to Berlin (because I thought it would be the only possibility to get the project to an acceptable state in the original timeframe, so that we could ensure its future sustainability). E.g. I would have disagreed that RDF/SPARQL backends could, back then, be used out of the box as Wikidata's backend (but I would have been open to anyone showing me that I was wrong, and indeed very happy, because, seriously, I have an unreasonable fondness for SPARQL and RDF). E.g. I would have disagreed that our job as Wikimedia is to spend too many resources on pretty frontends (because that is something the community can do, and as we see, is doing very well - I think Wikimedia should really concentrate on those pieces of work that cannot be and are not being done by the community). E.g. I would have insisted on not outsourcing any major part of the development effort to an external service provider. E.g. it could be that we already had all positions filled, and simply no money for more people (it really depends on the timing). So there are plenty of points we might have disagreed on, and which, maybe misunderstood, maybe subtly altered by the passage of time in a fallible memory, have led to the recollection of our conversation that you presented; but, for the reasons mentioned above, I think that your recollection is incorrect.
Also, Gerard - you are quick to chide others for not being constructive in their criticism, and I very much appreciate you doing so.
I would like to ask you to reconsider whether your contribution to this thread meets your own threshold for being constructive.
Can we please stop being hurtful and dismissive of each other? We have a great project, riding an amazing wave, and there's too much for each one of us to do to afford to hurt each other and make this a place less nice than it could be.
On 20.02.2015 22:44, Denny Vrandečić wrote:
Regarding Paul's comment:
I first heard about Wikidata at SemTech in San Francisco and I was told very directly that they were not interested in working with anybody who was experienced with putting data from generic database in front of users because they had worked so hard to get academic positions and get a grant from the Allen Institute and it is more cost-effective and more compatible with academic advancement to hire a bunch of young people who don't know anything but will follow orders.
I am, frankly, baffled by this story.
So am I. I think we all agree that there was never a time when anybody in the Wikidata project could have held such a view. Also, anybody who is following discussions on the various project channels knows that Wikimedia developers (in Wikidata and elsewhere in WMF) are not the "following orders" type but strong individual personalities who each bring in a big share of passion. This is the way we work and want to work.
@Paul: You should not feel rejected (for whatever reason) by Wikidata. Nobody wants to contest your expertise here. I understand that you would have made some technical choices differently, but you should not be frustrated because of that. We, too, have had many heated tech discussions about Wikidata, and each of us has had to give up some positions in the process.
Cheers,
Markus
On 02/17/2015 03:43 AM, Ricordisamoa wrote:
Hi. I recently started following mediawiki/extensions/Wikibase on Gerrit, and quite astonishingly found that nearly all of the 100 most recently updated changes appear to be owned by WMDE employees (exceptions being one change by Legoktm and some from L10n-bot). This is not the case, for example, with mediawiki/core.
I used to be more active in Wikidata development but was put off after discovering that WMDE developers can directly push commits without review, and if they need to be reverted I need to spend 20 minutes trying to figure out how to use GitHub to submit a pull request. And even though I am trusted with +2 on mediawiki/*, that doesn't give me +2 on these repos to revert obviously bad commits.
Sorry if this sounds like a slap in the face, but it had to be said.
Thank you.
-- Legoktm
On Thu, Feb 19, 2015 at 12:03 AM, Legoktm legoktm.wikipedia@gmail.com wrote:
I used to be more active in Wikidata development but was put off after discovering that WMDE developers can directly push commits without review, and if they need to be reverted I need to spend 20 minutes trying to figure out how to use GitHub to submit a pull request. And even though I am trusted with +2 on mediawiki/*, that doesn't give me +2 on these repos to revert obviously bad commits.
Why didn't you come to me to talk about this? Folks, if something upsets you that much, you need to come and talk to me. I can't promise I can always fix it, but I will try, and if I don't know about it I definitely can't. You can reach me via email, IRC, Facebook, Twitter, face-to-face, etc. In this particular case: no one should push without review. If someone does, then I need to know. And you could obviously have gotten the necessary rights on that repo. (And still can if you want that.) But again, I need to know to make that happen.
Cheers Lydia