Also, the problem most SPARQL backend developers worried about was not Wikidata's size but its dynamicity: not the number of triples, but the frequency of edits. And we did talk to many of those people.
On Thu, Feb 19, 2015, 07:05 Markus Krötzsch markus@semantic-mediawiki.org wrote:
Hi Paul,
Re RDF*/SPARQL*: could you send a link? Someone has really made an effort to find the least googleable terminology here ;-)
Re relying on standards: I think this argument misses the point. If you look at what Wikidata developers are concerned with, it is 90+% interface and internal data workflow. This would be exactly the same no matter which data standard you used. All the challenges of providing a usable UI and a stable API would remain, since a data encoding standard does not help with any of this. If you have followed some of the recent discussions on the DBpedia mailing list about the UIs they have there, you can see that Wikidata is already in a very good position in comparison when it comes to exposing data to humans (thanks to Magnus, of course ;-). RDF is great, but there are many problems that it does not even try to solve (rightly so). These problems seem to be the dominant ones in the Wikidata world right now.
This said, we are in a great position to adopt new standards as they come along. I agree with you about the obvious relationship between Wikidata statements and the property graph model; we are well aware of it. Graph databases are being considered for providing query solutions to Wikidata, and we are considering setting up a SPARQL endpoint for our existing RDF as well. Overall, I don't see a reason why we should not embrace all of these technologies as they suit our purpose, even if they were not yet available when Wikidata was first conceived.
Re "It is also exciting that vendors are getting on board with this and we are going to seeing some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon." [which vendors?] [citation needed] ;-) We would be very interested in learning about such technologies. After the recent end of Titan, the discussion of query answering backends is still ongoing.
Cheers,
Markus
On 18.02.2015 21:25, Paul Houle wrote:
What bugs me about it is that Wikidata has gone down the same road as Freebase and Neo4j, in the sense of developing something ad hoc that is not well understood.
I understand the motivations that led there, because there are requirements to meet that standards don't necessarily satisfy, plus Wikidata really is doing ambitious things in the sense of capturing provenance information.
Perhaps it has come a little too late to help with Wikidata, but it seems to me that RDF* and SPARQL* have a lot to offer for "data wikis": you can view the data as plain ordinary RDF and query it with SPARQL, yet you can also attach provenance and other metadata in a sane way, with sweet syntax for writing it in Turtle and for querying it.
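To make that concrete, a Wikidata-style statement plus its provenance might look roughly like this in Turtle with the RDF* extension (the prefixes and the provenance property are illustrative choices on my part, not Wikidata's actual RDF mapping):

  @prefix wd:   <http://www.wikidata.org/entity/> .
  @prefix wdt:  <http://www.wikidata.org/prop/direct/> .   # illustrative "direct property" namespace
  @prefix prov: <http://www.w3.org/ns/prov#> .

  # The plain triple: Douglas Adams (Q42) is an instance of (P31) human (Q5).
  wd:Q42 wdt:P31 wd:Q5 .

  # With RDF*, the triple itself can be the subject of further triples,
  # so provenance hangs directly off the statement instead of a reified node:
  << wd:Q42 wdt:P31 wd:Q5 >> prov:wasDerivedFrom <http://example.org/some-source> .

The point is that consumers who do not care about provenance still see ordinary triples.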
Another way of thinking about it is that RDF* formalizes the property graph model, which has always been ad hoc in products like Neo4j. I can say that knowing what algebra you are implementing helps a lot in getting the tools to work right. So not only do you have SPARQL queries as a possibility, but also languages like Gremlin and Cypher, and this is all pretty exciting. It is also exciting that vendors are getting on board with this and we are going to be seeing some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon.
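To sketch the query side, again with illustrative prefixes and a hypothetical endpoint, SPARQL* lets you match the data as ordinary triple patterns and only dip into the statement-level metadata when you need it:

  PREFIX wd:   <http://www.wikidata.org/entity/>
  PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
  PREFIX prov: <http://www.w3.org/ns/prov#>

  # Ordinary triple pattern, plus an optional look at the metadata
  # attached to that same triple via the embedded-triple syntax:
  SELECT ?class ?source WHERE {
    wd:Q42 wdt:P31 ?class .
    OPTIONAL { << wd:Q42 wdt:P31 ?class >> prov:wasDerivedFrom ?source . }
  }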
On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw <jeroendedauw@gmail.com> wrote:
Hey,

As Lydia mentioned, we obviously do not actively discourage outside contributions, and we will gladly listen to suggestions on how we can do better. That said, we are actively taking steps to make it easier for developers not already part of the community to start contributing. For instance, we created a website about our software itself [0], which lists the MediaWiki extensions and the different libraries [1] we created. For most of our libraries, you can just clone the code and run composer install, and then you're all set: you can make changes, run the tests and submit them back. It is perhaps a different workflow than what you as a MediaWiki developer are used to, though quite a bit simpler. Furthermore, we've been quite progressive in adopting practices and tools from the wider PHP community.

I definitely do not disagree with you that some things could, and should, be improved. Like you, I'd like to see the Wikibase git repository and the naming of the extensions aligned more closely, since that is indeed confusing. Increased API stability, especially for the JavaScript API, is something else on my wish-list, amongst a lot of other things. There are always reasons why things are the way they are now and why they have not improved yet. So I suggest looking at specific pain points and seeing how things can be improved there. This will get us much further than looking at the general state, concluding people do not want third-party contributions, and then protesting against that.

[0] http://wikiba.se/
[1] http://wikiba.se/components/

Cheers

--
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3
--
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype    ontology2@gmail.com
http://legalentityidentifier.info/lei/lookup