Hi Ben, hi Max, hi Wikidata-istas (another one for the list),
thanks for the support. Good question, and an important one too! I did not want to expand the proposal much more, so there is only one little paragraph I added about upcoming Wikidata query functionality ("Wikibase query services" under "Tools, technologies, and techniques"). Here I will expand a bit more on why these things are quite different.
Question: What is the difference between the planned Wikidata query service and the proposed Wikidata Toolkit?
Answer: First, let me assure you that a Wikidata query service is coming. This has been an important project for a while now and a lot of work in that direction is under way. My proposal does not replace, prepare, or supersede this; they are simply two different things.
The proposed toolkit is meant to support developers who want to work on the data. A basic requirement for this will be to run some forms of queries, so this should be supported in some way. However, it is not clear yet what form these queries should take. I am definitely interested in features that resemble the "tree" feature in Magnus's Wikidata Query (a kind of transitive closure over properties). This is something that is not currently planned for Wikidata, and indeed it requires a type of recursion that is best implemented in memory (as Magnus does) but is hard to do efficiently if you delegate queries to MySQL (as Wikidata does). I am also interested in constraints and in rules (again, Magnus has set some precedent for this in his Reasonator and bot proposals). This, too, will require some form of recursion, which is very hard to realise on a relational DB (I tried it; others tried it; it just does not get anywhere near the performance of in-memory systems, even on large data, whether you use MySQL or Oracle).
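To illustrate the kind of recursion meant here, the following is a minimal Python sketch of a transitive closure over one property, computed entirely in memory. The data layout, the item names, and the function name are made up for illustration; nothing here is an existing Toolkit or Wikidata Query API.

```python
from collections import deque

# Hypothetical toy data: item -> list of (property, target item) statements.
STATEMENTS = {
    "poodle": [("subclass_of", "dog")],
    "dog": [("subclass_of", "mammal")],
    "mammal": [("subclass_of", "animal")],
    "animal": [],
    "cat": [("subclass_of", "mammal")],
}


def transitive_closure(start, prop, statements):
    """Return all items reachable from `start` by following `prop` repeatedly."""
    seen = set()
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for p, target in statements.get(item, []):
            # Only follow the requested property, and visit each item once
            # so that cycles in the data cannot cause an endless loop.
            if p == prop and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


ancestors = transitive_closure("poodle", "subclass_of", STATEMENTS)
```

With the data held in a dictionary, each step of the traversal is a direct lookup, whereas a relational back end would typically need one self-join (or one round trip) per level of the tree.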
Running some operation on data recursively is quite natural when you are in a programming environment and your data is in a data structure. I would like the Wikidata Toolkit to support this way of handling data. This is very far from a (complex or easy) query language that captures all possible queries in one general format. Of course there are query languages that support regular expressions on binary relations (so-called path queries), and there are also query languages with recursion (e.g., Datalog), but my goal is not to create a service for one such language. One could do that on top of the Toolkit for more than one query language.
Besides all this querying, there are also some other tasks that are hard to phrase as a query but are rather a kind of computational analysis. You might want to do this in a programming language, or you might just want to use the Toolkit to export a large matrix file that you can feed into R or Matlab. Again, this is not something you would get from the query service, even though you might need queries to get the data you need.
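As a rough sketch of such an export, the Python snippet below writes an item-by-property count matrix as CSV, a format that R or Matlab can read directly. The item and property names and the function name are hypothetical placeholders, and the in-memory dictionary merely stands in for whatever data structure the Toolkit would eventually provide.

```python
import csv
import io

# Hypothetical toy data: item -> list of (property, value) statements.
ITEMS = {
    "item_1": [("prop_b", "x"), ("prop_a", "y")],
    "item_2": [("prop_b", "x"), ("prop_b", "z")],
}


def export_property_matrix(items, out):
    """Write an item-by-property count matrix as CSV to the file-like `out`."""
    properties = sorted({p for stmts in items.values() for p, _ in stmts})
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["item"] + properties)
    for item in sorted(items):
        # Count how many statements the item has for each property.
        counts = {p: 0 for p in properties}
        for p, _ in items[item]:
            counts[p] += 1
        writer.writerow([item] + [counts[p] for p in properties])


buffer = io.StringIO()
export_property_matrix(ITEMS, buffer)
matrix_csv = buffer.getvalue()
```

Passing an open file instead of the `io.StringIO` buffer would write the matrix to disk.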
Now compared to this, the Wikidata plans are much more focussed and (for this particular task) much more advanced than my proposal (which proposes to start the work by working out what to do, see Task T1 ;-). The plans for Wikidata are based on a language in the style of SMW's #ask, i.e., a language where you have neither JOINs nor variables explicitly -- instead, JOINs are implicit and "tree-shaped", i.e., there are no cyclic relationships. A simple example of a query that is not tree-shaped is "which people were born in the same town that they died in?"; another is "which people are the child of married parents?". Neither of these can be asked in SMW. There are many other features where query languages can differ. It is clear to me that the requirements should not all be satisfied by a single Wikidata query service -- that would probably lead to a rather bloated and inefficient service, too. Instead, Wikidata will focus on the most important Wikipedia-based use cases first. The Toolkit should be "compatible" with Wikidata's query support (maybe even have a representation for Wikibase query objects), but it should also allow exploring other query types.
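To make the "not tree-shaped" case concrete: the first example needs the same town to appear in two places of the query, which a language without explicit variables cannot express. In a programming environment it is just an equality check. The Python snippet below is a minimal sketch; the records, the field names, and the function name are invented for illustration.

```python
# Hypothetical toy data: person -> known facts (a missing key means "unknown").
PEOPLE = {
    "Alice": {"born_in": "Oxford", "died_in": "Oxford"},
    "Bob": {"born_in": "Dresden", "died_in": "Berlin"},
    "Carol": {"born_in": "Berlin"},  # no place of death recorded
}


def born_and_died_in_same_town(people):
    """Return the sorted names of people whose places of birth and death coincide."""
    return sorted(
        name
        for name, facts in people.items()
        # Both values must be present and refer to the same town; this shared
        # value is what makes the query cyclic rather than tree-shaped.
        if facts.get("born_in") is not None
        and facts.get("born_in") == facts.get("died_in")
    )


same_town = born_and_died_in_same_town(PEOPLE)
```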
I hope this clarifies the differences between Wikibase's upcoming query web service and the Toolkit a bit. Both activities should still benefit one another: the Toolkit will be a good basis for exploring new query features and implementation approaches; the web service will be a convenient way to access live data (even our little Wikidata Analytics script already accesses the Wikidata Web API when it needs data that would take too long to seek inside a huge dump). So, summing up, all will be good. :-)
Cheers,
Markus
On 30/09/13 21:43, Benjamin Good wrote:
Markus,
I already cast in my vote of support for you, but I had the same question. If you could clarify the boundaries between what you are doing and what wikidata is doing directly, that would be very helpful.
-Ben
On Mon, Sep 30, 2013 at 12:32 PM, Klein,Max <kleinm@oclc.org> wrote:
Hello Markus,
Your draft proposal, seems so obvious when I read it, but I would have never thought about proposing it myself. Despite the fact that my research is cited as an example of a motivating capability, and that it was 2 weeks of needless headache to code, I had always thought that magical "Phase 3" was coming to solve our query woes. I don't see that highlighted in your IEG. In fact from what I know, there are still plans from the official Wikidata team to build advanced query functionality. Although I do remember Denny saying that the team was scrapping the "Phase" development paradigm, so maybe I missed something along the way. Anyway, I think you should address more the fact that this work is ostensibly planned from the main Wikidata grant, and why this extra work - or extra attention - is needed in addition.
Ps. Wikidatian, or Wikidatum are my faves so far.
Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023
________________________________________
From: wikidata-l-bounces@lists.wikimedia.org <wikidata-l-bounces@lists.wikimedia.org> on behalf of Markus Krötzsch <markus@semantic-mediawiki.org>
Sent: Sunday, September 29, 2013 5:11 AM
To: Discussion list for the Wikidata project.
Subject: [Wikidata-l] Wikidata Toolkit: call for feedback/support
Dear Wikidatanions (*),
I have just drafted a little proposal for creating more tools for external people to work with Wikidata, especially to build services on top of its data [1]. Your feedback and support is needed.
Idea: Currently, this is quite hard for people, since we only have WDA for reading/analysing dumps [2] and Wikidata Query as a single web service to ask queries [3]. We should have more support for programmers who want to load, query, analyse, and otherwise use the data. The proposal is to start such a toolkit to enable more work with the data.
The plan is to kickstart this project with a small team using Wikimedia's Individual Engagement program. For this we will need your support -- feel free to add your voice to the wiki page [1]. Of course, comments of all sorts are also great -- this email thread will be linked from the page. If you would like to be involved with the project, that's great too; let me know and I can add you to the proposal. The proposal will already be submitted tomorrow, but support should also be possible after that, I hope.
Cheers,
Markus
(*) Do we have a demonym yet? Wikipedian sounds natural, Wikidatan less so. Maybe this should be another thread ... ;-)
[1] https://meta.wikimedia.org/wiki/Grants:IEG/Wikidata_Toolkit
[2] http://github.com/mkroetzsch/wda
[3] http://208.80.153.172/wdq/
--
Markus Kroetzsch, Departmental Lecturer
Department of Computer Science, University of Oxford
Room 306, Parks Road, OX1 3QD Oxford, United Kingdom
+44 (0)1865 283529
http://korrekt.org/
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l