Very well put, Stas, thank you!
On 13.12.2016 at 07:23, Stas Malyshev wrote:
Hi!
> If I wanted to make a page on the English Wikipedia using wikitext called "List of United States presidents" that dynamically embeds information from https://www.wikidata.org/wiki/Q23 and https://www.wikidata.org/wiki/Q11806 and other similar items, is this currently possible? I consider this to be arbitrary Wikidata querying, but if that's not the correct term, please let me know what to call it.
So this is the kind of can of worms which I guess we'll eventually have to open, but very carefully. So I want to state my _current_ opinion on the matter - please note, it can change at any time due to changing circumstances, persuasion, experience, revelation, etc.
- Technically, anything that can access a web service and speak JSON
can talk to a SPARQL server. So, in theory, making some way to do this would not be very hard. But - please keep reading.
- I am very apprehensive about having a direct link between any wiki
pages and the SPARQL server without heavy caching and rate limiting in between. We don't have a super-strong setup there, and I'm afraid such a link would just knock our setup over, especially if people start putting queries into frequently used templates.
- We have a number of bot setups (Listeria etc.) which can auto-update
lists from SPARQL periodically. This works reasonably well (excepting occasional timeouts on tricky queries, etc.) and does not require requesting the info too frequently.
- If we want a more direct page-to-SPARQL-to-page interface, we need to
think about storing/caching data - and not for 5 minutes, like it's cached now, but for a much longer time, probably in storage other than Varnish. Ideally, that storage would be more of a persistent store than a cache - i.e. it would always (or nearly always) be available but periodically updated. Kind of like the bots mentioned above, but more generic. I don't have any more design for it beyond that, but that's, I think, the direction we should be looking into.
> A more advanced form of this Wikidata querying would be dynamically generating a list of presidents of the United States by finding every Wikidata item where position held includes "President of the United States". Is this currently possible on-wiki or via wikitext?
No, and there are tricky parts there. Consider https://www.wikidata.org/wiki/Q735712. Yes, Lex Luthor held the office of President of the USA - in a fictional universe, of course. But the naive query - every Wikidata item whose "position held" includes "President of the United States" - would return Lex Luthor as a president just as legitimate as Abraham Lincoln. In fact, there are 79 US presidents judging by "position held" alone. So clearly, there need to be some limits, and those limits would be on a case-by-case basis.
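[Editor's note: one common way to impose such a limit - a sketch of one possible filter, not the only one - is to additionally require that the item is an instance of (P31) human (Q5); fictional characters are typically instances of other classes such as "fictional human", so they drop out.]

```python
# Naive query: every item with position held (P39) = President of the
# United States (Q11696).  This matches Lex Luthor (Q735712) too.
NAIVE = """
SELECT ?item WHERE { ?item wdt:P39 wd:Q11696 . }
"""

# One possible case-by-case limit: also require instance of (P31)
# human (Q5), which fictional characters generally are not.
RESTRICTED = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P39 wd:Q11696 .
  ?item wdt:P31 wd:Q5 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""
print(RESTRICTED)
```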
> If either of these querying capabilities is possible, how do I do them? I don't understand how to query Wikidata in a useful way and I find this frustrating. Since 2012, we've been putting a lot of data into Wikidata, but I want to programmatically extract some of this data and use it in my Wikipedia editing. How do I do this?
Right now the best way is to use one of the list-maintaining bots, I think. Unless you're talking about pulling a small set of values, in which case Lua/templates are probably the best avenue.
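[Editor's note: for the "small set of values" case, the on-wiki route is the Scribunto Lua library `mw.wikibase`; off-wiki, the same data is exposed as entity JSON at e.g. https://www.wikidata.org/wiki/Special:EntityData/Q23.json. A minimal Python sketch of walking that JSON - the snippet below is a hand-trimmed fragment, but the claims/mainsnak/datavalue structure is the real one; the helper function is illustrative only.]

```python
import json

# Hand-trimmed fragment of the entity JSON for Q23 (George Washington).
# The real document has many more fields, but claims are structured
# like this: claims -> property ID -> list of statements -> mainsnak.
ENTITY = json.loads("""
{
  "id": "Q23",
  "claims": {
    "P39": [
      {"mainsnak": {"datavalue": {"type": "wikibase-entityid",
                                  "value": {"id": "Q11696"}}}}
    ]
  }
}
""")

def claim_item_ids(entity, prop):
    """Item IDs referenced by an entity's statements for one property."""
    ids = []
    for statement in entity.get("claims", {}).get(prop, []):
        datavalue = statement["mainsnak"].get("datavalue", {})
        if datavalue.get("type") == "wikibase-entityid":
            ids.append(datavalue["value"]["id"])
    return ids

print(claim_item_ids(ENTITY, "P39"))
```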
> If these querying capabilities are not currently possible, when might they be? I understand that cache invalidation is difficult and that this will need a sensible editing user interface, but I don't care about all of that, I just want to be able to query data out of this large data store.
We're working on it (mostly thinking right now, but correct design is 80% of the work, so...). Visualizations already have query capabilities (mainly because they have a strong caching model embedded, and because there are not too many of them and you need to create them, so we can watch the load carefully). Other pages can gain them - probably via some kind of Lua functionality - as soon as we figure out the right way to do it, hopefully somewhere within the next year (no promises, but hopefully).