Hi.
If I wanted to make a page on the English Wikipedia using wikitext called "List of United States presidents" that dynamically embeds information from https://www.wikidata.org/wiki/Q23 and https://www.wikidata.org/wiki/Q11806 and other similar items, is this currently possible? I consider this to be arbitrary Wikidata querying, but if that's not the correct term, please let me know what to call it.
A more advanced form of this Wikidata querying would be dynamically generating a list of presidents of the United States by finding every Wikidata item where position held includes "President of the United States". Is this currently possible on-wiki or via wikitext?
If either of these querying capabilities are possible, how do I do them? I don't understand how to query Wikidata in a useful way and I find this frustrating. Since 2012, we've been putting a lot of data into Wikidata, but I want to programmatically extract some of this data and use it in my Wikipedia editing. How do I do this?
If these querying capabilities are not currently possible, when might they be? I understand that cache invalidation is difficult and that this will need a sensible editing user interface, but I don't care about all of that, I just want to be able to query data out of this large data store.
MZMcBride
AFAIK, you can query data from Wikidata, but you cannot put it into a page unless it's a graph. Graphs can do it: https://www.mediawiki.org/wiki/Extension:Graph/Demo/Sparql
As of last Thursday, you can also create a table in the Commons Data namespace and write a simple Lua script on your favorite wiki to pull that data in and render it. Since Wikidata is accessible from Lua, you could pull useful information about each president. I am not sure about the efficiency aspects here.
WeatherDemo (https://en.wikipedia.org/wiki/User:Yurik/WeatherDemo) pulls data from the Commons page Data:Weather/New_York_City.tab (https://commons.wikimedia.org/wiki/Data:Weather/New_York_City.tab) and formats it using an enwiki module (https://en.wikipedia.org/wiki/Module:Sandbox/Yurik). I'm still working on some fun demos for a big presentation.
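For anyone curious what such a module looks like, here is a minimal sketch (not the actual Module:Sandbox/Yurik code; it assumes the mw.ext.data.get() Lua entry point of the tabular-data feature and the schema/data layout it returns, and the module/function names are made up):

    local p = {}

    function p.show( frame )
        -- Load the tabular dataset from the Commons Data namespace.
        local tab = mw.ext.data.get( 'Weather/New_York_City.tab' )
        if not tab then
            return ''
        end

        local out = { '{| class="wikitable"' }

        -- Header row from the column names in the dataset's schema.
        local header = {}
        for _, field in ipairs( tab.schema.fields ) do
            table.insert( header, field.name )
        end
        table.insert( out, '! ' .. table.concat( header, ' !! ' ) )

        -- One wikitable row per data row.
        for _, row in ipairs( tab.data ) do
            local cells = {}
            for _, value in ipairs( row ) do
                table.insert( cells, tostring( value ) )
            end
            table.insert( out, '|-' )
            table.insert( out, '| ' .. table.concat( cells, ' || ' ) )
        end

        table.insert( out, '|}' )
        return table.concat( out, '\n' )
    end

    return p

You would call it from wikitext with something like {{#invoke:WeatherTable|show}}.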
On Sat, Dec 10, 2016 at 5:30 PM, MZMcBride z@mzmcbride.com wrote:
A more advanced form of this Wikidata querying would be dynamically generating a list of presidents of the United States by finding every Wikidata item where position held includes "President of the United States". Is this currently possible on-wiki or via wikitext?
Not directly, but there are bots which can emulate it, such as Listeria by Magnus: http://magnusmanske.de/wordpress/?p=301
Currently it is only possible with Lua. The documentation is at: https://www.mediawiki.org/wiki/Extension:Wikibase_Client/Lua
It is quite ugly to write such a module (no cool SPARQL...), but it works, and you can expose it with a nice interface to be used in wiki pages.
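To give a taste of what that looks like, here is a minimal sketch of such a wrapper (the module and function names are invented; mw.wikibase.getEntityObject and formatPropertyValues are the calls described in that documentation):

    local p = {}

    -- {{#invoke:WikidataValue|main|P39}} on an article connected to a
    -- Wikidata item renders that item's "position held" values as text.
    function p.main( frame )
        -- The entity connected to the current page, if any.
        local entity = mw.wikibase.getEntityObject()
        if not entity then
            return ''
        end
        -- Render all best-rank values of the requested property.
        local formatted = entity:formatPropertyValues( frame.args[1] )
        return formatted and formatted.value or ''
    end

    return p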
Hi!
If I wanted to make a page on the English Wikipedia using wikitext called "List of United States presidents" that dynamically embeds information from https://www.wikidata.org/wiki/Q23 and https://www.wikidata.org/wiki/Q11806 and other similar items, is this currently possible? I consider this to be arbitrary Wikidata querying, but if that's not the correct term, please let me know what to call it.
So this is the kind of can of worms which, I guess, we will eventually have to open, but very carefully. So I want to state my _current_ opinion on the matter - please note, it can change at any time due to changing circumstances, persuasion, experience, revelation, etc.
1. Technically, anything that can access a web service and speak JSON can talk to the SPARQL server. So making some way to do this would, *in theory*, not be very hard. But - please keep reading.
2. I am very apprehensive about having a direct link between any wiki pages and the SPARQL server without heavy caching and rate limiting in between. We don't have a super-strong setup there, and I'm afraid such a link would just knock our setup over, especially if people start putting queries into frequently used templates.
3. We have a number of bot setups (Listeria etc.) which can auto-update lists from SPARQL periodically. This works reasonably well (barring the occasional timeout on tricky queries, etc.) and does not require requesting the data too frequently.
4. If we want a more direct page-to-SPARQL-to-page interface, we need to think about storing/caching the data - and not for 5 minutes, as it's cached now, but for a much longer time, probably in storage other than Varnish. Ideally, that storage would be more of a persistent store than a cache - i.e. it would always (or nearly always) be available but periodically updated. Kind of like the bots mentioned above, but more generic. I don't have any more design for it beyond that, but that's the direction I think we should be looking into.
A more advanced form of this Wikidata querying would be dynamically generating a list of presidents of the United States by finding every Wikidata item where position held includes "President of the United States". Is this currently possible on-wiki or via wikitext?
No, and there are tricky parts there. Consider https://www.wikidata.org/wiki/Q735712. Yes, Lex Luthor held the office of the President of the USA. In a fictional universe, of course. But the naive query - every Wikidata item where position held includes "President of the United States" - would return Lex Luthor as a president just as legitimate as Abraham Lincoln. In fact, there are 79 US presidents judging by "position held" alone. So clearly, there need to be some limits. And those limits would be decided on a case-by-case basis.
If either of these querying capabilities are possible, how do I do them? I don't understand how to query Wikidata in a useful way and I find this frustrating. Since 2012, we've been putting a lot of data into Wikidata, but I want to programmatically extract some of this data and use it in my Wikipedia editing. How do I do this?
Right now the best way is to use one of the list-maintaining bots, I think. Unless you're talking about pulling a small set of values, in which case Lua/templates are probably the best venue.
If these querying capabilities are not currently possible, when might they be? I understand that cache invalidation is difficult and that this will need a sensible editing user interface, but I don't care about all of that, I just want to be able to query data out of this large data store.
We're working on it (mostly thinking right now, but correct design is 80% of the work, so...). Visualizations already have query capabilities (mainly because they have a strong caching model embedded, and because there are not too many of them and you need to create them, so we can watch the load carefully). Other pages can gain them - probably via some kind of Lua functionality - as soon as we figure out the right way to do it, hopefully somewhere within the next year (no promises, but hopefully).
Very well put, Stas, thank you!
Thank you for this e-mail. It was informative.
Stas Malyshev wrote:
No, and there are tricky parts there. Consider https://www.wikidata.org/wiki/Q735712. Yes, Lex Luthor held the office of the President of the USA. In a fictional universe, of course. But the naive query - every Wikidata item where position held includes "President of the United States" - would return Lex Luthor as a president just as legitimate as Abraham Lincoln. In fact, there are 79 US presidents judging by "position held" alone. So clearly, there need to be some limits. And those limits would be decided on a case-by-case basis.
Sure, but I'm not really worried about potential false positives. I'm worried that we're building a giant write-only data store.
Right now the best way is to use one of the list-maintaining bots, I think.
This sucks. :-(
Unless you're talking about pulling a small set of values, in which case Lua/templates are probably the best venue.
I'm not sure what small means here. We have about 46 U.S. Presidents, is that small enough? Which Lua functions and templates could I use?
We're working on it (mostly thinking right now, but correct design is 80% of the work, so...). Visualizations already have query capabilities (mainly because they have a strong caching model embedded, and because there are not too many of them and you need to create them, so we can watch the load carefully). Other pages can gain them - probably via some kind of Lua functionality - as soon as we figure out the right way to do it, hopefully somewhere within the next year (no promises, but hopefully).
Wikidata began in October 2012. I thought it might take till 2014 or even 2015 to get querying capability into a usable state, but we're now looking at potentially 2018? This really sucks. I think Wikidata may eventually cause a seismic shift in wiki editing, but currently I don't see any reason to even contribute to it when it feels like putting data into a giant system that you can't really get back out. I love Magnus and I have a ton of respect for him, but I don't want anything to do with anything called Listeria. It continues to seem like querying is an afterthought for Wikidata, and this continues to boggle my mind.
MZMcBride
Hi!
Sure, but I'm not really worried about potential false positives. I'm worried that we're building a giant write-only data store.
Fortunately, we are not doing that.
Unless you're talking about pulling a small set of values, in which case Lua/templates are probably the best venue.
I'm not sure what small means here. We have about 46 U.S. Presidents, is that small enough? Which Lua functions and templates could I use?
No, a list of presidents is not small enough. Lua right now can fetch specific data from a specific item, which is OK if you know the item and what you're getting (e.g. for infoboxes) but not good for lists of items, especially with complicated conditions. That use case currently needs external tools - like bots.
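To make the "specific data from a specific item" case concrete, here is a rough sketch using that Lua interface (the IDs are only examples - Q23 is George Washington, P39 is "position held" - and the module name is invented):

    local p = {}

    function p.positionsHeld( frame )
        local itemId = frame.args[1] or 'Q23'
        local entity = mw.wikibase.getEntity( itemId )
        if not entity then
            return ''
        end

        local labels = {}
        for _, statement in ipairs( entity:getBestStatements( 'P39' ) ) do
            local snak = statement.mainsnak
            if snak.snaktype == 'value' then
                -- Item-valued properties point at another entity; show its label.
                local targetId = 'Q' .. snak.datavalue.value['numeric-id']
                -- mw.wikibase.getLabel is mw.wikibase.label on older client versions.
                table.insert( labels, mw.wikibase.getLabel( targetId ) or targetId )
            end
        end
        return table.concat( labels, ', ' )
    end

    return p

Something like {{#invoke:WikidataPositions|positionsHeld|Q23}} would then render the positions Washington held - but each call loads a whole entity, which is exactly why this does not scale to building arbitrary lists.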
Wikidata began in October 2012. I thought it might take till 2014 or even 2015 to get querying capability into a usable state, but we're now looking at potentially 2018?
Please do not confuse your particular use case not being supported with querying not being usable at all. It is definitely usable and is being used by many people for many things. Generating lists directly from a wiki template is not supported yet, and we're working on it. I'm sorry that your use case is not supported and that you're feeling disappointed. But we do have query capability, and it can be and is being used for many other things.
Of course, contributions - in any form, query development, code development, design, frontend, backend, data contributions, etc. - are always welcome.
to even contribute to it when it feels like putting data into a giant system that you can't really get back out.
Again, this is not correct - you can read data back out, and there are several ways you can use the query functionality for that right now. The way you want to do it is not supported - yet - but there are many other ways, which we are constantly improving. But we can't do everything at once. Please be patient, please contribute with what you can, and we'll get there.
TL;DR: The ONLY practical solution today is to use Lua. This sucks, but it works and scales well [in the WP sense] - hewiki uses it heavily in infoboxes, to show lists of actors in movies, musical band members, etc.
Long version: Actually, specifically for the list of presidents you don't need a bot. Here is how to do it in Lua, more or less (in pseudo-code):

    local countryEntity = mw.wikibase.getEntity('Q30')  -- note: to be generic, you can get the country from a property/the current entity
    local presidents = countryEntity:getBestStatements('P6')  -- note: you can get the property ID as a parameter
    local output = ''
    for i, statement in ipairs(presidents) do
        local propValue = statement.mainsnak and statement.mainsnak.datavalue
        -- parse it into the desired output...
    end
(A real world usage example: https://he.wikipedia.org/wiki/%D7%99%D7%97%D7%99%D7%93%D7%94:PropertyLink?us... in function: getProperty)
Why this is good:
1. It is the only practical way to query Wikidata from Wikipedia. [Bots aren't practical: (a) they are less accessible to common users; (b) some use cases only need the query to be run and the list updated every 4-5 years, when the list of governors changes.]
2. It is generic enough to work for different countries and different lists.
3. Users can easily use it with syntax such as {{#invoke:LuaModule|listOf|Q30|P6}} or as templates, and are unaware of the implementation.
Why it sucks:
1. It is ugly Lua code.
2. It just moves the problem to Wikidata [you have to maintain Q30.P6 using bots/humans instead of queries].
3. It is limited to simple lists (you can't have a list of Republican presidents, because that requires additional filters and you don't want to create a new property for it).
4. Internationalization - what if the yi Wikipedia wants to create a list of governors of some small country where there are no yi labels for them? The list would be partially in yi and partially in en - is this the desired behavior? Or they could show only the presidents who have a label in yi - but that would give partial data - is this the desired behavior? [Probably the correct solution is to show the fallback labels in en, but add a tracking category for pages requiring label translation, or [translate me] links.]
Hi!
Actually, specifically for the list of presidents you don't need a bot.
Yeah, you are right. I was thinking about going through the query route, but if your list is contained in one property (like Q30/P6), then using Lua is just fine. That's not always the case (e.g. "list of all movies where Brad Pitt played"), but where it works, it's definitely a good way to go.
- It is limited to simple lists (you can't have a list of Republican presidents, because that requires additional filters and you don't want to create a new property for it).
Exactly. You probably could still do something in Lua, but that's pushing it already.
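For the record, "something in Lua" would look roughly like the helper below - filtering the P6 values by each president's party - and the reason it is pushing it is that every check loads another full entity. (P102 = member of political party and Q29468 = Republican Party are my assumptions for the example.)

    -- Returns true if the given item has a best-rank "member of political
    -- party" statement pointing at the Republican Party. Note that this
    -- costs one extra entity load per president checked.
    local function isRepublican( presidentId )
        local item = mw.wikibase.getEntity( presidentId )
        if not item then
            return false
        end
        for _, statement in ipairs( item:getBestStatements( 'P102' ) ) do
            local snak = statement.mainsnak
            if snak.snaktype == 'value'
                and snak.datavalue.value['numeric-id'] == 29468 then
                return true
            end
        end
        return false
    end

Running that for every value of Q30/P6 multiplies the entity loads, which is the kind of thing that makes list building in Lua impractical beyond the simple cases.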
- Internationalization - what if the yi Wikipedia wants to create a list of governors of some small country where there are no yi labels for them? The list would be partially in yi and partially in en - is this the desired behavior? Or they could show only the presidents who have a label in yi - but that would give partial data - is this the desired behavior? [Probably the correct solution is to show the fallback labels in en, but add a tracking category for pages requiring label translation, or [translate me] links.]
That sounds like a good idea :)
I strongly support "native" Wikipedia lists using Wikidata queries, and by that I mean proper SPARQL, not Lua hacks.
Listeria is used "in production", e.g. on the Welsh Wikipedia (about 17,000 lists in articles, see https://tools.wmflabs.org/listeria/botstatus.php), but it was always intended as a proof of concept. It is also designed around WDQ and was only later retrofitted for SPARQL, which explains some of its peculiarities.
It can handle ~23K lists per day, easily, without any caching. I believe (naively, perhaps) an extension would be feasible that renders "row templates" based on SPARQL queries. No Lua needs to be involved in this, or the current Wikidata "fact transclusion". In a first iteration, it might not even have an automatic update mechanism:
* Render some <wikidata ... /> construct based on SPARQL
* Tag pages with such tags in the database, or even through categories
* Have an external service/bot purge these pages on a regular basis; that would update the list without the need of editing the page
* These automated updates could be staged by how long the SPARQL query took on the last update - under 2 sec: once/day, over 10 sec: once/week, etc.
* Have an "update now!" button (as I have on Listeria lists) that just links to "action=purge", for the impatient (instant gratification)
The Wikipedia setup wasn't always as heavily cached as it is today; it grew with usage. I believe we could do this for Wikidata-based lists as well, as the WMF would control the update cycles.