Re: [Wikitech-l] Arbitrary Wikidata querying

13 Dec 2016


      Hi!
...
> If I wanted to make a page on the English Wikipedia using wikitext called
> "List of United States presidents" that dynamically embeds information
> from https://www.wikidata.org/wiki/Q23 and
> https://www.wikidata.org/wiki/Q11806 and other similar items, is this
> currently possible? I consider this to be arbitrary Wikidata querying, but
> if that's not the correct term, please let me know what to call it.

So this is kind of a can of worms which I guess we eventually have to
open, but very carefully. So I want to state my _current_ opinion on the
matter - please note, it can change at any time due to changing
circumstances, persuasion, experience, revelation, etc.
1. Technically, anything that can access a web service and speak JSON
can talk to the SPARQL server. So making some way to do this would,
*in theory*, not be very hard. But - please keep reading.
2. I am very apprehensive about having a direct link between any wiki
pages and the SPARQL server without heavy caching and rate limiting in
between. We don't have a super-strong setup there and I'm afraid making
such a link would just knock our setup over, especially if people start
putting queries into frequently-used templates.
3. We have a number of bot setups (Listeria etc.) which can auto-update
lists from SPARQL periodically. This works reasonably well (except for
the occasional timeout on tricky queries, etc.) and does not require
requesting the info too frequently.
4. If we want a more direct page-to-SPARQL-to-page interface, we need to
think about storing/caching data, and not for 5 minutes like it's cached
now but for a much longer time, probably in storage other than Varnish.
Ideally, that storage would be more of a persistent store than a cache -
i.e. it would always (or nearly always) be available but periodically
updated. Kind of like the bots mentioned above but more generic. I don't
have any more design for it beyond that, but I think that's the direction
we should be looking in.
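
To make points 1 and 4 a bit more concrete, here is a rough Python
sketch. It is only an illustration, not a design for anything we
actually run: the PeriodicStore class and its parameters are
hypothetical names made up for this example, and the endpoint URL is the
public query service address, which is not mentioned above. The idea is
that a page never hits SPARQL directly; it reads the last stored result,
and the store refreshes itself only when that result is older than some
long maximum age, falling back to the old data if the refresh fails.

```python
import time
import urllib.parse

# Public Wikidata Query Service endpoint (assumption: not named in this thread).
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return SPARQL_ENDPOINT + "?" + params


class PeriodicStore:
    """Hypothetical store: serves the last good result, refreshes when stale."""

    def __init__(self, fetch, max_age_seconds, clock=time.time):
        self._fetch = fetch              # callable: query string -> result
        self._max_age = max_age_seconds  # meant to be hours or days, not minutes
        self._clock = clock              # injectable so it can be tested
        self._entries = {}               # query -> (fetched_at, result)

    def get(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]              # fresh enough, no SPARQL request
        try:
            result = self._fetch(query)
        except Exception:
            if entry is not None:
                return entry[1]          # refresh failed, keep serving old data
            raise                        # nothing stored yet, surface the error
        self._entries[query] = (now, result)
        return result
```

In a real setup the dictionary would be replaced by durable storage and
the refresh would more likely run in the background than on read; both
are simplified here to keep the example short.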
...
> A more advanced form of this Wikidata querying would be dynamically
> generating a list of presidents of the United States by finding every
> Wikidata item where position held includes "President of the United
> States". Is this currently possible on-wiki or via wikitext?

No, and there are tricky parts there. Consider
https://www.wikidata.org/wiki/Q735712. Yes, Lex Luthor held the office
of the President of the USA. In a fictional universe, of course. But the
naive query - every Wikidata item where position held includes
"President of the United States" - would return Lex Luthor as a
president just as legitimate as Abraham Lincoln. In fact, there are 79
US presidents judging by "position held" alone. So clearly, there need
to be some limits. And those limits would have to be set on a
case-by-case basis.
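
As a sketch of what such a limit could look like, the Python snippet
below builds the naive query and one possible restricted variant. The
restriction used here (requiring the item to be an instance of human) is
just one assumption about how to exclude fictional characters, not an
agreed rule; other lists would need other filters. The identifiers P39
(position held), Q11696 (President of the United States), P31 (instance
of) and Q5 (human) are Wikidata IDs that do not appear in the text
above, and the function names are made up for the example.

```python
# Wikidata identifiers (assumptions: not given in this thread).
POSITION_HELD = "P39"
US_PRESIDENT = "Q11696"
INSTANCE_OF = "P31"
HUMAN = "Q5"


def naive_query() -> str:
    """Every item whose 'position held' includes US president."""
    return f"SELECT ?item WHERE {{ ?item wdt:{POSITION_HELD} wd:{US_PRESIDENT} . }}"


def humans_only_query() -> str:
    """Same, but only items that are also an instance of human."""
    return (
        f"SELECT ?item WHERE {{ ?item wdt:{POSITION_HELD} wd:{US_PRESIDENT} ; "
        f"wdt:{INSTANCE_OF} wd:{HUMAN} . }}"
    )


def item_ids(response: dict) -> list:
    """Extract item IDs from a standard SPARQL JSON results document."""
    return [
        binding["item"]["value"].rsplit("/", 1)[-1]
        for binding in response["results"]["bindings"]
    ]
```

The wd: and wdt: prefixes are predefined on the Wikidata query service,
so the queries need no PREFIX lines there.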
...
> If either of these querying capabilities are possible, how do I do them?
> I don't understand how to query Wikidata in a useful way and I find this
> frustrating. Since 2012, we've been putting a lot of data into Wikidata,
> but I want to programmatically extract some of this data and use it in my
> Wikipedia editing. How do I do this?

Right now the best way is to use one of the list-maintaining bots, I
think. Unless you're talking about pulling a small set of values, in
which case Lua/templates are probably the best route.
...
> If these querying capabilities are not currently possible, when might they
> be? I understand that cache invalidation is difficult and that this will
> need a sensible editing user interface, but I don't care about all of
> that, I just want to be able to query data out of this large data store.

We're working on it (mostly thinking right now, but correct design is
80% of the work, so...). Visualizations already have query capabilities
(mainly because they have a strong caching model embedded, and because
there are not too many of them and you need to create them, so we can
watch the load carefully). Other pages can gain them - probably via some
kind of Lua functionality - as soon as we figure out the right way to
do it, hopefully somewhere within the next year (no promise, but
hopefully).
-- 
Stas Malyshev
smalyshev@wikimedia.org
