Thanks, Mike. Typically, for an announcement of this importance and
with so many specifics, a Meta page is created to reflect it. Can you
let us know whether a Meta page exists that we can point folks to?
On Fri, Feb 4, 2022 at 3:20 PM Mike Pham <mpham(a)wikimedia.org> wrote:
As many of you already know, one of the Search team’s priorities this year
is scaling Wikidata Query Service (WDQS). Specifically, this conversation
has centered around the need to move off of the Blazegraph backend that
WDQS currently uses.
As part of this process, we want to get input/feedback from our community
of users, and better understand some of the use cases and needs you have.
As mentioned in our Jan 2022 scaling update,
Andrea Westerinen <https://wikitech.wikimedia.org/wiki/User:AndreaWest>
has joined our team as a Contract Graph Consultant, and this provides an
opportunity to meet her (and others on the WMF Search team working on WDQS)
and give us direct feedback about your needs.
There will be 2 feedback sessions (more information on each session below)
that you are welcome and encouraged to join:
1. *WDQS scaling community meeting 1/2: SPARQL query features*
- Thursday, February 17 · 18:00 UTC
Video call link: https://meet.google.com/beu-fxov-etm
Or dial: (US) +1 413–341–4301 PIN: 108 765 815#
2. *WDQS scaling community meeting 2/2: RDF store backend needs*
- Monday, February 21 · 18:00 UTC
Video call link: https://meet.google.com/skc-enqb-bpr
Or dial: (US) +1 601–803–2313 PIN: 499 480 133#
The purpose of these meetings is primarily to facilitate meeting each
other, and to gather requirements and use cases around WDQS — while this
information will be used to plan future scaling, no decisions will be made
during the meetings themselves.
While we have a rough outline of the topics we intend to cover in each
meeting, we also welcome relevant feedback that may not be covered below,
though we encourage and prioritize ideas that are also valuable to others.
We ask that you please be mindful of allowing others to express their
thoughts and perspectives, and helping facilitate a constructive
As always, thanks for your time, energy and patience; we look forward to
seeing you in a couple of weeks!
Meeting details
WDQS scaling community meeting 1/2: SPARQL query features
SPARQL is a powerful query language, and is the endpoint to access
information on Wikidata. The flexibility and power of SPARQL also make it
possible for WDQS to be strained by complex/computationally expensive
queries, affecting all users. In considering how to balance the usability
of SPARQL against limitations on it that can help service reliability, we
want to better understand which SPARQL features you use most frequently
and/or find most important, and how often you use them.
The following list of features indicates most of the SPARQL features of
interest, but is not exhaustive, and anything else that comes to mind is
welcome:
- Query forms (SELECT, ASK, DESCRIBE and/or CONSTRUCT)
- Queried entities
   - Is your focus primarily on people, places, scholarly articles,
   areas of science, … or is it varied?
- Query patterns (example queries would be appreciated)
   - Do you have constant subjects, predicates or objects? (Meaning
   that you know their values when you define the query)
   - Do you use property paths (e.g., a series of properties connected
   in sequence or as alternatives, inverted predicates, etc.)?
   - Do you use FILTERs, OPTIONALs, UNIONs, …?
      - For FILTERs, do you use regex or mathematical functions? Do
      you use EXISTS, NOT EXISTS or MINUS? Do you use SPARQL functions
      (such as logical functions like if/and/or/…, string functions like
      CONCAT, date/time functions like year, …)?
   - Do you use aggregations (such as GROUP BY)?
   - Do you ORDER results?
- SERVICEs (such as labels, GAS or date processing)
- Federated endpoints (such as DBPedia, the Getty vocabularies, Lingua
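For anyone preparing for the first session, one rough way to see which of the features listed above you actually rely on is to tally keywords across your saved queries. The sketch below is illustrative only and is not part of the announcement: the sample queries, the `FEATURE_PATTERNS` table and the `tally_features` helper are all hypothetical, and the keyword matching is deliberately crude (it does not parse SPARQL, so keywords inside comments or string literals would be miscounted).

```python
import re
from collections import Counter

# Hypothetical sample query: house cats (Q146) or instances of its subclasses,
# using a property path, an OPTIONAL date of birth, a FILTER, the label
# SERVICE and an ORDER BY.
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel ?born WHERE {
  ?item wdt:P31/wdt:P279* wd:Q146 .
  OPTIONAL { ?item wdt:P569 ?born . }
  FILTER(!BOUND(?born) || YEAR(?born) >= 2000)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?born
LIMIT 10
"""

# Stand-in for your own collection of saved queries.
QUERIES = [EXAMPLE_QUERY, "ASK { wd:Q42 wdt:P31 wd:Q5 }"]

# Crude keyword patterns for some of the features named in the list above.
FEATURE_PATTERNS = {
    "SELECT": r"\bSELECT\b",
    "ASK": r"\bASK\b",
    "DESCRIBE": r"\bDESCRIBE\b",
    "CONSTRUCT": r"\bCONSTRUCT\b",
    "FILTER": r"\bFILTER\b",
    "OPTIONAL": r"\bOPTIONAL\b",
    "UNION": r"\bUNION\b",
    "MINUS": r"\bMINUS\b",
    "NOT EXISTS": r"\bNOT\s+EXISTS\b",
    "REGEX": r"\bREGEX\s*\(",
    "GROUP BY": r"\bGROUP\s+BY\b",
    "ORDER BY": r"\bORDER\s+BY\b",
    "SERVICE": r"\bSERVICE\b",
}


def tally_features(queries):
    """Count, per feature, how many queries mention it at least once."""
    counts = Counter()
    for query in queries:
        for name, pattern in FEATURE_PATTERNS.items():
            if re.search(pattern, query, flags=re.IGNORECASE):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Print the features found, most common first.
    for name, count in tally_features(QUERIES).most_common():
        print(f"{name}: used in {count} of {len(QUERIES)} queries")
```

A tally like this, together with a few representative example queries, may be a convenient thing to bring to the session.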
WDQS scaling community meeting 2/2: RDF store backend needs
In addition to SPARQL query features, we are interested in knowing more
about what functionality is important to you from an RDF store and SPARQL
endpoint. For example, many of you reported in the August 2021 WDQS user
survey that the 60 second timeout limit was a top priority. This meeting
will be about discussing how scaling the backend engineering of WDQS can be
most valuable to your interests and needs. Other possible topics
(non-exhaustive) may include:
- update speeds
- instrumentation and monitoring capabilities
- query tuning
- custom SPARQL extensions
- geospatial support
- support for other query languages
- support for inference/reasoning
*Mike Pham* (he/him)
Sr Product Manager, Search
Wikimedia Foundation <https://wikimediafoundation.org/>
Author of The Wikipedia Revolution
US National Archives Citizen Archivist of the Year (2016)
Knight Foundation grant recipient - Wikipedia Space (2015)
Wikimedia DC - Outreach and GLAM
Previously: professor of journalism and communications, American
University, Columbia University, USC