Thanks Mike - typically with an announcement of this much importance and with so many specifics, a meta page is created to reflect that. Can you let us know if there is a meta page that we can point folks to?
-Andrew
On Fri, Feb 4, 2022 at 3:20 PM Mike Pham mpham@wikimedia.org wrote:
Hi all,
As many of you already know, one of the Search team’s priorities this year is scaling Wikidata Query Service (WDQS). Specifically, this conversation has centered around the need to move off of the Blazegraph backend that WDQS currently uses.
As part of this process, we want to get input/feedback from our community of users, and better understand some of the use cases and needs you have. As mentioned in our Jan 2022 scaling update https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS-scaling-update-jan-2022, Andrea Westerinen https://wikitech.wikimedia.org/wiki/User:AndreaWest has joined our team as a Contract Graph Consultant, and this provides an opportunity to meet her (and others on the WMF Search team working on WDQS) and give us direct feedback about your needs.
There will be 2 feedback sessions (more information on each session below) that you are welcome and encouraged to join:
- *WDQS scaling community meeting 1/2: SPARQL query features*
Video call link: https://meet.google.com/beu-fxov-etm Or dial: (US) +1 413–341–4301 PIN: 108 765 815#
- Thursday, February 17 · 18:00 UTC
- *WDQS scaling community meeting 2/2: RDF store backend needs*
Video call link: https://meet.google.com/skc-enqb-bpr Or dial: (US) +1 601–803–2313 PIN: 499 480 133#
- Monday, February 21 · 18:00 UTC
The purpose of these meetings is primarily to facilitate meeting each other, and to gather requirements and use cases around WDQS — while this information will be used to plan future scaling, no decisions will be made during the meetings themselves.
While we have a rough outline of the topics we intend to cover in each meeting, we also welcome relevant feedback that may not be covered below, though we encourage and prioritize ideas that are also valuable to others. We ask that you please be mindful of allowing others to express their thoughts and perspectives, and helping facilitate a constructive conversation.
As always, thanks for your time, energy and patience; we look forward to seeing you in a couple of weeks!
Best,
Mike
Meeting details
WDQS scaling community meeting 1/2: SPARQL query features
SPARQL is a powerful query language, and is the way to access information on Wikidata through WDQS. The flexibility and power of SPARQL also make it possible for WDQS to be strained by complex/computationally expensive queries, affecting all users. In considering how to balance the usability of SPARQL against limitations on it that can help service reliability, we want a better understanding of which SPARQL features you use most frequently and/or which are most important to you, and how often you use them.
The following list of features indicates most of the SPARQL features of interest, but is not exhaustive, and anything else that comes to mind is also valuable:
- Query forms (SELECT, ASK, DESCRIBE and/or CONSTRUCT)
- Queried entities
   - Is your focus primarily on people, places, scholarly articles, areas of science, … or is it varied?
- Query patterns (example queries would be appreciated)
   - Do you have constant subjects, predicates or objects? (Meaning that you know their values when you define the query)
   - Do you use property paths (e.g., a series of properties connected in sequence or as alternatives, inverted predicates, etc.)?
   - Do you use FILTERs, OPTIONALs, UNIONs, …?
   - For FILTERS, do you use regex or mathematical functions? Do you use EXISTS, NOT EXISTS or MINUS? Do you use SPARQL functions (such as logical functions like if/and/or/…, string functions like CONCAT, date/time functions like year, …)?
   - Do you use aggregations (such as GROUP BY)?
   - Do you ORDER results?
- SERVICEs (such as labels, GAS or date processing)
- Federated endpoints (such as DBPedia, the Getty vocabularies, Lingua Libre, …)
WDQS scaling community meeting 2/2: RDF store backend needs
In addition to SPARQL query features, we are interested in knowing more about what functionality is important to you from an RDF store and SPARQL endpoint. For example, many of you reported in the August 2021 WDQS user survey that the 60-second timeout limit was a top priority. This meeting will be about discussing how scaling the backend engineering of WDQS can be most valuable to your interests and needs. Other possible topics (non-exhaustive) may include:
- update speeds
- instrumentation and monitoring capabilities
- query tuning
- custom SPARQL extensions
- geospatial support
- support for other query languages
- support for inference/reasoning
—
*Mike Pham* (he/him) Sr Product Manager, Search Wikimedia Foundation https://wikimediafoundation.org/
Wikidata mailing list -- wikidata@lists.wikimedia.org To unsubscribe send an email to wikidata-leave@lists.wikimedia.org