Thanks, Mike. Typically, for an announcement of this importance and with this many specifics, a meta page is created to reflect it. Can you let us know if there is a meta page that we can point folks to?

-Andrew


On Fri, Feb 4, 2022 at 3:20 PM Mike Pham <mpham@wikimedia.org> wrote:

Hi all,

As many of you already know, one of the Search team’s priorities this year is scaling Wikidata Query Service (WDQS). Specifically, this conversation has centered around the need to move off of the Blazegraph backend that WDQS currently uses.

As part of this process, we want to get input/feedback from our community of users, and better understand some of the use cases and needs you have. As mentioned in our Jan 2022 scaling update, Andrea Westerinen has joined our team as a Contract Graph Consultant, and this provides an opportunity to meet her (and others on the WMF Search team working on WDQS) and give us direct feedback about your needs.

There will be 2 feedback sessions (more information on each session below) that you are welcome and encouraged to join:

  1. WDQS scaling community meeting 1/2: SPARQL query features
  2. WDQS scaling community meeting 2/2: RDF store backend needs

The purpose of these meetings is primarily to facilitate meeting each other, and to gather requirements and use cases around WDQS — while this information will be used to plan future scaling, no decisions will be made during the meetings themselves.

While we have a rough outline of the topics we intend to cover in each meeting, we also welcome relevant feedback that may not be covered below, though we encourage and prioritize ideas that are also valuable to others. We ask that you please be mindful of allowing others to express their thoughts and perspectives, and helping facilitate a constructive conversation.

As always, thanks for your time, energy and patience, and we look forward to seeing you in a couple of weeks!

Best,

Mike


Meeting details

WDQS scaling community meeting 1/2: SPARQL query features

SPARQL is a powerful query language, and the WDQS SPARQL endpoint is the way to access information on Wikidata. The flexibility and power of SPARQL also make it possible for WDQS to be strained by complex or computationally expensive queries, affecting all users. In considering how to balance the usability of SPARQL against limitations on it that can help service reliability, we want to better understand which SPARQL features you use most frequently and/or are most important to you, and how often you use them.

The following list of features indicates most of the SPARQL features of interest, but is not exhaustive, and anything else that comes to mind is also valuable:

  • Query forms (SELECT, ASK, DESCRIBE and/or CONSTRUCT)
  • Queried entities
    • Is your focus primarily on people, places, scholarly articles, areas of science, … or is it varied?
  • Query patterns (example queries would be appreciated)
    • Do you have constant subjects, predicates or objects? (Meaning that you know their values when you define the query)
    • Do you use property paths (e.g., a series of properties connected in sequence or as alternatives, inverted predicates, etc.)?
    • Do you use FILTERs, OPTIONALs, UNIONs, …?
      • For FILTERs, do you use regex or mathematical functions? Do you use EXISTS, NOT EXISTS or MINUS? Do you use SPARQL functions (such as logical functions like if/and/or/…, string functions like CONCAT, date/time functions like year, …)?
    • Do you use aggregations (such as GROUP BY)?
    • Do you ORDER results?
  • SERVICEs (such as labels, GAS or date processing)
  • Federated endpoints (such as DBpedia, the Getty vocabularies, Lingua Libre, …)
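
As a rough illustration of what an example query covering several of the features above might look like, here is a minimal sketch in Python that only builds a query string and does not contact WDQS. The function name, the chosen entity and property IDs (P31, Q5, P106, P279, Q901, P569), and the parameters are assumptions picked for the example, not something taken from this announcement, and a query of this shape may well be too expensive to finish within the timeout.

```python
# Illustrative sketch only: builds a SPARQL query string, sends nothing.
# build_example_query and all IDs below are assumptions for this example.

def build_example_query(min_birth_year: int = 1900, limit: int = 10) -> str:
    """Return a SELECT query using constants, a property path, a FILTER,
    the label SERVICE, an aggregation and ORDER BY."""
    return f"""
SELECT ?occupation ?occupationLabel (COUNT(?person) AS ?count) WHERE {{
  ?person wdt:P31 wd:Q5 ;            # constant predicate and object: instance of human
          wdt:P106 ?occupation ;     # occupation
          wdt:P569 ?birth .          # date of birth
  ?occupation wdt:P279* wd:Q901 .    # property path: subclass chain up to scientist
  FILTER(YEAR(?birth) >= {min_birth_year})   # date/time function inside a FILTER
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
GROUP BY ?occupation ?occupationLabel
ORDER BY DESC(?count)
LIMIT {limit}
"""


if __name__ == "__main__":
    print(build_example_query(min_birth_year=1950, limit=5))
```

Feedback that pairs a query like this with a note on how often it is run would cover both the "which features" and the "how frequently" questions.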

WDQS scaling community meeting 2/2: RDF store backend needs

In addition to SPARQL query features, we are interested in knowing more about what functionality is important to you from an RDF store and SPARQL endpoint. For example, many of you reported in the August 2021 WDQS user survey that the 60-second timeout limit was a top priority. This meeting will discuss how scaling the backend engineering of WDQS can be most valuable to your interests and needs. Other possible topics (non-exhaustive) may include:

  • update speeds
  • instrumentation and monitoring capabilities
  • query tuning
  • custom SPARQL extensions
  • geospatial support
  • support for other query languages
  • support for inference/reasoning

Mike Pham (he/him)
Sr Product Manager, Search

_______________________________________________
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-leave@lists.wikimedia.org


--
-Andrew Lih
Author of The Wikipedia Revolution
US National Archives Citizen Archivist of the Year (2016)
Knight Foundation grant recipient - Wikipedia Space (2015)
Wikimedia DC - Outreach and GLAM
Previously: professor of journalism and communications, American University, Columbia University, USC
---
Email: andrew@andrewlih.com
WEB: https://muckrack.com/fuzheado
PROJECT: Wikipedia Space: http://en.wikipedia.org/wiki/WP:WPSPACE