Hello all!
Here are a few updates from Wikidata Query Service:
* We are getting close to full functional coverage of our Flink-based Streaming Updater [1]. We are starting to work on productionizing it and defining a deployment strategy. The current goal is to deploy on top of Kubernetes.
* We've been reviewing how we log queries and adding some context to the logs. In particular, we are adding CPU load and query concurrency [2], in the hope that we can normalize our analysis: a query that takes time because the server is overloaded does not have the same meaning as a query that takes time because it is intrinsically expensive.
* We've been exploring our assumption that expensive queries are more likely to be human-generated (via the UI) than bot-generated [3]. That assumption seems to be wrong.
* We are looking into upgrading to JDK11. We are currently running on JDK8, and we have some time before it is truly end of life. Blazegraph itself has a number of issues with JDK11.
* We had a few issues with data reloads on Wikimedia Commons Query Service (WCQS). We are now doing those data reloads without interruption, so future issues should not result in any downtime, just delays in getting the new data. The data size of WCQS is growing faster than we expected. We are tentatively planning on working on a streaming updater for WCQS in early 2021.
Have fun!
Guillaume
[1] https://phabricator.wikimedia.org/T244590
[2] https://phabricator.wikimedia.org/T261937
[3] https://phabricator.wikimedia.org/T261841#6532765