Hello all!
Here are a few updates from Wikidata Query Service:
* We are getting close to full functional coverage of our Flink-based Streaming Updater [1]. We are starting to work on productionizing it and defining a deployment strategy. The current goal is to deploy on top of Kubernetes.
* We've been reviewing how we log queries, and we've been adding some context to the logs. In particular, we are adding CPU load and query concurrency [2], in the hope that we can normalize our analysis: a query that takes time because the server is overloaded does not have the same meaning as a query that takes time because it is intrinsically expensive.
* We've been exploring our assumption that expensive queries are more likely to be human-generated queries (via the UI) than bot queries [3]. That assumption seems to be wrong.
* We are looking into upgrading to JDK11. We are currently running on JDK8, and we have some time before it is truly end of life. Blazegraph itself has a number of issues with JDK11.
* We had a few issues with data reloads on Wikimedia Commons Query Service. We are now doing those data reloads without interruption, so future issues should not result in any downtime, only delays in getting the new data. The data size of WCQS is growing faster than we expected. We are tentatively planning to work on a streaming updater for WCQS in early 2021.
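To illustrate the point about logging CPU load and query concurrency, here is a minimal sketch of how a slow query could be triaged once that context is in the logs. It is not how WDQS does its analysis: the class name, the three thresholds and the categories are all hypothetical and only show the reasoning.

```java
public class SlowQueryTriage {

    /** Possible explanations for a logged query duration. */
    enum Cause { NOT_SLOW, LIKELY_SERVER_OVERLOAD, LIKELY_EXPENSIVE_QUERY }

    // Hypothetical thresholds, chosen only for illustration.
    static final long SLOW_MS = 10_000;        // duration above which a query counts as slow
    static final double HIGH_CPU_LOAD = 0.9;   // CPU load (0.0 to 1.0) considered "overloaded"
    static final int HIGH_CONCURRENCY = 20;    // concurrent queries considered "overloaded"

    /**
     * Classifies one log entry. A slow query seen on a busy server is
     * attributed to load; a slow query on a quiet server is attributed
     * to the query itself.
     */
    static Cause classify(long durationMs, double cpuLoad, int concurrentQueries) {
        if (durationMs < SLOW_MS) {
            return Cause.NOT_SLOW;
        }
        boolean serverBusy = cpuLoad >= HIGH_CPU_LOAD || concurrentQueries >= HIGH_CONCURRENCY;
        return serverBusy ? Cause.LIKELY_SERVER_OVERLOAD : Cause.LIKELY_EXPENSIVE_QUERY;
    }

    public static void main(String[] args) {
        System.out.println(classify(15_000, 0.95, 30)); // slow on a busy server
        System.out.println(classify(15_000, 0.20, 2));  // slow on a quiet server
        System.out.println(classify(300, 0.95, 30));    // fast query
    }
}
```

A real analysis would more likely model the relationship between load and duration than apply fixed cut-offs; the sketch only shows why the extra log fields matter.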
Have fun!
Guillaume
[1] https://phabricator.wikimedia.org/T244590
[2] https://phabricator.wikimedia.org/T261937
[3] https://phabricator.wikimedia.org/T261841#6532765
Flink is Java, which you are running in containers across K8S.
Make sure cgroup limits are handled well on those machines, since JDK8 ignores them by default and you are still on JDK8 rather than JDK9+. https://jaxenter.com/nobody-puts-java-container-139373.html
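A quick way to see whether a given JVM honours the container limits is to compare what the JVM reports with what the cgroup files say. The sketch below is only an illustration: the class name is made up, the two file paths are the usual cgroup v1 and v2 memory-limit locations and may differ on your hosts, and the result depends on the exact JDK8 update in use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ContainerLimitsCheck {

    // Assumed locations of the memory limit (cgroup v1 first, then cgroup v2).
    static final String[] LIMIT_FILES = {
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",
        "/sys/fs/cgroup/memory.max"
    };

    /** Parses a cgroup limit value; returns -1 when there is no usable limit. */
    static long parseLimit(String raw) {
        String value = raw.trim();
        if (value.isEmpty() || value.equals("max")) {
            return -1L; // cgroup v2 writes "max" when unlimited
        }
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    /** Reads the first readable limit file; returns -1 if none is found. */
    static long readCgroupMemoryLimit() {
        for (String file : LIMIT_FILES) {
            Path path = Paths.get(file);
            if (Files.isReadable(path)) {
                try {
                    return parseLimit(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
                } catch (IOException e) {
                    // Try the next candidate file.
                }
            }
        }
        return -1L;
    }

    /** True when the JVM's max heap is larger than the container limit. */
    static boolean heapExceedsLimit(long maxHeapBytes, long limitBytes) {
        return limitBytes > 0 && maxHeapBytes > limitBytes;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        int cpus = Runtime.getRuntime().availableProcessors();
        long limit = readCgroupMemoryLimit();

        System.out.println("JVM max heap (bytes):        " + maxHeap);
        System.out.println("JVM available processors:    " + cpus);
        System.out.println("cgroup memory limit (bytes): " + (limit < 0 ? "none found" : String.valueOf(limit)));
        if (heapExceedsLimit(maxHeap, limit)) {
            System.out.println("Max heap exceeds the cgroup limit: the JVM is probably not container-aware.");
        }
    }
}
```

Run it inside the container with the same JVM and flags as the real job. A max heap above the cgroup limit is only a rough signal, but it is usually enough to show that the heap was sized from the host's memory.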
Nice to see this coming along!
Thad https://www.linkedin.com/in/thadguidry/
On Mon, Oct 12, 2020 at 8:59 AM Guillaume Lederrey glederrey@wikimedia.org wrote:
Hello all!
Here are a few updates from Wikidata Query Service:
- We are getting close to full functional coverage of our Flink-based Streaming Updater [1]. We are starting to work on productionizing it and defining a deployment strategy. The current goal is to deploy on top of Kubernetes.
- We've been reviewing how we log queries, and we've been adding some context to the logs. In particular, we are adding CPU load and query concurrency [2], in the hope that we can normalize our analysis: a query that takes time because the server is overloaded does not have the same meaning as a query that takes time because it is intrinsically expensive.
- We've been exploring our assumption that expensive queries are more likely to be human-generated queries (via the UI) than bot queries [3]. That assumption seems to be wrong.
- We are looking into upgrading to JDK11. We are currently running on JDK8, and we have some time before it is truly end of life. Blazegraph itself has a number of issues with JDK11.
- We had a few issues with data reloads on Wikimedia Commons Query Service. We are now doing those data reloads without interruption, so future issues should not result in any downtime, only delays in getting the new data. The data size of WCQS is growing faster than we expected. We are tentatively planning to work on a streaming updater for WCQS in early 2021.
Have fun!
Guillaume
[1] https://phabricator.wikimedia.org/T244590
[2] https://phabricator.wikimedia.org/T261937
[3] https://phabricator.wikimedia.org/T261841#6532765
--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+1 / CET
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
Indeed! Here is a blog post from when I ran into this issue with the WDQS updater: https://addshore.com/2020/05/reducing-java-jvm-memory-usage-in-containers-an...
On Mon, 12 Oct 2020 at 17:04, Thad Guidry thadguidry@gmail.com wrote:
Flink is Java, which you are running in containers across K8S.
Make sure cgroup limits are handled well on those machines, since JDK8 ignores them by default and you are still on JDK8 rather than JDK9+. https://jaxenter.com/nobody-puts-java-container-139373.html
Nice to see this coming along!
Thad https://www.linkedin.com/in/thadguidry/