This is a reminder that the WMF Search team will begin data transfer for the new Streaming Updater today (11 Oct).
During the roughly 7 days this data transfer is expected to take (11–18 Oct), some users may see inconsistent behavior or other bugs while querying. We encourage you to file these bugs, but please note that the nature of the process may make it difficult for the Search team to diagnose their source accurately. We hope, of course, for a seamless transfer in which users notice no errors during the switchover.
Thanks for your patience!
On 16 September 2021 at 13:04:01, Mike Pham (firstname.lastname@example.org) wrote:
Thank you again for all your recent thoughts and feedback on the recent Wikidata Query Service (WDQS) scaling update (Aug 2021), and thank you to everyone who has responded to the WDQS user survey. As part of our ongoing work to scale WDQS, we will begin shipping the new Flink-based Streaming Updater from test servers to production on 11 October 2021, with the entire data transfer process expected to finish by 18 October 2021.
The primary goal of this new Streaming Updater is to reduce update lag, and the throttling of edits to Wikidata, by going from an average of 10 edits/second to an average of 88 edits/second. We are excited that this is almost a 9x improvement in our ability to make sure that Wikidata Query Service has the freshest updates from Wikidata, a priority that many of you ranked highly in the recent survey. Additionally, the new update process will lessen the impact on Blazegraph itself by moving diff reconciliation away from the service. As a result, the update process will be more stable, with more cases, such as deletes and undeletes, handled correctly.
To minimize the risk of failure during the rollout of the Streaming Updater, we will be moving individual servers over one at a time. During the roughly 7 days this data transfer is expected to take (11-18 Oct), some users may see inconsistent behavior or other bugs while querying. We encourage you to file these bugs, but please note that the nature of the process may make it difficult for the Search team to diagnose their source accurately. We hope, of course, for a seamless transfer in which users notice no errors during the switchover.
We previously announced the release of the new Streaming Updater to test servers in March 2021; the changes announced there will apply to all WDQS users effective 18 Oct 2021.
The changes that allow the new Streaming Updater to reduce update lag come with two notable consequences, which have the potential to break current usage and workflows:
Blank nodes in Wikidata have been skolemized. From a user perspective, (1) queries using isBlank() will need to be rewritten; (2) queries using isIRI/isURI will need to be verified; (3) WDQS results will no longer include blank nodes.
Constraint Fetching -- specifically wikibase:hasViolationForConstraint -- will be temporarily disabled until we are able to expose constraint violations in a more production-ready way.
For more details on these changes, please refer again to our prior announcement.
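For anyone maintaining saved queries, the two changes above can be checked for with a rough text scan. The following is only a minimal sketch, not an official tool: the names AFFECTED_PATTERNS and audit_query are hypothetical, and the check is a plain regular-expression match rather than a real SPARQL parser, so it can also flag matches inside comments or string literals.

```python
import re

# Constructs named in the announcement, mapped to the advice it gives.
# (Hypothetical helper table; the patterns are a heuristic, not a SPARQL parser.)
AFFECTED_PATTERNS = {
    r"\bisBlank\s*\(": "isBlank() will need to be rewritten",
    r"\bis(?:IRI|URI)\s*\(": "isIRI()/isURI() will need to be verified",
    r"\bwikibase:hasViolationForConstraint\b": "constraint fetching is temporarily disabled",
}


def audit_query(sparql):
    """Return one note per affected construct found in the query text."""
    notes = []
    for pattern, note in AFFECTED_PATTERNS.items():
        # SPARQL built-in function names are case-insensitive, so match loosely.
        if re.search(pattern, sparql, flags=re.IGNORECASE):
            notes.append(note)
    return notes


# Illustrative query only; the property used here is just an example.
example_query = """
SELECT ?item WHERE {
  ?item wdt:P569 ?dob .
  FILTER(isBlank(?dob))
}
"""

for note in audit_query(example_query):
    print(note)
```

A query that triggers no notes is not guaranteed to be unaffected, since results that used to contain blank nodes will change regardless of which functions the query calls.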
We find it encouraging that the new Streaming Updater has not caused major relevant issues in the last six months while it has been on https://query-preview.wikidata.org/. We understand these changes may not be optimal for everyone. However, we believe the ability to greatly reduce Wikidata’s edit lag will be a beneficial improvement for all editors.
We’re excited to ship the new Flink-based Streaming Updater to production, and believe this is a significant step in scaling Wikidata and WDQS. As always, we encourage you to report technical problems and/or leave general comments/feedback in Project Chat.
WMF Search & WMDE
Mike Pham (he/him)
Sr Product Manager, Search