Hello all,
Thank you again for all your recent thoughts and feedback on the Wikidata Query Service (WDQS) scaling update of August 2021 https://www.wikidata.org/wiki/Wikidata:Query_Service_scaling_update_Aug_2021, and to everyone who responded to the WDQS user survey https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2021/08#Wikidata_Query_Service_%28WDQS%29_User_Survey_2021. As part of our ongoing work to scale WDQS, we will begin moving the new Flink-based Streaming Updater from test servers to production on 11 October 2021, and we expect the entire data transfer process to finish by 18 October 2021.
The primary goal of the new Streaming Updater is to reduce update lag and throttling for Wikidata edits, going from processing an average of 10 edits/second to an average of 88 edits/second. We are excited that this is almost a 9x improvement in our ability to make sure WDQS has the freshest updates from Wikidata, a priority that many of you ranked highly in the recent survey. Additionally, the new update process will reduce the load on Blazegraph itself by moving diff reconciliation out of the service. As a result, updates will be more stable, and more use cases, such as deletions and undeletions, will be handled correctly.
To minimize the risk of failure during the rollout of the Streaming Updater, we will move servers over one at a time. During the roughly seven days the transfer is expected to take (11-18 Oct), some users may see inconsistent behavior or other bugs while querying. We encourage you to file these bugs, but please note that, because of the nature of the process, it may be difficult for the Search team to accurately diagnose their source. Of course, we hope for a seamless transfer in which users notice no errors during the switchover.
We previously announced the release of the new Streaming Updater to test servers in March 2021 https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2021/03#New_WDQS_Streaming_Updater_now_available_on_pre-production_test_server_for_feedback; the changes announced there will apply to all WDQS users from 18 Oct 2021.
The work that allows the new Streaming Updater to reduce update lag comes with two notable changes that may break current usage and workflows:
1. Blank nodes in Wikidata have been skolemized https://www.mediawiki.org/wiki/Wikidata_Query_Service/Blank_Node_Skolemization. For users, this means that (1) queries using isBlank() will need to be rewritten; (2) queries using isIRI/isURI will need to be verified; and (3) WDQS results will no longer include blank nodes.
2. Constraint fetching, specifically wikibase:hasViolationForConstraint, will be temporarily disabled until we are able to expose constraint violations in a more production-ready way https://phabricator.wikimedia.org/T192565.
For more details on these changes, please refer again to our prior announcement https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2021/03#New_WDQS_Streaming_Updater_now_available_on_pre-production_test_server_for_feedback .
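Client code that post-processes WDQS results may also need adjusting, because nodes that used to come back as blank nodes will now come back as IRIs. The sketch below shows one way a script could accept both forms while the rollout is in progress. It assumes results in the standard SPARQL 1.1 JSON results format, and it assumes that skolem IRIs begin with http://www.wikidata.org/.well-known/genid/ (please check the exact prefix on the Blank Node Skolemization page linked above). The helper names is_unknown_value and split_bindings are hypothetical, for illustration only.

```python
# Minimal sketch: treat a WDQS result term as an "unknown value" node whether
# it arrives as a blank node (old behaviour) or as a skolem IRI (new behaviour).
# ASSUMPTION: skolem IRIs start with this prefix; verify it before relying on it.
SKOLEM_PREFIX = "http://www.wikidata.org/.well-known/genid/"


def is_unknown_value(term: dict) -> bool:
    """Return True if `term` (one SPARQL 1.1 JSON binding value, e.g.
    {"type": "uri", "value": "http://..."}) is a blank or skolemized node."""
    if term.get("type") == "bnode":
        # Old behaviour: the node is returned as a blank node.
        return True
    # New behaviour: the same node is returned as a skolem IRI.
    return term.get("type") == "uri" and term.get("value", "").startswith(SKOLEM_PREFIX)


def split_bindings(bindings: list, var: str) -> tuple:
    """Hypothetical helper: collect ordinary values of `var` and count
    how many rows bind it to an unknown-value node."""
    known = []
    unknown = 0
    for row in bindings:
        term = row.get(var)
        if term is None:
            continue  # variable is unbound in this row
        if is_unknown_value(term):
            unknown += 1
        else:
            known.append(term["value"])
    return known, unknown


if __name__ == "__main__":
    rows = [
        {"o": {"type": "uri", "value": "http://www.wikidata.org/entity/Q42"}},
        {"o": {"type": "bnode", "value": "b0"}},
        {"o": {"type": "uri", "value": SKOLEM_PREFIX + "0123abcd"}},
    ]
    print(split_bindings(rows, "o"))  # (['http://www.wikidata.org/entity/Q42'], 2)
```

Checking for both forms keeps a script working regardless of which server answers a query during the 11-18 Oct switchover; once the rollout is complete, the blank-node branch can be dropped.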
We find it encouraging that the new Streaming Updater has not caused any major issues in the six months it has been running on https://query-preview.wikidata.org/. We understand these changes may not be ideal for everyone. However, we believe that greatly reducing update lag for Wikidata edits will benefit all editors.
We’re excited to ship the new Flink-based Streaming Updater to production, and believe this is a significant step in scaling Wikidata and WDQS. As always, we encourage you to report technical problems https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem/WDQS_and_Search#New_WDQS_Streaming_Updater_feedback and/or leave general comments/feedback in Project Chat https://www.wikidata.org/wiki/Wikidata:Project_chat.
Best,
WMF Search & WMDE
—
*Mike Pham* (he/him) Sr Product Manager, Search Wikimedia Foundation https://wikimediafoundation.org/