Hello all,
It has been exciting to see the increased adoption and usage of the Wikidata Query Service (WDQS) in the past year. To support this growing demand, on 15 March 2021 the Search Platform team released a new Streaming Updater to a test server (https://query-preview.wikidata.org) for feedback. The production release follows on 15 April 2021, pending any major blockers discovered during testing. Once in production, WDQS will be less of a bottleneck for Wikidata updates, and we’re looking forward to better supporting Wikidata’s continued growth as a more complete knowledge graph.
Your feedback on the following changes is important to us. It helps ensure we continue to support your needs as we scale up the service in production:
1. New Streaming Updater [1]
   - This improvement to the Updater will allow WDQS to better handle the volume of edits to Wikidata, improving data consistency and decreasing update latency. The existing Updater fluctuates between 5–15 updates/sec (10 updates/sec on average). The new Updater will be able to handle a throughput of 40–130 updates/sec (88 updates/sec on average). Without these performance improvements, edits to Wikidata were being throttled (https://phabricator.wikimedia.org/T243701), approaching the point where editing could become impossible. With the new Updater, edits to Wikidata will on the whole be reflected in WDQS more consistently and with less lag. This makes WDQS less of a bottleneck to improving Wikidata content.
   - We don’t anticipate this adversely affecting workflows or usage. However, it is a big update, so please let us know if you find any related bugs or problems so that we can properly address them.

2. Blank node skolemization [2]
   - Skolemization of blank nodes was required to reliably use the new Streaming Updater and minimize the throttling of edits to Wikidata, as detailed in the Phabricator ticket. For more background on why this was necessary, see another attempt to design a “diff” format for RDF (https://www.w3.org/2001/sw/wiki/TurtlePatch), which also suggests skolemization as the way to handle blank nodes. We understand that this solution may introduce breaking changes to your usage of WDQS, RDF dumps, and Special:EntityData. However, given the severe risk of edits to Wikidata becoming impossible, we felt this was the best course of action in the timeframe we had. We acknowledge that this approach has its shortcomings. We invite your feedback on how we can improve future usage of Wikidata and WDQS while maintaining their scalability and reliability.
   - From a user’s perspective, this change means that:
     (1) queries using isBlank() will need to be rewritten;
     (2) queries using isIRI() or isURI() will need to be verified; and
     (3) WDQS results will no longer include blank nodes.
     If these changes affect your workflows, or you need to know how to adapt your workflows to blank node skolemization, please tell us about your specific use case.
   - For more detail on how to modify your workflows, including examples, please refer to the following page: https://www.mediawiki.org/wiki/Wikidata_Query_Service/Blank_Node_Skolemizati...

3. Constraint fetching [3]
   - Constraints are a Wikibase concept that allows entities to be validated against definable rules, e.g. all astronauts must be human. Ideally, constraint fetching would be used to ensure data quality for Wikidata edits. In reality, the current implementation of constraint fetching does not meet our production quality standards and was generating detrimental noise in our logs.
   - The implementation is sub-par, and the new Flink-based Streaming Updater does not support it. For both reasons, the current constraint fetching functionality will be disabled with the new Updater release. It will stay disabled until we can expose constraint violations in a more production-ready way [4][5][6]. We recognize that functionality that doesn’t meet our production quality standards may still be useful to some of you. Please tell us if you are adversely affected by this change.
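The following minimal Python sketch illustrates what blank node skolemization (item 2) means in the RDF 1.1 sense. Each blank node label is replaced by a stable IRI, so a later diff of the same data can refer to the same node again. It is only an illustration under stated assumptions, not the Streaming Updater’s actual implementation. Several details are assumptions chosen for the example:
- the `SKOLEM_BASE` prefix;
- the SHA-1 hashing scheme;
- the `scope` parameter;
- the helper names `skolemize` and `is_skolem_iri`.
Please check the Blank Node Skolemization page for the IRIs WDQS actually emits.

```python
import hashlib

# ASSUMPTION: hypothetical base IRI following the RDF 1.1 "/.well-known/genid/"
# convention; the prefix WDQS really uses should be taken from its documentation.
SKOLEM_BASE = "http://www.wikidata.org/.well-known/genid/"

# A triple as three N-Triples-style terms, e.g. "<http://...>" or "_:b0".
Triple = tuple[str, str, str]


def skolemize(triples: list[Triple], scope: str) -> list[Triple]:
    """Replace every blank node label ("_:x") with a stable skolem IRI.

    Blank node labels are only unique within one document, so the
    (hypothetical) `scope` argument, e.g. an entity ID, is mixed into the
    hash to keep labels from different documents apart.
    """

    def mint(term: str) -> str:
        if not term.startswith("_:"):
            return term  # IRIs and literals pass through unchanged
        # Deterministic: the same (scope, label) pair always yields the same IRI.
        digest = hashlib.sha1(f"{scope}|{term}".encode("utf-8")).hexdigest()
        return f"<{SKOLEM_BASE}{digest}>"

    return [(mint(s), mint(p), mint(o)) for s, p, o in triples]


def is_skolem_iri(term: str) -> bool:
    """Client-side stand-in for an isBlank()-style check after skolemization."""
    return term.startswith("<" + SKOLEM_BASE)


if __name__ == "__main__":
    data = [("<http://example.org/s>", "<http://example.org/p>", "_:b0")]
    for triple in skolemize(data, scope="example-doc"):
        print(triple, is_skolem_iri(triple[2]))
```

Under these assumptions, client code that used to detect blank nodes in results would test for the skolem IRI prefix instead, as `is_skolem_iri` does. For rewriting isBlank() inside SPARQL queries themselves, the migration page above is the reference.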
We’re looking forward to these changes improving WDQS, and your feedback will help us make sure we continue to support your needs. If you have any questions, issues or suggestions, feel free to reach out to us on the WDQS contact page: https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_Service_and_search#New_WDQS_Streaming_Updater_feedback
Original announcement on Wikidata Project chat: https://www.wikidata.org/wiki/Wikidata:Project_chat#New_WDQS_Streaming_Updat...
[1] https://phabricator.wikimedia.org/T244590
[2] https://phabricator.wikimedia.org/T244341
[3] https://phabricator.wikimedia.org/T274982
[4] https://phabricator.wikimedia.org/T204024
[5] https://phabricator.wikimedia.org/T201147
[6] https://phabricator.wikimedia.org/T201150
—
Mike Pham (he/him)
Sr Product Manager, Search Platform
Wikimedia Foundation
https://wikimediafoundation.org/