Hello all,
It has been exciting to see the increased adoption and usage of the
Wikidata Query Service (WDQS) in the past year. To support this
growing demand, on 15 March 2021 the Search Platform team released a new
Streaming Updater to a test server at https://query-preview.wikidata.org
for feedback before it goes to production on 15 April 2021 (pending any
major blockers discovered during testing). Once in production, WDQS will
become less of a bottleneck for Wikidata updates, and we’re looking forward
to better facilitating Wikidata’s continued growth as a more complete
knowledge graph.
Your feedback on the following changes will help us ensure that we continue
to support your needs while scaling up the service in production:
1. New Streaming Updater [1]
   - This improvement to the Updater will allow WDQS to better handle the
     volume of edits to Wikidata, improving data consistency and decreasing
     update latency: while the existing Updater fluctuates between 5–15
     updates/sec (averaging 10 updates/sec), the new Updater will be able to
     handle a throughput of 40–130 updates/sec (averaging 88 updates/sec,
     nearly nine times the current average). Without these performance
     improvements, edits to Wikidata have been throttled
     <https://phabricator.wikimedia.org/T243701>, approaching the point
     where editing could become impossible. With the new Updater, edits to
     Wikidata will, on the whole, be reflected in WDQS more consistently and
     with less lag, reducing the extent to which WDQS is a bottleneck for
     improving Wikidata content.
   - We don’t anticipate this adversely affecting workflows or usage, but
     it is a big update, so please let us know if you find any related bugs
     or problems so that we can properly address them.
2. Blank node skolemization [2]
   - To use the new Streaming Updater reliably and minimize the throttling
     of edits to Wikidata, we had to skolemize blank nodes, i.e. replace
     each blank node with a generated IRI, as detailed in the Phabricator
     ticket. A blank node has no identifier that is stable outside the
     document it appears in, so an update stream cannot reliably refer to a
     specific blank node when it needs to delete or change it. For more
     detail on why this was necessary, you can also refer to another
     attempt to design a “diff” format for RDF
     <https://www.w3.org/2001/sw/wiki/TurtlePatch>, where the suggested
     solution for handling blank nodes is also skolemization. We understand
     that this solution may unfortunately introduce breaking changes to your
     usage of WDQS, RDF dumps, and Special:EntityData; however, given the
     severe risk of edits to Wikidata becoming impossible, we felt this was
     the best course of action in the timeframe we had. We acknowledge that
     this approach has its shortcomings, and invite you to give us feedback
     on how we can improve future usage of Wikidata and WDQS while
     maintaining their scalability and reliability.
   - From a user perspective, this change means that (1) queries using
     isBlank() will need to be rewritten; (2) queries using isIRI() or
     isURI() will need to be verified; and (3) WDQS results will no longer
     include blank nodes. If these changes affect your workflows, or you
     need to know how to modify your workflows to account for blank node
     skolemization, please let us know your specific use case.
   - For more detail on how to modify your workflows, including examples,
     please refer to the following page:
     https://www.mediawiki.org/wiki/Wikidata_Query_Service/Blank_Node_Skolemizat…
3. Constraint fetching [3]
   - Constraints are a Wikibase concept that allows entities to be
     validated based on definable properties, e.g. all astronauts must be
     human. Ideally, constraint fetching would be used to ensure data
     quality for Wikidata edits. In reality, the current implementation of
     constraint fetching does not meet our production quality standards and
     was generating detrimental noise in our logs.
   - As a result of the sub-par implementation, and the fact that the new
     Flink-based Streaming Updater doesn’t support it, current constraint
     fetching functionality will be disabled with the new Updater release
     until we can expose constraint violations in a more production-ready
     way [4][5][6]. We recognize that even functionality that doesn’t meet
     our production quality standards may still be useful to some, and we
     would like to hear your feedback if you are adversely affected by this
     change.
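To make the blank node change in item 2 more concrete, here is a minimal Python sketch of skolemization and of why a triple-level "diff" needs it. It follows the general RDF 1.1 idea of replacing blank nodes with IRIs under a /.well-known/genid/ path, but the base IRI, the hashing scheme, the ex: names and the helper functions are all hypothetical illustrations, not the IRIs or code WDQS actually uses (the page linked in item 2 documents the real scheme). The sketch assumes each blank node is the object of exactly one triple and that each (subject, predicate) pair has at most one blank object.

```python
import hashlib

# Hypothetical base IRI: RDF 1.1 suggests the /.well-known/genid/ path for
# skolem IRIs, but this host is a placeholder, NOT the one WDQS uses.
GENID_BASE = "http://example.org/.well-known/genid/"

# A triple is (subject, predicate, object); every term is a plain string.
Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    """True for an N-Triples style blank node label such as '_:b0'."""
    return term.startswith("_:")


def is_skolem(term: str) -> bool:
    """What a blank node test turns into once blank nodes are skolemized."""
    return term.startswith(GENID_BASE)


def skolemize(triples: list[Triple]) -> set[Triple]:
    """Replace blank node objects with deterministic skolem IRIs.

    Hypothetical scheme: hash the (subject, predicate) the blank node hangs
    off, so the IRI does not depend on the serializer's local label.
    """
    mapping: dict[str, str] = {}
    for s, p, o in triples:
        if is_blank(o):
            digest = hashlib.sha1(f"{s} {p}".encode("utf-8")).hexdigest()
            mapping[o] = GENID_BASE + digest
    return {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples}


def diff(old: set[Triple], new: set[Triple]) -> tuple[set[Triple], set[Triple]]:
    """Naive set diff: (triples to delete, triples to insert)."""
    return old - new, new - old


# Two revisions of an entity. The blank node means the same thing in both,
# but the serializer labelled it differently; only the label really changed.
rev1 = [("ex:stmt1", "ex:value", "_:b0"), ("ex:Q1", "ex:label", '"one"')]
rev2 = [("ex:stmt1", "ex:value", "_:x7"), ("ex:Q1", "ex:label", '"uno"')]

raw_del, raw_ins = diff(set(rev1), set(rev2))
sk_del, sk_ins = diff(skolemize(rev1), skolemize(rev2))

print(len(raw_del), len(raw_ins))  # 2 2: the blank node triple looks changed
print(len(sk_del), len(sk_ins))    # 1 1: only the label change remains
print(any(is_blank(o) for _, _, o in skolemize(rev1)))  # False
print(all(is_skolem(o) for _, p, o in skolemize(rev2) if p == "ex:value"))  # True
```

Without skolemization, the diff both reports a change that did not happen and asks the consumer to delete a triple containing _:b0, a label that means nothing outside the original document. In a SPARQL query the equivalent of swapping is_blank() for is_skolem() would be a prefix test on the IRI rather than isBlank(); check the linked WDQS page for the recommended rewrite.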
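Item 3 describes constraints in a single sentence. As a rough illustration of the idea only, not of how Wikibase or its constraint checking is actually implemented, here is a minimal Python sketch of a hypothetical rule in the spirit of "all astronauts must be human". The entity dictionaries, property names and the Constraint class are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical entities: each maps a property name to a set of values.
ENTITIES: dict[str, dict[str, set[str]]] = {
    "ex:Alice": {"instance of": {"human"}, "occupation": {"astronaut"}},
    "ex:Laika": {"instance of": {"dog"}, "occupation": {"astronaut"}},
    "ex:Bob": {"instance of": {"human"}, "occupation": {"chef"}},
}


@dataclass(frozen=True)
class Constraint:
    """Hypothetical rule: any entity whose `prop` includes `value` must also
    have `required_value` among the values of `required_prop`."""

    prop: str
    value: str
    required_prop: str
    required_value: str

    def violations(self, data: dict[str, dict[str, set[str]]]) -> list[str]:
        """Return the sorted IDs of entities that break this rule."""
        return sorted(
            entity_id
            for entity_id, claims in data.items()
            if self.value in claims.get(self.prop, set())
            and self.required_value not in claims.get(self.required_prop, set())
        )


astronauts_are_human = Constraint("occupation", "astronaut", "instance of", "human")
print(astronauts_are_human.violations(ENTITIES))  # ['ex:Laika']
```

A check like this only reports violations; it does not block edits, which is why constraint results are better thought of as data quality signals than as validation gates.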
We’re looking forward to these changes improving WDQS, and your feedback
on these updates will help us make sure we can continue to support your
needs. If you have any questions, issues or suggestions, feel free to
reach out to us on the WDQS contact page
<https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_Service_and_search#New_WDQS_Streaming_Updater_feedback>.
Original announcement on Wikidata Project Chat:
https://www.wikidata.org/wiki/Wikidata:Project_chat#New_WDQS_Streaming_Upda…
[1] https://phabricator.wikimedia.org/T244590
[2] https://phabricator.wikimedia.org/T244341
[3] https://phabricator.wikimedia.org/T274982
[4] https://phabricator.wikimedia.org/T204024
[5] https://phabricator.wikimedia.org/T201147
[6] https://phabricator.wikimedia.org/T201150
—
Mike Pham (he/him)
Sr Product Manager, Search Platform
Wikimedia Foundation <https://wikimediafoundation.org/>