Hello all!

Here is a summary of what the Search Platform team is doing around WDQS:

* The database responsible for unit conversions [7] was updated on Friday, January 29. This means that entities updated in WDQS since that date use the new conversion data for normalized quantities. The WDQS database will be fully reloaded this month [8] so that all entities are consistent with the new conversion data.
* Now that we have full functional coverage on the Flink-based WDQS Streaming Updater [1], we have done more testing and, as expected, found a few bugs, which we are fixing.
* Exposing a test server [2] to gather feedback on both the new Flink-based Streaming Updater and the long-standing issue of skolemization of blank nodes. We'll make an announcement when it is ready.
* Architecture review of the new Flink-based Streaming Updater with Ververica (the company behind Flink). We will probably uncover a few more things that need to be improved.
* Productionizing the new Flink-based Streaming Updater [8].
* Manual review of a sample of queries to WDQS. We learned a few things:
    * Human intuition is not good at predicting which queries are expensive
    * We have a large scope of very different queries / use cases, larger than we expected
    * Most of the requests we've seen seem to be useful and valuable
* More in-depth analysis and categorization of WDQS traffic [6]:
    * Rather than looking for more performant solutions for the expensive queries we see on WDQS, this analysis focuses on the query groups we see most often, even when they are already efficient.
    * One key finding is that the top 90 query groups represent more than 80% of the queries we serve. These are mostly "simple" queries: they only use the truthy graph, only do a very limited number of hops in the graph, etc. This opens the possibility of creating a scalable and efficient service for those classes of queries.
    * This is very early work; we don't yet know what this service could look like, or whether it is even feasible to build it. But it is an interesting new approach to our problem space.
    * The analysis is a bit raw; feel free to ask clarifying questions and I'll route them to the appropriate person.
* Search Platform Office Hours are happening today (16:00-17:00 GMT / 08:00-09:00 PST / 11:00-12:00 EST / 17:00-18:00 CET) [9]. Feel free to join if you have any additional questions, or just want to chat with the team!

  Have fun!

      Guillaume


[1] https://phabricator.wikimedia.org/T244590
[2] https://phabricator.wikimedia.org/T266470
[3] https://phabricator.wikimedia.org/T244341
[4] https://phabricator.wikimedia.org/T264006
[5] https://www.wikidata.org/wiki/Wikidata:REST_API_feedback_round
[6] https://wikitech.wikimedia.org/wiki/User:Joal/WDQS_Queries_Analysis
[7] https://phabricator.wikimedia.org/T267644
[8] https://phabricator.wikimedia.org/T267927
[9] https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours

--
Guillaume Lederrey (he/him)
Engineering Manager
Wikimedia Foundation