Hi,
Sorry, I responded to Shay on Slack but should have followed up here as well.
In the end, the reported issue occurred because queries run exclusively on the superset-next instance were lost when we synced the production database to the staging environment (to replicate a production-looking environment in staging).
This is a good reminder that Superset data is synced from production to staging when SREs run experiments, so charts, datasets, saved queries, etc., should really be saved and defined in the production environment. As stated in the wiki https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset, "[superset-next] is used to test new versions of and features of Superset before they are promoted to the primary instance"
Thank you all!
Balthazar
On Mon, Mar 4, 2024 at 3:18 PM Shay Nowick snowick@wikimedia.org wrote:
Hi - I made a note in Slack but here is probably a better place. It looks like I am missing Query History and Saved Queries associated with my login. The last query date in my history on the Query History tab is 2024-01-30, but I know I ran queries last week and all through February. Is there a way to restore that data? Not a rush since you're all at the offsite, but I wanted to note what I'm seeing.
Thanks- Shay Nowick (she/her) Sr. Data Scientist Wikimedia Foundation
On Mon, Mar 4, 2024 at 4:33 AM Balthazar Rouberol brouberol@wikimedia.org wrote:
Hello,
As of now, https://superset-next.wikimedia.org is running on Kubernetes. We also introduced changes and new features as we migrated it over, which should reduce manual maintenance and make browsing faster.
- Logging in now goes through our OIDC server (CAS)
- This allows us to map your LDAP groups to Superset roles. Without going into too much detail, everyone should get Alpha and sql_lab permissions, which are automatically managed by Superset. Gone should be the days of hand-managing permissions.
- We have enabled data caching (caching of chart and dashboard data) while making sure that the cached data is scoped per user. This means that if a given user cannot access some data from, say, HDFS, they won't be able to access the cached version of it either. This should make browsing faster, as data computed via expensive queries should be fetched much faster the next time.
As the Data Platform SRE team will be in San Francisco next week, we plan to perform the https://superset.wikimedia.org migration the week after (March 18th-22nd). If you could take the time to log into superset-next and make sure everything looks up to snuff, that would be very helpful to us.
Kind regards Balthazar
On Mon, Feb 26, 2024 at 11:57 AM Ben Tullis btullis@wikimedia.org wrote:
Hello (especially to Superset users),
As you may know, the Data Platform SRE team is currently working on migrating the Analytics Superset instances to Kubernetes (under ticket T347710 https://phabricator.wikimedia.org/T347710) and, happily, I can report that we are making good progress.
This is just a courtesy email to let you know that we plan to switch our staging instance (superset-next.wikimedia.org) over to Kubernetes over the next day or two. This is unlikely to affect anyone's work at the moment, given that both the staging and production instances of Superset have been on version 3.1.0 for a while.
However, given that this staging instance is available for you to use at any time, we thought it best to let you know that we are currently working on it and that it may be in a state of flux for a while.
Once it is stable on Kubernetes, we may well contact you again and ask you kindly to test superset-next for us and report your findings. At the moment though, we're just working on the transition itself so there won't be much for you to test.
As ever, if you have any queries or concerns, please don't hesitate to let us know.
Kind regards, Ben -- *Ben Tullis* (he/him) Senior Site Reliability Engineer Wikimedia Foundation https://wikimediafoundation.org/