[Analytics] Superset and Druid datasources (if you use them please read)

29 Apr 2020

Hi everybody,

I recently learned from Superset upstream that the way we configure
Druid datasources is considered "legacy", and that support for it will
likely decrease over time. They suggested using Druid's SQL capabilities
instead: configuring Druid as a Database and mapping every Druid
datasource to a table.
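
As a rough illustration of the "Druid as a Database" idea, here is a minimal
Python sketch. It assumes the pydruid SQLAlchemy dialect (URIs of the form
druid://host:port/druid/v2/sql/) and Druid's SQL HTTP endpoint, which accepts
a JSON body with a "query" field. The host name, port and datasource name are
hypothetical placeholders, not our real configuration, and both helper
functions are made up for this example.

```python
import json


def druid_sqlalchemy_uri(host: str, port: int) -> str:
    """Build a SQLAlchemy URI in the form used by the pydruid dialect.

    The host and port are whatever Druid broker you point Superset at;
    the values used below are placeholders.
    """
    return f"druid://{host}:{port}/druid/v2/sql/"


def druid_sql_payload(datasource: str, limit: int = 5) -> str:
    """Build the JSON body for a POST to the Druid SQL endpoint.

    In Druid SQL every datasource is addressed as a table, so the
    datasource name is simply a double-quoted identifier.
    """
    quoted = '"' + datasource.replace('"', '""') + '"'
    query = f"SELECT * FROM {quoted} LIMIT {int(limit)}"
    return json.dumps({"query": query})


if __name__ == "__main__":
    # Hypothetical broker host and datasource name, for illustration only.
    print(druid_sqlalchemy_uri("druid-broker.example.org", 8082))
    print(druid_sql_payload("example_datasource"))
```

The first string is the kind of value that would go in the SQLAlchemy URI
field when adding a Database in Superset; the second is what a plain HTTP
client would send to the broker to sanity-check that a datasource is
reachable as a table.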

What does this mean in practice? Several things:

1) We should stop creating Druid datasources (Sources -> Druid Datasources)
in favor of Druid tables. I added some info in
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#Druid_dataso….
The nice thing is that this way we won't be required to specify all the
dimensions/attributes.
2) We should test charts with Druid tables and see if we find bugs. They can
come from our Druid version (0.12.3; the latest upstream is 0.18, and we'll
upgrade asap, but it requires time) or from Superset itself.
3) Migrate all charts using Druid Datasources to Druid tables (I'll follow
up with upstream to see if there is a way to script this and avoid a manual
fix for every chart).

The last point is a big request, I know, but in my opinion it is better to
start doing it now rather than later. I have recently started testing
Superset release candidates when upstream publishes them, but my bug report (
https://github.com/apache/incubator-superset/issues/9468) wasn't able to
stop the release of 0.36.0: the affected code was considered "legacy", so
the bug was not deemed enough to block a release. As you can imagine, this
makes keeping up with Superset a little bit more difficult, and I'd prefer
to follow best (supported) practices from upstream where possible :)

There is an open task for this: https://phabricator.wikimedia.org/T249681.
If somebody could help me test Druid tables it would be really great!

Thanks in advance,

Luca
