Hi everybody,

I recently learned from Superset upstream that the way we configure Druid datasources is considered "legacy", and that support for it will likely decrease over time. They suggested relying on Druid's SQL capabilities instead: configure Druid as a Database in Superset and map every Druid datasource as a table.

What does this mean in practice? Several things:

1) We should stop creating Druid datasources (Sources -> Druid Datasources) in favor of Druid tables. I added some info in https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#Druid_datasources_vs_Druid_tables. The nice thing is that this way we won't be required to specify all the dimensions/attributes.

2) We should test charts with Druid tables and see if we find bugs. They can come from our Druid version (0.12.3, while the latest upstream is 0.18; we'll upgrade asap but it will take time) or from Superset itself.

3) We should migrate all charts using Druid Datasources to Druid tables (I'll follow up with upstream to see if there is a way to script this and avoid a manual fix for every chart).
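
For anybody who wants to poke at a Druid table before touching charts, here is a minimal Python sketch of what "Druid as a Database" boils down to. It only builds the SQLAlchemy URI and a sanity-check query as strings. The host name and datasource name are placeholders, and I am assuming the broker exposes Druid SQL at the usual /druid/v2/sql/ endpoint on port 8082 and that the URI follows the druid:// form used by pydruid's SQLAlchemy dialect; please check the wikitech page above for the values that actually apply to our setup.

```python
def druid_sql_uri(host: str, port: int = 8082) -> str:
    """Build a SQLAlchemy URI for Druid SQL (druid:// scheme, as used by pydruid).

    Assumption: the broker serves Druid SQL under /druid/v2/sql/.
    """
    return f"druid://{host}:{port}/druid/v2/sql/"


def row_count_query(datasource: str) -> str:
    """Return a small Druid SQL query against one datasource.

    With Druid SQL every datasource is queried like a table, and
    __time is the built-in timestamp column.
    """
    return (
        f'SELECT COUNT(*) AS row_count FROM "{datasource}" '
        "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY"
    )


if __name__ == "__main__":
    # Placeholder values, not our real broker or datasource names.
    print(druid_sql_uri("druid-broker.example.org"))
    print(row_count_query("example_datasource"))
```

The URI is what goes into the Database configuration, and the query can be pasted into SQL Lab to check that the table is reachable.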

The last point is a big request, I know, but in my opinion it is better to start doing it now rather than later. I have recently started to test Superset release candidates when upstream publishes them, but I wasn't able to stop the release of 0.36.0 with a bug report (https://github.com/apache/incubator-superset/issues/9468): the affected code was considered "legacy", so the bug was not deemed enough to block a release. As you can imagine, this makes keeping up with Superset a little bit more difficult, and I'd prefer to follow best (supported) practices from upstream where possible :)

There is an open task for this: https://phabricator.wikimedia.org/T249681. If somebody could help me test Druid tables it would be really great!

Thanks in advance,

Luca