Hi everybody,
as described in https://phabricator.wikimedia.org/T263972, the Apache
Superset devs have deprecated the Druid datasource definitions in favor of
SQLAlchemy with the so-called "Druid tables". I worked with Product
Analytics some months ago to migrate charts to the new format, but since
then Superset usage has grown a lot and more users have created charts
backed by Druid datasources rather than Druid tables. Today I have
disabled the Druid datasources in Superset, but charts and dashboards
using them are still visible. The only downside is that clicking on the
definition of an old datasource returns an HTTP 404.
I added some documentation to help users in the migration:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#Druid_dataso…
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#Migrate_a_ch…
The migration is simple but a little tedious, since every chart needs to
be migrated manually via a short procedure (see the links above). The
Analytics team is preparing the upgrade to Superset 1.0, and this
migration is part of that process :)
Please ping the Analytics team if you encounter any issues, or if you need
help migrating over to Druid tables.
Tracking task: https://phabricator.wikimedia.org/T263972
Thanks a lot for your patience,
Luca (on behalf of the Analytics / Data Engineering team)
Hi everybody,
There is a new Jupyter config on the stat100x hosts that shares the tmp
directory between notebooks and the OS, so there is now only one Kerberos
credential cache. This means that if you have a valid ticket on a
stat100x host, you no longer need to kinit again in a Jupyter terminal
(and the other way around holds true as well).
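As a quick sanity check (a sketch only; the exact cache path on the
stat100x hosts may differ from the default shown here), you can print
which credential cache your shell resolves to and whether a valid ticket
exists in it:

```shell
# Default Kerberos credential cache: $KRB5CCNAME if set, otherwise
# /tmp/krb5cc_<uid> - the cache now shared with Jupyter notebooks.
cache="${KRB5CCNAME:-/tmp/krb5cc_$(id -u)}"
echo "Credential cache: $cache"

# klist exits non-zero when there is no valid ticket in the cache.
if klist >/dev/null 2>&1; then
    echo "Valid ticket found - no need to kinit again"
else
    echo "No valid ticket - run kinit"
fi
```

If the shell and the notebook report the same cache path, a kinit in
either one is visible to both.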
In order to pick up the new config you'll need to shut down and start
again any running notebooks (a simple restart is not enough). Let us know
if you encounter any issues.
Thanks!
Luca (on behalf of the Analytics / Data Engineering team)
Hi all,
We have updated HDFS permissions as part of
https://phabricator.wikimedia.org/T270629, and the new permissions have
revealed issues that may affect Superset dashboards and some of the batch
jobs. We are aware of these and are currently working to resolve them.
If you'd like to report an issue, comment on the ticket linked above or
message us on #wikimedia-analytics on freenode.
Regards,
Razzi & Analytics
Hi all!
tl;dr newly created files in HDFS will not be 'world' readable by default.
I.e. you must be either the owner or in the file's group to read the file.
Today we changed the default HDFS umask to 027, so that all new files and
directories will be created with mode 640 (rw-r-----) for files or
750 (rwxr-x---) for directories.
We don't anticipate any problems, but if you encounter any please don't
hesitate to let us know. You can always hdfs dfs -chmod
<https://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/Fil…>
your files after you create them.
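The effect of the new umask can be reproduced with standard POSIX tools;
the same mode arithmetic (666 & ~027 = 640 for files, 777 & ~027 = 750
for directories) is what HDFS applies to new paths. A minimal local
sketch:

```shell
# Reproduce the 027 umask: new files get 640, new directories get 750.
umask 027
tmpdir=$(mktemp -d)
touch "$tmpdir/file"
mkdir "$tmpdir/dir"
fmode=$(stat -c '%a' "$tmpdir/file")   # 640 (rw-r-----)
dmode=$(stat -c '%a' "$tmpdir/dir")    # 750 (rwxr-x---)
echo "file: $fmode, dir: $dmode"
rm -r "$tmpdir"
```

On HDFS, widening a file back to world-readable afterwards would look
like `hdfs dfs -chmod o+r <path>` (the path is yours to fill in).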
See https://phabricator.wikimedia.org/T270629 for more info.
Your friendly Hadoop operators,
- Andrew & Razzi & Luca