Hi everybody,
as described in https://phabricator.wikimedia.org/T263972, the Apache
Superset devs have deprecated the Druid datasource definitions in favor of
SQLAlchemy with the so-called "Druid tables". I worked with Product
Analytics some months ago to migrate charts to the new format, but since
then Superset usage has grown a lot and more users have created charts
backed by Druid datasources rather than Druid tables. Today I have
disabled the Druid datasources in Superset, but charts and dashboards
using them are still visible. The only downside is that clicking on the
definition of an old datasource returns an HTTP 404.
I added some documentation to help users in the migration:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#Druid_dataso…
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset#Migrate_a_ch…
The migration is simple but a little tedious, since every chart needs to
be migrated manually via a short procedure (see the links above). The
Analytics team is preparing the upgrade to Superset 1.0, and this
migration is part of that process :)
Please ping the Analytics team if you encounter any issues, or if you need
help migrating over to Druid tables.
Tracking task: https://phabricator.wikimedia.org/T263972
Thanks a lot for your patience,
Luca (on behalf of the Analytics / Data Engineering team)
Hi everybody,
There is a new Jupyter config on the stat100x hosts that shares the tmp
directory between notebooks and the OS, so there is now only one Kerberos
credential cache. This means that if you have a valid ticket on a
stat100x host, you no longer need to kinit again in a Jupyter terminal
(and the other way around holds true as well).
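As a quick sanity check (a sketch only; the exact cache path on the
stat100x hosts may differ from the default shown here), you can print
which credential cache your shell resolves to and whether a valid ticket
exists in it:

```shell
# Default Kerberos credential cache: $KRB5CCNAME if set, otherwise
# /tmp/krb5cc_<uid> - the cache now shared with Jupyter notebooks.
cache="${KRB5CCNAME:-/tmp/krb5cc_$(id -u)}"
echo "Credential cache: $cache"

# klist exits non-zero when there is no valid ticket in the cache.
if klist >/dev/null 2>&1; then
    echo "Valid ticket found - no need to kinit again"
else
    echo "No valid ticket - run kinit"
fi
```

If the shell and the notebook report the same cache path, a kinit in
either one is visible to both.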
In order to pick up the new config you'll need to shut down and start
again any running notebooks (a simple restart is not enough). Let us know
if you encounter any issues.
Thanks!
Luca (on behalf of the Analytics / Data Engineering team)
Hi all,
We have updated HDFS permissions as part of
https://phabricator.wikimedia.org/T270629, and the new permissions have
revealed issues that may affect Superset dashboards and some of the batch
jobs. We are aware of these and are currently working to resolve them.
If you'd like to report an issue, comment on the ticket linked above or
message us on #wikimedia-analytics on freenode.
Regards,
Razzi & Analytics
Hi all!
tl;dr newly created files in HDFS will not be 'world' readable by default.
I.e. you must be either the owner or in the file's group to read the file.
Today we changed the default HDFS umask to 027, so that all new files and
directories will be created with mode 640 (rw-r-----) for files or
750 (rwxr-x---) for directories.
We don't anticipate any problems, but if you encounter any please don't
hesitate to let us know. You can always hdfs dfs -chmod
<https://hadoop.apache.org/docs/r2.7.6/hadoop-project-dist/hadoop-common/Fil…>
your files after you create them.
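The effect of the new umask can be reproduced with standard POSIX tools;
the same mode arithmetic (666 & ~027 = 640 for files, 777 & ~027 = 750
for directories) is what HDFS applies to new paths. A minimal local
sketch:

```shell
# Reproduce the 027 umask: new files get 640, new directories get 750.
umask 027
tmpdir=$(mktemp -d)
touch "$tmpdir/file"
mkdir "$tmpdir/dir"
fmode=$(stat -c '%a' "$tmpdir/file")   # 640 (rw-r-----)
dmode=$(stat -c '%a' "$tmpdir/dir")    # 750 (rwxr-x---)
echo "file: $fmode, dir: $dmode"
rm -r "$tmpdir"
```

On HDFS, widening a file back to world-readable afterwards would look
like `hdfs dfs -chmod o+r <path>` (the path is yours to fill in).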
See https://phabricator.wikimedia.org/T270629 for more info.
Your friendly Hadoop operators,
- Andrew & Razzi & Luca