Hi!
TL;DR: The Analytics team added automatic failover for the HDFS Namenode. A
new daemon is running on the analytics1001/1002 hosts called
hadoop-hdfs-zkfc (port 8019) responsible to talk with Zookeeper and execute
periodical health checks. Monitoring and Ferm rules has been added.
More info:
1)
https://phabricator.wikimedia.org/T129838
2)
https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HDFSH…
3) Monitoring and Ferm rules added:
https://gerrit.wikimedia.org/r/#/c/279408/
Let me know if you have any questions!
Luca