Hi!
This Friday, 2020-10-30, I will be doing maintenance on stat1005 during the
EU/CET morning. The machine will be rebooted several times, so expect
everything running on it to be disrupted. Afterwards, it will be running a
newer kernel (5.8) and updated GPU drivers/ROCm libraries (3.8). Should the
update fail, or subsequent tests show that workloads break, we will roll back
to kernel 4.19 and rocm33.

Update: stat1005 is now running kernel 5.8.0 and rocm38. Note that you will
have to update tf-rocm to the latest version (2.3.1) for it to work on this
machine.
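In case it helps, here is a rough sketch for checking whether the tf-rocm in
your environment is at least 2.3.1. It assumes the package is installed under
the pip distribution name "tensorflow-rocm"; the helper names are made up for
illustration, so adjust them to your own setup.

```python
from importlib import metadata

# Minimum tf-rocm version mentioned above.
REQUIRED = (2, 3, 1)


def parse_version(text):
    """Turn a string like '2.3.1' into (2, 3, 1).

    Only the leading digits of the first three dot-separated parts are used,
    so a suffix such as 'rc1' is ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_new_enough(installed, required=REQUIRED):
    """Return True if the installed version string meets the requirement."""
    return parse_version(installed) >= required


def installed_version(dist="tensorflow-rocm"):
    """Return the installed version string, or None if it is not installed.

    The distribution name is an assumption; change it if yours differs.
    """
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("tensorflow-rocm is not installed in this environment")
    elif is_new_enough(found):
        print("tensorflow-rocm", found, "should be fine")
    else:
        print("tensorflow-rocm", found, "is too old, please update")
```

Run it inside the virtualenv or conda environment you use on stat1005, since
the check only looks at the Python environment it runs in.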
If you have any questions or concerns, let us know.
Best,
Tobias
--
Tobias Klausmann, SRE, Wikimedia Foundation