Hi!
In our quest to make the GPU-equipped machines in analytics ever more useful, we are going to update the ROCm software suite and driver on stat1005 and stat1008 to the latest version, 3.8.0.
Since this requires a reboot, this is an early warning that on 2020-10-23 (Friday), I will update stat1005. Disruption will likely be less than an hour. If the update breaks anything, we will roll back to v3.3.0.
The update of stat1008 will happen next week, on 2020-10-27 (Tuesday), and I will send a separate reminder for that on Monday.
I will send an all-clear message to these lists once the update is done. For more details on the process, see https://phabricator.wikimedia.org/T264408
As always, if there is anything out of order, don't hesitate to contact us.
Best, Tobias
Hi!
On Tue, 20 Oct 2020, Tobias Klausmann wrote:
In our quest to make the GPU-equipped machines in analytics ever more useful, we are going to update the ROCm software suite and driver on stat1005 and stat1008 to the latest version, 3.8.0.
Maintenance today revealed that the kernel module shipped with ROCm (rock-dkms) is not fully compatible with the kernel version we use. I have restored stat1005 to its previous state (using rocm33). The planned update of stat1008 is postponed until we can make the driver and our kernels work together.
If anything is broken on stat1005, let us know.
Best, Tobias
analytics-announce@lists.wikimedia.org