On Nov 28, 2018, at 3:42 PM, Brooke Storm wrote:
On Monday, December 3rd, 2018 at 1700 UTC, we will be rebooting one of the two dumps NFS servers (labstore1006.wikimedia.org). This should briefly raise load on NFS clients, but the reboot should be quick enough that failing over services is unlikely to help. We will fail over the web service before that time and fail it back before rebooting the partner server (labstore1007.wikimedia.org) on Friday, December 7th at 1700 UTC. This should not interrupt service to dumps.wikimedia.org (the site hosted on these systems), since it should already be failed over to the non-rebooting partner.
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm@wikimedia.org
IRC: bstorm_
On Dec 3, 2018, at 9:59 AM, Brooke Storm wrote:
We are beginning the first step of this shortly. Again, it may cause some NFS operations to fail briefly and load to rise on clients, but the condition should be brief and temporary.
On Dec 3, 2018, at 10:36 AM, Brooke Storm wrote:
The first step of this is done and things look good. The next reboot, of labstore1007, will be on 2018-12-07 at 1700 UTC. Please report any issues related to the reboot of labstore1006 (dumps NFS).
On Dec 7, 2018, at 9:41 AM, Brooke Storm wrote:
Part two of this, the reboot of labstore1007, will start in 20 minutes. Impact should be brief and limited to dumps NFS.
Brooke Storm wrote:
This is done. Please report any issues with the dumps NFS servers.
cloud-announce@lists.wikimedia.org