[Labs-l] labmon1001 downtime scheduled for 2016-06-23 from 15:00 GMT to 23:00 GMT

Rob Halsell rhalsell at wikimedia.org
Wed Jun 22 16:39:49 UTC 2016


My understanding (Yuvi would know better):

Anything that currently relies on labmon1001 has likely been experiencing
decreased performance over the last few weeks.  First, the old server was
very heavily overloaded, as it only used two of the four SATA disks
installed.  We attempted to migrate it to use all 4 disks, but the chassis
used at that time started to throw errors, so we migrated to another system
with the 4 disks from the old system.  That is what is now running
labmon1001, but it's still overloaded on I/O.

The existing system will have its essential data backed up, and then
restored to the new array of 4 SSDs (versus 4 SATA disks).  As this setup
will then match that of our graphite systems in production, we expect to
eliminate the I/O bottleneck.

If you have a service that is experiencing slowdowns and is not tied to
labmon1001, then it should be independently investigated, as it's unlikely
to be related to this proposed work.

On Wed, Jun 22, 2016 at 9:11 AM, Maximilian Doerr <
maximilian.doerr at gmail.com> wrote:

> Is this the primary reason grafana is throwing up on me when trying to
> view the Cyberbot project?
>
> Cyberpower678
> English Wikipedia Account Creation Team
> ACC Mailing List Moderator
> Global User Renamer
>
> On Jun 22, 2016, at 11:26, Rob Halsell <rhalsell at wikimedia.org> wrote:
>
> Labs users,
>
> As many of you may recall, (mostly) Yuvi and (slightly) myself worked on
> labmon1001 a couple of weeks back.  Unfortunately, the work performed
> wasn't enough (as it still left the host with spinning disks) and the
> graphite service hammering them bogs down the server.  As a solution, we
> will be reinstalling the system on 2016-06-23 from 15:00 GMT to 23:00 GMT.
> I don't expect to use the majority of this window, since much of the data
> was already backed up previously.  However, we may need the full window if
> we have to re-sync all the data (without a differential sync).
>
> Details of this work can be viewed on
> https://phabricator.wikimedia.org/T137924.
>
> Once the SSDs are installed, their ability to handle the IOPS generated
> by graphite is expected to bring load levels on labmon1001 down to sane
> levels.
>
> Please let me know if there are any questions or concerns.  They can be
> raised via this email thread, or by comment on the linked phabricator task.
>
> Thanks,
>
> --
> Rob Halsell
> Operations Engineer
> Wikimedia Foundation, Inc.
> E-Mail: rhalsell at wikimedia.org
> Key fingerprint = CB1F C7E7 0FF8 5DB2 6820  9C7E 75ED 14C7 0245 D22A
> Office: 415.839.6885 x6620
> Fax: 415.882.0495
>
> _______________________________________________
> Labs-l mailing list
> Labs-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/labs-l


-- 
Rob Halsell
Operations Engineer
Wikimedia Foundation, Inc.
E-Mail: rhalsell at wikimedia.org
Key fingerprint = CB1F C7E7 0FF8 5DB2 6820  9C7E 75ED 14C7 0245 D22A
Office: 415.839.6885 x6620
Fax: 415.882.0495