[Labs-admin] Labs capacity planning 2017-08-30

Andrew Bogott abogott at wikimedia.org
Wed Aug 30 18:43:10 UTC 2017


Attached are two sheets, one from the last time I did this (a bit less 
than a year ago) and one from today.

The updated report includes new numbers reflecting overprovisioned disk 
space: physical usage is typically much less than committed usage.  The 
'Post-commit free (Gb)' column represents the amount of disk space that 
would be free in the unlikely event that all users simultaneously 
consume all of their allocated disk space.
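
As a rough sketch of how that column relates to the others (the host 
names and figures below are made up for illustration and are not taken 
from the attached sheets; the field names are my own, not the sheet's):

```python
from dataclasses import dataclass


@dataclass
class HostDisk:
    """Disk accounting for one virt host. All values are in GB."""
    name: str
    capacity_gb: float   # total usable disk on the host
    used_gb: float       # disk physically consumed today
    committed_gb: float  # sum of disk allocated to all instances

    @property
    def free_gb(self) -> float:
        # Space actually free right now.
        return self.capacity_gb - self.used_gb

    @property
    def post_commit_free_gb(self) -> float:
        # Space left if every instance filled its whole allocation.
        # A negative value means the host is overcommitted.
        return self.capacity_gb - self.committed_gb


# Hypothetical hosts, for illustration only.
hosts = [
    HostDisk("example-host-a", 2000.0, 700.0, 2400.0),
    HostDisk("example-host-b", 4000.0, 900.0, 2500.0),
]

for h in hosts:
    state = "overcommitted" if h.post_commit_free_gb < 0 else "ok"
    print(f"{h.name}: free={h.free_gb} post_commit_free={h.post_commit_free_gb} ({state})")
```

A host can show plenty of physical free space and still have a negative 
post-commit figure, which is the situation the new column is meant to 
expose.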

The totals are a bit funny because one whole server (labvirt1015) is 
currently offline.  The numbers in the report assume that it will be 
fixed and brought back on-line as a new, empty box.

== Current usage ==

CPU availability is adequate, although poorly balanced.  Recent changes 
to the scheduler should improve balancing going forward.

RAM usage is OK, with about 46% free RAM cluster-wide.  Again, this is 
poorly balanced, with one server running at only 27% free.
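
To illustrate why a healthy cluster-wide figure can hide a tight host, 
here is a minimal sketch that compares the two. The host names and RAM 
values are invented for the example and do not come from the report:

```python
def free_ram_summary(hosts):
    """hosts maps host name -> (total_ram_gb, free_ram_gb).

    Returns (cluster_free_pct, tightest_host, tightest_free_pct).
    """
    total = sum(t for t, _ in hosts.values())
    free = sum(f for _, f in hosts.values())
    cluster_pct = 100.0 * free / total

    # Per-host free percentage; the minimum is the host to watch.
    per_host = {name: 100.0 * f / t for name, (t, f) in hosts.items()}
    tightest = min(per_host, key=per_host.get)
    return cluster_pct, tightest, per_host[tightest]


# Hypothetical data, for illustration only.
example = {
    "example-host-a": (512.0, 128.0),
    "example-host-b": (512.0, 320.0),
}

cluster_pct, tightest, tightest_pct = free_ram_summary(example)
print(f"cluster free: {cluster_pct:.1f}%")
print(f"tightest host: {tightest} at {tightest_pct:.1f}% free")
```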

Actual disk space usage is fine: we're running with about 63% free space 
and have 34TB of headroom.  Some individual hosts are seriously 
overcommitted but don't show any near-term danger of overfilling.

== Trends ==

We've increased RAM and CPU capacity in lockstep with growth in usage.  
The percentages of free RAM and CPU are quite similar to last year's.

We've increased our storage capacity more rapidly than current growth in 
disk usage.  This is on purpose, to anticipate several large-disk-usage 
cases which have not yet come online.

== Action items ==

This always includes these two points:

- Implement central logging services to get giant logfiles out of 
instance storage
- We really need some kind of storage solution that is neither NFS nor 
instance-local.  Cinder or similar.

Otherwise, we're in decent shape.  If we decide to worry more about the 
overcommits on disk then some rebalancing to the new, larger servers is 
in order.  We should also review the purchase dates of the low-number 
servers and figure out if we need to rotate any of them out of service.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: labvirt_usage_2016_10.ods
Type: application/vnd.oasis.opendocument.spreadsheet
Size: 18358 bytes
Desc: not available
URL: <https://lists.wikimedia.org/pipermail/labs-admin/attachments/20170830/8d9e8d12/attachment-0002.ods>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: labvirt_usage_2017_08.ods
Type: application/vnd.oasis.opendocument.spreadsheet
Size: 21437 bytes
Desc: not available
URL: <https://lists.wikimedia.org/pipermail/labs-admin/attachments/20170830/8d9e8d12/attachment-0003.ods>