So here is the final breakdown of 'packet loss' and traffic volume per squid range.

 

Squids are broken down by role (from CommonSettings.php) and location

Data based on 1:1000 sampled squid log files from Locke, 2010-04 till 2013-03

 

'packet loss': http://bit.ly/Y3Hqam (actually avg gap between sequence numbers)

hourly load: http://bit.ly/Y3Iifm (multiply by 1000 to correct for the 1:1000 sampling)

 

Clearly there is no capacity issue at Locke, as some squid sets are serviced perfectly.

 

Some extra observations:

 

- All ssl squids have corrupted sequence numbering (see also David's earlier mail)

 

- text squids have < 0.1% loss in recent months

 

- upload squids series knsq* (knams) have 2-3% loss in Feb/Mar,

  upload squids series cp* (eqiad) and amssq* (esams) are fine

  (upload squids in pmtpa have 23% loss since Nov 2012 but their load has dropped to almost zero so that doesn't weigh in on Ganglia trends)

 

- API squids pmtpa: avg gap has been well below 1000 since May 2012 (how can that be?)
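
For reference, a minimal sketch of how an average gap could be turned into a loss percentage. This is not the script behind the charts. It assumes each squid increments its sequence number by one per request and that the sampler keeps every 1000th line, so a loss-free stream shows an average gap of exactly 1000. The names average_gap and estimated_loss are made up for illustration.

```python
SAMPLING_FACTOR = 1000  # logs are 1:1000 sampled (from the mail)


def average_gap(seqs):
    """Mean difference between consecutive sampled sequence numbers of one host."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequence numbers")
    gaps = [b - a for a, b in zip(seqs, seqs[1:])]
    return sum(gaps) / len(gaps)


def estimated_loss(avg_gap, sampling_factor=SAMPLING_FACTOR):
    """Estimated fraction of sampled lines that went missing.

    Assumption: without loss the average gap equals the sampling factor.
    If only a fraction (1 - loss) of sampled lines arrives, the gap grows
    to sampling_factor / (1 - loss), hence loss = 1 - sampling_factor / gap.
    """
    if avg_gap <= 0:
        raise ValueError("average gap must be positive")
    return 1.0 - sampling_factor / avg_gap


if __name__ == "__main__":
    for gap in (1000, 1025, 1300, 800):
        print(gap, format(estimated_loss(gap), ".1%"))
```

Under these assumptions a gap below 1000 yields a negative "loss", which message loss alone cannot produce. That points to something else, such as duplicated lines or sequence numbering that does not behave as assumed.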

 

Erik