So here is the final breakdown of 'packet loss' and traffic volume per squid range.
Squids are broken down by role (from CommonSettings.php) and location.
Data are based on 1:1000 sampled squid log files from Locke, 2010-04 through 2013-03.
'packet loss': http://bit.ly/Y3Hqam (actually avg gap between sequence numbers)
hourly load: http://bit.ly/Y3Iifm (x 1000)
Clearly there is no capacity issue at Locke, as some squid sets are serviced perfectly.
Some extra observations:
- All ssl squids have corrupted sequence numbering (see also David's earlier mail)
- text squids have < 0.1% loss in recent months
- upload squids series knsq* (knams) have 2-3% loss in Feb/Mar;
  upload squids series cp* (eqiad) and amssq* (esams) are fine
  (upload squids in pmtpa have had 23% loss since Nov 2012, but their load has dropped to almost zero, so that hardly weighs in on Ganglia trends)
- API squids in pmtpa: avg gap has been well below 1000 since May 2012 (how can that be?)
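
For anyone who wants to turn the avg gap figures into a loss percentage, here is a rough sketch. It assumes that with 1:1000 sampling and no loss the average gap between consecutive sampled sequence numbers is 1000, so any excess over 1000 counts as loss. The function names and the formula are my own illustration, not necessarily how the charts above were produced.

```python
SAMPLING_FACTOR = 1000  # assumption: logs keep 1 line out of every 1000


def average_gap(sequence_numbers):
    """Mean difference between consecutive sampled sequence numbers."""
    if len(sequence_numbers) < 2:
        raise ValueError("need at least two sequence numbers")
    gaps = [b - a for a, b in zip(sequence_numbers, sequence_numbers[1:])]
    return sum(gaps) / len(gaps)


def estimated_loss(avg_gap, sampling_factor=SAMPLING_FACTOR):
    """Estimated fraction of lines lost, given the average gap.

    Assumed model: without loss the gap equals the sampling factor,
    so loss = 1 - sampling_factor / avg_gap.
    """
    if avg_gap <= 0:
        raise ValueError("average gap must be positive")
    return 1.0 - sampling_factor / avg_gap


if __name__ == "__main__":
    # Hypothetical sampled sequence numbers, not real data.
    sample = [1000, 2000, 3250]
    gap = average_gap(sample)
    print(f"avg gap {gap:.0f}, estimated loss {estimated_loss(gap):.1%}")
```

Under this model an avg gap of 1250 corresponds to roughly 20% loss. An avg gap below 1000 gives a negative value, which loss alone cannot produce, and that is why the API squids figure looks suspicious.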
Erik