Hello folks,
lately we've again had quite a bit going on...
While Brion was implementing that anon-blocking stuff (yay, more
blocking - faster performance!),
we were targeting other performance issues as well.
Tim rewrote the IP block code (cutting 50ms or so ;-) and made
lots of other nice improvements,
and now we've implemented Mark's idea of running diskless squids (well,
they have disks, but no
cache on them).
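For the curious, a memory-only Squid setup might look roughly like the sketch below. The directive names are real Squid 2.x ones, but the sizes are made up and the 'null' store assumes Squid was built with it enabled (--enable-storeio including null):

```
# Keep the whole object cache in RAM; never write objects to disk.
cache_mem 3000 MB                    # in-memory object cache (size is made up)
maximum_object_size_in_memory 100 KB # don't let huge objects crowd out RAM
cache_dir null /tmp                  # 'null' store: disk cache disabled
```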
Lots of our new servers have joined the object cache, running (hehe,
again) Tugela instead of
memcached. It will be interesting to watch how it grows. Sadly, after
a week no expiration (memory->disk)
of objects has happened yet, so we can't measure anything.
Standalone BerkeleyDB
might be a bit faster than memcached, though no benchmarks on
identical hardware have been
conducted.
~22G of data is cached in the object cache now - parser objects, image
metadata, diffs,
sessions, user objects, 'you have new messages' bits and language
objects. So far we haven't
noticed any of the glitches that forced us to remove Tugela from
service before (some cosmetic
patches were applied). Anyway, we have more RAM that didn't cost
millions, so we use it.
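As an illustration of how such a cache gets used (the key names below are hypothetical, not the real ones), all of those object types go through the same memcached-style interface with namespaced keys and per-object expiry:

```python
import time

class ObjectCache:
    """Tiny memcached-style interface: namespaced keys, per-object expiry."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, ttl):
        self.store[key] = (value, time.time() + ttl)  # remember expiry time

    def get(self, key):
        item = self.store.get(key)
        if item is None or item[1] < time.time():
            return None                               # missing or expired
        return item[0]

cache = ObjectCache()
# Hypothetical keys, roughly in the spirit of the object types listed above:
cache.set("newtalk:somewiki:SomeUser", True, 3600)        # 'new messages' bit
cache.set("diff:somewiki:1234:1240", "<rendered diff>", 86400)
```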
Anyway, today, with squids running from memory only, we managed to
achieve 0.09s
average response times for logged-in users, at least those who go
directly to Florida.
Before that, Squid efficiency was really distorted by somewhat
blocking async I/O (if it
really existed there), poor sibling relations and a memory leak.
We still have that memory leak and are somewhat lost with it.. Squid
'accounts' for 1G of
memory but uses >2G, and it keeps growing until restarted. We need to
solve that, but nobody has
ever really run valgrind at such loads (eh, today the squid servers
were serving
>700 requests per second each), and I'm not sure anyone has touched
valgrind properly
at all ;-) We'll soon have a bunch of servers suitable for squid
duty, but still, using them
more efficiently would help. We will always lack resources somewhere
:-)
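If anyone wants to try, a hypothetical starting point would be running a single squid instance in the foreground under the leak checker - with the obvious caveat that valgrind's slowdown at this kind of load is exactly the untested part:

```
# Sketch only: run one squid in no-daemon mode under valgrind's leak checker.
valgrind --leak-check=full --num-callers=20 \
    /usr/sbin/squid -N -d 1 -f /etc/squid/squid.conf
```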
Guidelines could help, or we could simply publish our sources, a
bit of configuration
and load documentation. *shrug*
Another troubling part is sibling relations - right now each proxy
marks the others as siblings
and proxy-only, i.e. it shouldn't save their contents into its own
cache. Eventually they stop talking
to each other at all and hit the backend, each with its own separate
cache.
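Concretely, each frontend carries cache_peer lines along these lines (hostnames are made up; 3128/3130 are the usual HTTP and ICP ports):

```
# Each proxy lists the others as siblings; 'proxy-only' means an object
# fetched from a sibling is passed through without being stored locally.
cache_peer sq1.example.org sibling 3128 3130 proxy-only
cache_peer sq2.example.org sibling 3128 3130 proxy-only
```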
I'm not sure whether that's related to equal object expiration times
or something else entirely.
If anyone has experience with squids in such setups - lots of
objects,
lots of servers, efficiency actually managed - it would sure be nice
to hear about it.
It is still strange that it blocks quite a bit on some housekeeping
I/O operations.
BTW, it took us a while today to detect serious packet loss at one of
our upstream providers.
It only slightly affects client network performance, but it quite
badly stalls communication
between our distributed clusters. Looking for such problems becomes a
bit of a witch hunt :)
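For the record, the kind of checks that eventually exposed it are nothing fancy (hostname is made up):

```
# Per-hop loss percentages in one report:
mtr --report --report-cycles 100 remote.cluster.example.org
# A few hundred pings make small loss visible in the summary line too:
ping -c 500 -q remote.cluster.example.org
```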
So much for today's experiences and joys ;-)
Cheers,
Domas