Browne is mysteriously down for the moment. An initial reboot got it running again; Tim has some syslog bits he could probably post here about some sort of problem. Apparently it went down again shortly thereafter, and my attempt to power cycle it didn't get it back, at least not back on the network.
Coronelli is currently serving all wikis. I also found that coro's squid was using an awful lot of memory - resident size ~2.1gb, though it's theoretically set to a 1350mb memory cache usage, and about 890mb used swap and growing fast. Load was 18-20ish. I restarted the squid to clear out the memory, and now the swap's gone and load's down to 6-8.
It remains to be seen whether it will eventually start eating into swap again.
The ganglia graph for coro shows a slow, steady increase in swap usage and a decrease in space used for cache and buffers over the last week, then vastly accelerating swap usage around the time we moved browne's traffic over to it. I don't know whether this indicates a memory leak somewhere or whether it's supposed to be doing that.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
> Browne is mysteriously down for the moment. An initial reboot got it running again; Tim has some syslog bits he could probably post here about some sort of problem. Apparently it went down again shortly thereafter, and my attempt to power cycle it didn't get it back, at least not back on the network.
I don't have the whole thing, just the few lines I posted to IRC at the time.
First, the log showed squid operating normally. Then it died, with this sort of thing being written to the log:
Apr 26 01:43:18 browne kernel: Slab corruption: start=295d1894, len=504
Apr 26 01:43:18 browne kernel: Redzone: 0x5a2cf071/0x5a2cf071.
Apr 26 01:43:18 browne kernel: Last user: [<02185132>](destroy_inode+0x36/0x45)
Apr 26 01:43:18 browne kernel: 030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b cc 1c 5d 29
Apr 26 01:43:18 browne kernel: Prev obj: start=295d1690, len=504
Then shortly afterwards, squid automatically came back up with a different PID. Kernel messages such as the following were logged:
Apr 26 01:44:28 browne kernel: slab: Internal list corruption detected in cache 'dentry_cache'(14), slabp 64c7f000(12). Hexdump: ...
Apr 26 01:44:28 browne kernel: invalid operand: 0000 [#1]
The server was contactable for some time after the squid restart, maybe 10 minutes. Then it stopped responding to ganglia, ssh or HTTP requests. It was, however, still pingable. The system log during this time showed crond causing a kernel error like the one above, once every minute. There was no other visible activity. This situation continued until the machine was power-cycled.
I don't know enough about the kernel to speculate on what went wrong on the basis of these logs.
After the restart, browne came back on and worked properly for about an hour, before dying again as Brion described. It is currently not responding to ping.
-- Tim Starling
On Sun, Apr 25, 2004 at 11:04:09PM -0700, Brion Vibber wrote:
> Coronelli is currently serving all wikis. I also found that coro's squid was using an awful lot of memory - resident size ~2.1gb, though it's theoretically set to a 1350mb memory cache usage, and about 890mb used swap and growing fast. Load was 18-20ish. I restarted the squid to clear out the memory, and now the swap's gone and load's down to 6-8.
> It remains to be seen whether it will eventually start eating into swap again.
Is the squid installation on curly still there? Gabriel set it up so that Jimbo could swap the memory in coronelli. If it's still around, perhaps it could be used?
Coronelli's load has been in the 50s for the last 30 minutes.
Regards,
JeLuF
Jens Frank wrote:
> Is the squid installation on curly still there? Gabriel set it up so that Jimbo could swap the memory in coronelli. If it's still around, perhaps it could be used?
> Coronelli's load has been in the 50s for the last 30 minutes.
Coronelli is currently acting as a database slave server. This is keeping its IDE hard drive busy. In theory we could stop the synchronisation and have it take up squid service (which is also reasonably hard-drive intensive), then restart the synchronisation at a later date. The other alternative would be to have one of the web servers run squid, i.e. a 2:4 ratio of squids to apaches instead of 1:5.
-- Tim Starling
On Apr 26, 2004, at 05:21, Tim Starling wrote:
> Coronelli is currently acting as a database slave server.
s/Coronelli/curly/
*Curly* is running a db slave.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
> s/Coronelli/curly/
> *Curly* is running a db slave.
Oh, you all knew what I meant, right? Just a slip of the tongue. :)
-- Tim Starling
wikitech-l@lists.wikimedia.org