I managed to catch one of our little database-stuck spells (where a bunch of db threads get stuck in the "statistics" state) and got vmstat running on it.
The notable things are that the number of processes in uninterruptible sleep (the 'b' column) is larger than usual (0-1 with an occasional 2 is normal), a few processes are even swapped out (the 'w' column), and I/O has gone all wonky: unusually low input ("bi") and very unusually high output ("bo").
There's nothing reported about SCSI oddities in the system logs, nor anything in the mysql log. It may be something as mundane as the flushing of journal buffers for the filesystem or the database, in which case some tweaking may be possible, or there may be darker forces at work (mwoo-ha-ha-haaaaa!). Apache seems to have died around this time as well, apparently from a failed restart. Apache is set to restart on pliny every hour at 34 minutes past to keep a memory leak from getting too bad. It's possible that the exiting Apache freed up a big hole in memory that other things suddenly tried to fill? I don't know.
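(The restart is just a crontab entry along these lines; the path to apachectl here is illustrative, not necessarily pliny's actual layout:)

    # Restart Apache at 34 minutes past every hour to curb the leak.
    34 * * * * /usr/local/apache/bin/apachectl restart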
[brion@pliny brion]$ vmstat 5
   procs                      memory     swap          io     system        cpu
 r  b  w   swpd   free   buff   cache   si   so    bi    bo   in    cs  us  sy  id
 0  5  3 799988   7380  39108 1227312   44   27    84   140   60    16  23   5  72
 0  3  3 799988  10464  38600 1224112    6    1    84  8077  523   306  16   4  80
 0  3  3 799988  10552  38400 1223784    2    4  1749  6151  751   524  13   5  82
 0  3  3 799988  10556  37316 1224252    0   27   307  6257  672   363  23   5  72
 0  2  4 799988   5588  36804 1229240    6   60  1127  7719  783   523  14   3  83
 1  1  3 799988   8324  36196 1226936    6    5   402 19343  368   393  14   6  81
 1  2  4 799976   6416  35664 1228536    0   91   277 15835  390   405  35   7  58
 1  2  4 799976   8360  35340 1227040    0  105   344 10632  489   414  39   8  52
 4  2  3 799976   8192  34656 1228296    0   14   288 12609  490   507  40   8  52
 4  2  3 799984  10360  32700 1234596    0    2  1373 14228  586   743  41   9  50
 1  3  4 799984   8204  30808 1246252    0   57   566 16876  357   546  41   9  50
 1  3  4 799980   8484  29568 1244496    0    1    40 16054  241   399  39   7  54
 2  9  4 799980   8360  28312 1244432    0    1    75 18043  262   422  41   8  50
 0  2  5 797896  10092  28376 1243996    0    0    42 15818  377   298  23   4  73
 0  2  3 797896   9572  28376 1244072    0    0    15 15003  795   146   0   2  98
 0  2  3 797896   9232  28400 1244136    0    0    13 14039  716   123   0   3  96
 0  2  3 797896   8732  28408 1244208    0    0    14 14344  642   154   0   2  98
 0  2  3 797896   8324  28416 1244280    6    0    21 13928  543   129   0   1  99
 0  3  2 797896   7868  28424 1244364    0    0    17 16170  487   135   0   2  98
 0  3  2 796852   7272  28432 1244464   10    0    30 17208  441   138   0   1  98
 0  1  0 786380  10376  28516 1246020  115    0   534 11009  553   641   1   3  96
Here things become more normalish.
   procs                      memory     swap          io     system        cpu
 r  b  w   swpd   free   buff   cache   si   so    bi    bo   in    cs  us  sy  id
 0  1  0 786380   8336  28608 1247252  346    0  5020   435 1145  1244   7   2  91
 0  0  0 786380   8208  28664 1249480  290    0  1961   430  810   770   2   1  97
 0  3  0 786380   8436  28760 1249292    0    0  2494   233  649   648   3   2  95
 0  1  0 786380  12096  28820 1244136  275    0  1524    64  627   596   4   2  95
 0  0  0 786380   8224  28900 1247824   18    0  1338   326  691   588   2   2  95
 0  0  0 786380   8192  28968 1247780    0    0   427    28  520   389   2   0  98
 0  0  0 786380   8304  29092 1247968    0    0   283   870  420   243   1   1  98
 0  0  0 786380   8292  29156 1248000    0    0   211    26  268   117   0   0 100
 0  0  0 786380  11208  29288 1245092    2    0   174   764  398   226   0   1  98
 0  0  0 786380   9124  29368 1247100    0    0   402    38  422   220   1   1  98
 0  2  1 786380   8176  29460 1248032    1    0   246   681  657   433   1   1  98
 0  0  0 786380   8152  29556 1248024    0    0   357    99  377   197   0   1  99
 0  0  0 786380   8416  29716 1247720    0    0   829    39  411   239   1   0  99
 0  0  0 786380  11880  29824 1243912   54    0   570   193  518   355   1   1  98
 0  0  0 786380  12288  29896 1243236   55    0   507    82  421   247   1   0  98
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
> I managed to catch one of our little database-stuck spells (where a bunch of db threads get stuck in the "statistics" state) and got vmstat running on it.
>
> The notable things are that the number of processes in uninterruptible sleep (the 'b' column) is larger than usual (0-1 with an occasional 2 is normal), a few processes are even swapped out (the 'w' column), and I/O has gone all wonky: unusually low input ("bi") and very unusually high output ("bo").
>
> There's nothing reported about SCSI oddities in the system logs, nor anything in the mysql log. It may be something as mundane as the flushing of journal buffers for the filesystem or the database, in which case some tweaking may be possible, or there may be darker forces at work (mwoo-ha-ha-haaaaa!). Apache seems to have died around this time as well, apparently from a failed restart. Apache is set to restart on pliny every hour at 34 minutes past to keep a memory leak from getting too bad. It's possible that the exiting Apache freed up a big hole in memory that other things suddenly tried to fill? I don't know.
If bo is high, it means it is writing to the disk, correct?
By adding up that column and multiplying by the 5-second sample interval, I get a total quantity of about 1300MB. Which is a lot.
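That arithmetic is easy to redo; a sketch, assuming the vmstat output above has been saved to a file (vmstat.log is a made-up name, and the ~12MB/s figures discussed later imply 1KB blocks):

    # 'bo' is the 11th field, in blocks/s; sum the data rows (16 numeric
    # fields), multiply by the 5 s interval, and convert KB to MB.
    awk 'NF == 16 && $1 ~ /^[0-9]+$/ { total += $11 }
         END { printf "%.0f MB\n", total * 5 / 1024 }' vmstat.log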
I had another look at the log from my monitor program, recorded before the system crash at ~4:30 on 13 Oct. I have output from SHOW PROCESSLIST every 5 seconds for the 5-minute duration of the event. The only interesting feature I've picked out of it so far is that for the entire 5 minutes, there was a query in the "removing tmp table" state. It was definitely one query, because the thread ID was constant. It's a watchlist query:
$sql = "SELECT cur_namespace,cur_title,cur_comment, cur_user,cur_user_text,cur_timestamp,cur_minor_edit,cur_is_new FROM watchlist,cur USE INDEX ($x) WHERE wl_user=$uid AND $z AND wl_title=cur_title $docutoff ORDER BY cur_timestamp DESC";
I don't know which of the two indices it was using.
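One way to find out would be to run EXPLAIN on the expanded query; a sketch only, with made-up values standing in for $x, $uid, $z and $docutoff, and "wikidb" for the database name:

    # EXPLAIN reports which index MySQL actually chooses for each table.
    mysql wikidb -e "EXPLAIN SELECT cur_namespace, cur_title, cur_timestamp
        FROM watchlist, cur USE INDEX (cur_timestamp)
        WHERE wl_user=12345 AND wl_title=cur_title
        ORDER BY cur_timestamp DESC"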
Can I have a copy of the slow query log? I think it's in the /usr/local/mysql/var directory, which I don't have access to.
-- Tim Starling.
On Thu, 16 Oct 2003 00:09:18 -0700, Brion Vibber <brion@pobox.com> wrote:
> The notable things are that the number of processes in uninterruptible sleep (the 'b' column) is larger than usual (0-1 with an occasional 2 is normal), a few processes are even swapped out (the 'w' column), and I/O has gone all wonky: unusually low input ("bi") and very unusually high output ("bo").
If something writes lots of data to the disk (as 'bo' indicates), it can make other processes' disk access stall (uninterruptible waits for disk resources). What filesystem do you use?
Anyway, something is writing lots of data to the disk. (I've seen that with reiserfs, but it shouldn't take more than a few seconds.) Dunno about mysql's possible braindeadness. Dunno about your partition layout either, but /proc/ can contain separate stats for every partition (I don't remember the name, and it's kernel dependent), so you can see where the activity is. Maybe it helps.
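The file is probably /proc/partitions, which on 2.4 kernels can carry per-partition I/O counters (the exact field layout varies by kernel version); iostat from the sysstat package, if installed, reads the same counters:

    # Raw per-partition counters (layout is kernel dependent):
    cat /proc/partitions
    # Or a rolling per-device view every 5 seconds:
    iostat -x 5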
Is the 12MB/s indicated in 'bo' the speed limit of your disks, or can they do a higher average speed? (For example, mine does 20MB/s max, which works out to about 15MB/s sustained.)
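A quick way to check is a raw sequential read off the device while the box is otherwise idle; a sketch, with /dev/sdb standing in for whichever disk holds the data:

    # 512MB sequential read; divide by the elapsed time for MB/s.
    time dd if=/dev/sdb of=/dev/null bs=1024k count=512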
grin
On Sunday, Oct 19, 2003, at 05:23 US/Pacific, peter gervai wrote:
> If something writes lots of data to the disk (as 'bo' indicates), it can make other processes' disk access stall (uninterruptible waits for disk resources). What filesystem do you use?
ext3, which of course means extra journaling overhead.
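If journal flushing does turn out to be the culprit, the usual ext3 knobs are the data= journaling mode and the commit interval; a sketch only, with /dev/sdb1 and the mount point made up, and assuming the kernel's ext3 supports the commit= option:

    # Batch journal commits every 30 s instead of the 5 s default;
    # commit= can be changed on a live remount.
    mount -o remount,commit=30 /var
    # data=writeback drops data-ordering guarantees for less overhead;
    # it has to be set at mount time, e.g. in /etc/fstab:
    # /dev/sdb1  /var  ext3  defaults,data=writeback,commit=30  1 2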
> Is the 12MB/s indicated in 'bo' the speed limit of your disks, or can they do a higher average speed? (For example, mine does 20MB/s max, which works out to about 15MB/s sustained.)
I'd be surprised if they could do much faster than that, particularly while also handling semi-random reads. Here are the drive models; I haven't been able to turn up very useful specs from googling yet:

Vendor: IBM  Model: DDYS-T36950N      Rev: S96H
Vendor: IBM  Model: IC35L036UWDY10-0  Rev: S23C
The second drive holds only the InnoDB data store and backup dump files.
Oh, and Tim: I've hardlinked the slow query log into /usr/local/mysql, which should be accessible. It's grown a bit over time and sits at around 230+ MB, so you may wish to cut pieces off with tail. :)
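For instance, to grab just the last chunk of it (the filename here is a guess):

    # Last ~50MB of the log; adjust the byte count to taste.
    tail -c 50000000 /usr/local/mysql/mysql-slow.log > ~/slow-tail.log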
-- brion vibber (brion @ pobox.com)