I've been studying performance today, looking for the bottlenecks. The total volume of traffic on the site is not high enough to make me suspect that hardware is the problem. So I'm looking for things that people do on the site that tend to be expensive, so that we can eliminate features for a while (if we must) or find ways to optimize/cache them.
1. I notice that many of the 'special' pages are quite popular. I suspect that these are our best bets.
2. I propose that we test the site without the "this page has been accessed N times" feature. Doesn't it seem excessive to do a database write for every single pageview when we are trying to increase performance? A nearly equivalent feature could be provided with a daily update from the access logs (a rough sketch of what that batch job might look like follows).
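Something along these lines, run nightly from cron, might do it. I'm just sketching; the log path, table, and column names are made up for illustration, not our actual schema:

    <?php
    // Nightly batch job: tally the day's page views from the web server
    // access log, then fold them into per-page counters in one pass.
    // Log path, table, and column names are illustrative only.

    $counts = array();
    $log = fopen('/var/log/httpd/access_log.0', 'r');
    while (($line = fgets($log)) !== false) {
        // Pull the article title out of lines like:
        //   ... "GET /wiki/Some_Title HTTP/1.0" 200 ...
        if (preg_match('!"GET /wiki/([^ ?"]+)!', $line, $m)) {
            $title = urldecode($m[1]);
            $counts[$title] = isset($counts[$title]) ? $counts[$title] + 1 : 1;
        }
    }
    fclose($log);

    // One UPDATE per viewed page per day, instead of one write per view.
    $db = new mysqli('localhost', 'wiki', 'secret', 'wikidb');
    $stmt = $db->prepare('UPDATE page SET view_count = view_count + ? WHERE title = ?');
    foreach ($counts as $title => $n) {
        $stmt->bind_param('is', $n, $title);
        $stmt->execute();
    }

That turns thousands of counter writes per day into at most one per viewed page.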
--Jimbo
Jimmy Wales wrote:
> I propose that we test the site without the "this page has been
> accessed N times" feature.
This is my best guess too. However, it is possible to get from guesses to knowledge: check getmicrotime() before and after each call and log the elapsed time (to stderr?). Wrap every database transaction in this and gather statistics on which calls are worst, rather than removing calls by guessing.
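Concretely, something like the sketch below. The function name and the mysqli call are stand-ins for whatever the real query path is, and note that PHP's built-in is microtime(true) rather than getmicrotime():

    <?php
    // Sketch: time every query and log its wallclock duration, so the
    // worst calls can be ranked later by sorting the log numerically.
    function wfTimedQuery($db, $sql) {
        $start = microtime(true);           // float seconds
        $result = mysqli_query($db, $sql);  // stand-in for the real call
        $elapsed = microtime(true) - $start;
        // error_log() goes to the server's error log (usually stderr
        // under Apache); one tab-separated line per query.
        error_log(sprintf("%.6f\t%s", $elapsed, substr($sql, 0, 100)));
        return $result;
    }

Then a sort -rn over the log file gives you the worst offenders directly.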
I don't know the specifics of this system, but my general experience is that database locks can eat a lot of wallclock time without consuming any CPU time. During the delay, each stalled execution thread holds on to its allocated memory, and enough of them together can push a machine into swap. I have no idea if this happens in Wikipedia, but you can monitor disk I/O with simple programs like vmstat (or iostat on some systems). If this is the case, adding RAM to avoid swapping is *not* a solution.
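For example:

    $ vmstat 5        # one summary line every five seconds
    $ iostat -x 5     # extended per-device statistics, where available

Sustained nonzero values in vmstat's si/so (swap-in/swap-out) columns while queries are stalled would point at the lock-then-swap pattern; bi/bo show ordinary block I/O.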