On 07/07/06, Daniel Kinzler daniel@brightbyte.de wrote:
Edward Chernenko replied to my question about the wikicnt_daemon.pl script. He CCed this list, but apparently that did not go through. Maybe he's not registered here? He should be...
He is, but according to the subscriber list, he's on digest mode.
Anyway, below is his response, fyi:
-------- Original Message --------
Subject: Re: wikicnt_daemon.pl
Date: Fri, 7 Jul 2006 12:02:49 +0400
From: Edward Chernenko edwardspec@gmail.com
To: Daniel Kinzler daniel@brightbyte.de
CC: toolserver-l@wikipedia.org
References: 44AB8C72.90102@brightbyte.de
2006/7/5, Daniel Kinzler daniel@brightbyte.de:
Hi
When monitoring activity on the toolserver, I often notice your script wikicnt_daemon.pl - it seems to be started every few hours, to run for quite a while, and there are often many instances running at once (33 at the moment). I suspect (but I'm not sure) that it may be one of the reasons the toolserver often falls behind with replicating from the master db. Critical resources are RAM and disk I/O, and thus SQL queries, of course.
Please tell me what that script does, and why there are so many instances at once. Please send a copy of your response to toolserver-l@Wikipedia.org. Thanks!
Regards, Daniel aka Duesentrieb
Hi Daniel,
This script is an article counter installed by the admins of the Russian Wikipedia. It has to make 5-100 inserts into the database per second. I'm now moving this from MySQL to a GDBM database (which should reduce the load), but this is not done yet.
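(For context, a minimal sketch of what counting into GDBM instead of MySQL could look like - the file path and key scheme here are assumptions for illustration, not taken from the actual script:)

    #!/usr/bin/perl
    use strict;
    use warnings;
    use GDBM_File;   # core module wrapping the GNU dbm on-disk hash

    # Tie a Perl hash to an on-disk GDBM file (path is hypothetical).
    tie my %hits, 'GDBM_File', '/home/edward/wikicnt.db', &GDBM_WRCREAT, 0640
        or die "cannot open counter db: $!";

    # A page view becomes a local hash store instead of an SQL INSERT,
    # so there is no network round-trip to the database server.
    sub count_hit {
        my ($page_title) = @_;
        $hits{$page_title} = ($hits{$page_title} || 0) + 1;
    }

    count_hit('Main_Page');
    untie %hits;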
I'm not too chuffed at the precedent this is going to set, to be honest. Each Russian Wikipedia page view results in a hit to Zedler...it's still going to be using RAM and disk, etc...if Wikimedia want this sort of stuff, they should bloody well set it up themselves.
Currently, running this as a daemon is quite a good optimization because of the persistent connection to MySQL and the use of prepared statements.
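(To illustrate what the daemon gains over reconnecting per request - the table and column names below are invented, not from the real schema:)

    use strict;
    use warnings;
    use DBI;

    # Connect once at daemon start-up; the handle is reused for the
    # lifetime of the process instead of reconnecting on every hit.
    my $dbh = DBI->connect('DBI:mysql:database=wikicnt;host=localhost',
                           'wikicnt', 'secret', { RaiseError => 1 });

    # Parsed and planned by MySQL once; only the placeholder changes per hit.
    my $sth = $dbh->prepare(
        'INSERT INTO hit_counter (page_id, hits) VALUES (?, 1)
         ON DUPLICATE KEY UPDATE hits = hits + 1');

    for my $page_id (1, 2, 2, 3) {   # stand-in for incoming requests
        $sth->execute($page_id);     # one cheap call per page view
    }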
Unfortunately, there's no thread support in the Perl version installed, and one process can't dispatch all 5-100 requests per second because it spends too much time waiting for the MySQL server's replies. So the script simply forks five times after creating the listening socket and before connecting to the database; since each fork doubles the number of processes, that gives 2^5 = 32 processes, not 33.
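(The fork-after-listen pattern being described looks roughly like this - the port number is made up:)

    use strict;
    use warnings;
    use IO::Socket::INET;

    # Open the listening socket first so every child inherits the same
    # descriptor and the kernel spreads incoming connections across them.
    my $listener = IO::Socket::INET->new(
        LocalPort => 8123,   # hypothetical port
        Proto     => 'tcp',
        Listen    => 10,
        ReuseAddr => 1,
    ) or die "cannot listen: $!";

    # Each fork() doubles the process count: 1 -> 2 -> 4 -> 8 -> 16 -> 32.
    for (1 .. 5) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
    }

    # ...each of the 32 processes now opens its own MySQL connection
    # and loops on $listener->accept().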
Yes, the listening socket. A foreign port that you didn't have authorisation to use, so far as I know - it would almost certainly be documented internally if you had. And didn't Duesentrieb notice something weird about the setup?
Another optimization was caching the results of 'page_title -> page_id' requests in memory (a cron task restarted the daemon every hour to clear the cache). Here I made a mistake: the full cache (with info about all pages) takes 14 MB, but after your report I realized that across the forked processes it can take 14 * 32 = 448 MB. I have now moved this into a GDBM database which is updated only once per day. Please check - RAM usage should now be no more than 1-2 MB.
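(Again a sketch, with an invented file name, of how a shared on-disk cache avoids keeping a copy in every process:)

    use strict;
    use warnings;
    use GDBM_File;

    # Opened read-only by each forked process: the mapping lives in one
    # disk file served from the OS page cache, instead of a ~14 MB Perl
    # hash duplicated in all 32 processes.
    tie my %page_id_of, 'GDBM_File', '/home/edward/title2id.db', &GDBM_READER, 0640
        or die "cannot open title cache: $!";

    my $id = $page_id_of{'Main_Page'};   # on a miss, fall back to MySQL (not shown)

    # A daily cron job would rebuild this file from the page table and
    # swap it into place.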
*You* need to take responsibility for checking that *your* tools aren't causing *our* server to die. You've got to be reasonable and make sure you glance at it all periodically.
Rob Church