Edward Chernenko replied to my question about the wikicnt_daemon.pl
script. He CC'd this list, but apparently that did not go through.
Maybe he's not registered here? He should be...
Anyway, below is his response, fyi:
-------- Original Message --------
Subject: Re: wikicnt_daemon.pl
Date: Fri, 7 Jul 2006 12:02:49 +0400
From: Edward Chernenko <edwardspec(a)gmail.com>
To: Daniel Kinzler <daniel(a)brightbyte.de>
CC: toolserver-l(a)wikipedia.org
References: <44AB8C72.90102(a)brightbyte.de>
2006/7/5, Daniel Kinzler <daniel(a)brightbyte.de>:
> Hi
>
> When monitoring activity on the toolserver, I often notice your script
> wikicnt_daemon.pl - it seems to be started every few hours, run for
> quite a while, and there are often many instances running at once (33 at
> the moment). I suspect (but I'm not sure) that it may be one of the
> reasons the toolserver often falls behind with replicating from the
> master db. Critical resources are RAM and Disk-I/O, and thus SQL
> queries, of course.
> Please tell me what that script does, and why there are so many
> instances at once. Please send a copy of your response to
> <toolserver-l(a)Wikipedia.org>. Thanks!
> Regards,
> Daniel aka Duesentrieb
Hi Daniel,
This script is an article counter installed by the admins of the Russian
Wikipedia. It has to make 5-100 inserts into the database per second. I'm
going to move it from MySQL to a GDBM database (which should reduce the
load), but that is not done yet.
Currently, running this as a daemon is quite a good optimization, because
of the persistent connections to MySQL and the use of prepared statements.
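Roughly, the hot path of each worker looks like this (a simplified sketch,
not the actual code; the table, column, and credential names are just
placeholders):

    use strict;
    use warnings;
    use DBI;

    # One persistent connection per worker process, opened once at startup
    # instead of reconnecting for every counted hit.
    my $dbh = DBI->connect(
        'DBI:mysql:database=wikicnt;host=localhost',
        'wikicnt_user', 'secret',
        { RaiseError => 1, AutoCommit => 1 },
    );

    # The statement is prepared once, so MySQL does not re-parse the SQL
    # on every request; only execute() runs per hit.
    my $sth = $dbh->prepare(
        'INSERT INTO hit_counter (page_id, hits) VALUES (?, 1)
         ON DUPLICATE KEY UPDATE hits = hits + 1'
    );

    sub count_hit {
        my ($page_id) = @_;
        $sth->execute($page_id);
    }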
Unfortunately, the installed Perl version has no thread support, and a
single process can't dispatch all 5-100 requests per second, because it
spends too much time waiting for the MySQL server's replies. So the script
simply forks five times after creating the listening socket and before
connecting to the database; each fork doubles the number of processes, so
there are 2^5 = 32 of them (not 33).
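The startup sequence is roughly the following (again a simplified sketch;
the port number is a placeholder):

    use strict;
    use warnings;
    use IO::Socket::INET;

    # Create the listening socket first, so every forked child inherits
    # the same descriptor and can accept() connections on it.
    my $listener = IO::Socket::INET->new(
        LocalPort => 8123,      # placeholder port
        Listen    => 128,
        ReuseAddr => 1,
    ) or die "listen: $!";

    # Fork five times: every existing process forks again on each pass,
    # so the process count doubles each round: 1 -> 2 -> 4 -> 8 -> 16 -> 32.
    for (1 .. 5) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        # Both parent and child fall through and keep forking.
    }

    # Each of the 32 workers connects to MySQL *after* forking, because a
    # DBI handle must not be shared across fork().
    # ... DBI->connect(...), then loop over $listener->accept() ...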
Another optimization was caching the results of 'page_title -> page_id'
lookups in memory (a cron task restarted the daemon every hour to clear
the cache). Here I made a mistake: the full cache (with info about all
pages) takes 14 MB, but after your report I realized that across 32
processes it can take 14 * 32 = 448 MB. I have now moved this into a GDBM
database which is updated only once per day. Please check: RAM usage
should now be no more than 1-2 MB.
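Roughly, the lookup now goes through a tied hash like this (a simplified
sketch; the file path is just a placeholder):

    use strict;
    use warnings;
    use GDBM_File;

    # Open the prebuilt title -> page_id map read-only.  The data stays on
    # disk (shared by the OS page cache), so it is no longer duplicated in
    # the private memory of each of the 32 worker processes.
    tie my %page_id, 'GDBM_File', '/path/to/page_id.gdbm', &GDBM_READER, 0644
        or die "cannot tie page_id.gdbm: $!";

    sub lookup_page_id {
        my ($title) = @_;
        # If the title is missing from the daily snapshot, fall back to a
        # MySQL query here.
        return $page_id{$title};
    }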
I'm now working on further optimizations, but the counter has to stay
running while that work is done (it collects no information about the
time of requests, only the hits accumulated since it was first launched).
P.S. The required libraries (gdbm, sqlite) and Perl modules (GDBM_File and
DBD::SQLite) are not installed.
--
Edward Chernenko <edwardspec(a)gmail.com>
--
Homepage: http://brightbyte.de
For what it's worth, I'm interested in having alternative project-list tools that more or less
cooperate.
But I really think the project data should be editable on-wiki, for various reasons (one of the
larger reasons is that tags should be flexible and easy to change; another is that if some authors
don't post their information, I'd like to be able to add it myself from the Apache logs, if only to
get the ball rolling).
To try to make the on-wiki data easier to access, I've made TSTOC able to dump its data. Running
the command below will print the on-wiki data to STDOUT in a format like Leon's ~/.projects/:
/home/interiot/public_html/cgi-bin/tstoc --dump [--purge]
I would be happy to tweak anything regarding the data dump to accommodate any use of it.
Again, the data comes from http://meta.wikimedia.org/wiki/Toolserver/TStoc, and it's recently been
changed to a format that's hopefully more accessible and easy to use (thanks for the suggestion,
Nichtich).
-Dave
(--purge purges the wikitext-cache, and forces it to fetch a fresh copy)