2010/11/15 Daniel Friesen <lists(a)nadir-seen-fire.com>:
There was a thought about the job queue that
popped into my mind today.
From what I understand, for a wiki farm to use runJobs.php instead of
the in-request queue (which is less desirable on high-traffic sites),
the farm has to run runJobs.php periodically for each and every wiki on
the farm.
So, for example: if a wiki farm is hosting 10,000 wikis, and the host
really wants to ensure that the queue is run at least hourly to keep
the data on each wiki reasonably up to date, the farm essentially needs
to call runJobs.php 10,000 times an hour (i.e. once for each individual
wiki), regardless of whether a wiki has jobs or not. Either that, or
poll each database beforehand, which is itself 10,000 database calls an
hour on top of the runJobs executions, and still isn't all that
desirable.
Have you considered the fact that the WMF cluster is in this exact situation? ;)
However, we don't call runJobs.php for all wikis periodically.
Instead, we call nextJobDB.php which generates a list of wikis that
have pending jobs (by connecting to all of their DBs), caches it in
memcached (caching was broken until a few minutes ago, oops) and
outputs a random DB name. We then run runJobs.php on that random DB
name. This whole thing is in maintenance/jobs-loop.sh
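For anyone curious how that scheme hangs together, here is a minimal Python sketch of the idea (not the actual nextJobDB.php or jobs-loop.sh code): poll every wiki's DB for pending jobs only when a cached list has expired, then hand back one random busy wiki to run jobs on. The function names, the in-process cache standing in for memcached, and the `job_counts` dict standing in for per-wiki job tables are all illustrative assumptions.

```python
import random
import time

# Hypothetical stand-in for memcached: the expensive "poll every DB" result
# is cached for CACHE_TTL seconds so the per-wiki queries are amortized.
CACHE_TTL = 300  # seconds
_cache = {"dbs": None, "expires": 0.0}

def wikis_with_jobs(all_dbs, job_counts):
    """Return the list of wikis with pending jobs, refreshing the cache
    only when it has expired. In the real setup this step would be one
    query against each wiki's job table, not a dict lookup."""
    now = time.time()
    if _cache["dbs"] is None or now >= _cache["expires"]:
        _cache["dbs"] = [db for db in all_dbs if job_counts.get(db, 0) > 0]
        _cache["expires"] = now + CACHE_TTL
    return _cache["dbs"]

def next_job_db(all_dbs, job_counts):
    """Analogous to what nextJobDB.php outputs: one random DB name that
    has pending jobs, or None if no wiki has any."""
    busy = wikis_with_jobs(all_dbs, job_counts)
    return random.choice(busy) if busy else None
```

A driver loop (the jobs-loop.sh role) would then repeatedly call next_job_db() and run the job runner against whichever DB name comes back, sleeping briefly when it gets None.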
Roan Kattouw (Catrope)
Ok, then...
How many databases are in the cluster being served by nextJobDB?
How long does it take to connect to all the databases and figure out
which ones have pending jobs?
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [