On 10-11-16 08:51 AM, Roan Kattouw wrote:
2010/11/15 Daniel Friesen <lists@nadir-seen-fire.com>:
There was a thought about the job queue that popped into my mind today.
From what I understand, for a wiki farm to use runJobs.php instead of the in-request queue (which is less desirable on high-traffic sites), the farm has to run runJobs.php periodically for each and every wiki it hosts. For example, if a wiki farm hosts 10,000 wikis and wants to ensure the queue is run at least hourly to keep each wiki's data reasonably up to date, it essentially needs to call runJobs.php 10,000 times an hour (i.e. once per wiki), regardless of whether a wiki has any jobs. The alternative is to poll each database beforehand, which in itself is 10,000 database calls an hour on top of the runJobs.php executions, and that still isn't very desirable.
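To put the naive approach in numbers, here is a small back-of-the-envelope calculation. The 10,000-wiki count and the once-an-hour target come from the message above; the helper names are hypothetical and only illustrate the arithmetic.

```python
# Back-of-the-envelope cost of running runJobs.php once per wiki per hour.
WIKIS = 10_000          # wikis hosted on the farm (figure from the message)
SECONDS_PER_HOUR = 3600


def naive_invocations_per_hour(wikis: int, runs_per_hour: int = 1) -> int:
    """One runJobs.php call per wiki per run, whether or not it has jobs."""
    return wikis * runs_per_hour


def polled_cost(wikis: int, wikis_with_jobs: int) -> tuple:
    """Poll every DB first, then run jobs only where something is pending.

    Returns (database polls, runJobs.php invocations) per hourly cycle.
    """
    return wikis, wikis_with_jobs


per_hour = naive_invocations_per_hour(WIKIS)
per_second = per_hour / SECONDS_PER_HOUR
interval = SECONDS_PER_HOUR / per_hour
print(per_hour, round(per_second, 2), round(interval, 2))  # prints: 10000 2.78 0.36
```

In other words, the naive scheme starts a runJobs.php process roughly every 0.36 seconds around the clock, and polling first still costs 10,000 database checks per cycle even if only a handful of wikis have pending jobs.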
Have you considered the fact that the WMF cluster is in this exact situation? ;)
However, we don't call runJobs.php for all wikis periodically. Instead, we call nextJobDB.php, which generates a list of wikis that have pending jobs (by connecting to all of their DBs), caches it in memcached (caching was broken until a few minutes ago, oops) and outputs a random DB name. We then run runJobs.php on that random DB name. This whole thing is in maintenance/jobs-loop.sh.
Roan Kattouw (Catrope)
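The nextJobDB.php approach Roan describes can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual PHP script: `PendingJobCache` is a hypothetical in-process stand-in for the memcached entry, `fetch_counts` is a hypothetical callback standing in for "connect to every wiki's DB and count pending jobs", and the TTL value is made up.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class PendingJobCache:
    """Hypothetical stand-in for the memcached entry holding the DB list."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._value: Optional[List[str]] = None
        self._expires = 0.0

    def get(self) -> Optional[List[str]]:
        # Return the cached list only while it is still fresh.
        if self._value is not None and self.clock() < self._expires:
            return self._value
        return None

    def set(self, value: List[str]) -> None:
        self._value = value
        self._expires = self.clock() + self.ttl


def wikis_with_pending_jobs(job_counts: Dict[str, int]) -> List[str]:
    """Keep only the DB names whose job queue is non-empty."""
    return sorted(db for db, count in job_counts.items() if count > 0)


def next_job_db(cache: PendingJobCache,
                fetch_counts: Callable[[], Dict[str, int]],
                rng=random) -> Optional[str]:
    """Pick a random wiki with pending jobs, rebuilding the list on a cache miss."""
    dbs = cache.get()
    if dbs is None:
        # Expensive step: in the setup described, this touches every wiki's DB.
        dbs = wikis_with_pending_jobs(fetch_counts())
        cache.set(dbs)
    if not dbs:
        return None  # nothing to do; the caller can sleep and retry
    return rng.choice(dbs)
```

A loop in the spirit of jobs-loop.sh would call `next_job_db` repeatedly and run runJobs.php against whatever name it returns. The point of the cache is that the scan of all databases happens once per TTL window rather than once per job run, and picking at random spreads work across the wikis that actually have jobs.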
Ok, then... How many databases are in the cluster being served by nextJobDB.php? How long does it take to connect to all of those databases and figure out which ones have pending jobs?
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]