Hi all!
As you may have noticed, the propagation of changes from Wikidata to the Wikipedias has been slower than it should be. Because of this, changes on Wikidata have been showing up on Wikipedia watchlists very late, or not at all.
Katie and I have investigated the causes for this and what we can do about it. To keep you in the loop, here is what we found:
* A dispatcher needs about 3 seconds to dispatch 1000 changes to a client wiki.
* Considering we have ~300 client wikis, this means one dispatcher can handle about 4000 changes per hour.
* We currently have two dispatchers running in parallel (on a single box, hume), that makes a capacity of 8000 changes/hour.
* We are seeing roughly 17000 changes per hour on wikidata.org - more than twice our dispatch capacity.
* I want to try running 6 dispatcher processes; that would give us the capacity to handle 24000 changes per hour (assuming linear scaling).
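For anyone who wants to check the arithmetic, here is a rough sketch of the capacity estimate in Python. The function and variable names are made up for illustration, and it assumes (as above) that dispatchers scale linearly and that every client wiki takes the same ~3 seconds per 1000 changes.

```python
import math

SECONDS_PER_HOUR = 3600


def dispatcher_capacity(seconds_per_batch, batch_size, client_wikis):
    """Changes per hour one dispatcher can deliver to *all* client wikis.

    One batch has to be dispatched to every client wiki in turn, so a
    full pass over a batch costs seconds_per_batch * client_wikis.
    """
    seconds_per_full_pass = seconds_per_batch * client_wikis
    return batch_size * SECONDS_PER_HOUR / seconds_per_full_pass


def dispatchers_needed(changes_per_hour, per_dispatcher):
    """Smallest number of dispatchers that keeps up (linear scaling assumed)."""
    return math.ceil(changes_per_hour / per_dispatcher)


# Figures from this mail: 3 s per 1000 changes, ~300 client wikis.
per_dispatcher = dispatcher_capacity(3, 1000, 300)
current = 2 * per_dispatcher    # two dispatchers today
proposed = 6 * per_dispatcher   # six dispatchers with the patch
needed = dispatchers_needed(17000, per_dispatcher)

print(per_dispatcher, current, proposed, needed)
```

With these numbers, five dispatchers would be the bare minimum to keep up with 17000 changes/hour, so six leaves some headroom, provided scaling really is close to linear.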
Katie has prepared a patch for that: https://gerrit.wikimedia.org/r/#/c/55904/
Getting this patch in is currently the quickest way for us to make change propagation work. I hope running all the processes on the same box is not a problem; a second box for cron jobs will be set up "soon".
Future:
* Making the dispatcher a "real daemon" would probably help with getting it deployed to more boxes.
* If the Job Queue gets support for delayed (and maybe also recurring) jobs, we could use the existing JQ infrastructure, and wouldn't need any processes for ourselves. I'm a bit unsure though how well we could control scaling in such a setup.
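To make the delayed/recurring idea a bit more concrete, here is a toy simulation in Python. It is only a sketch: `DelayedJobQueue` and `run_dispatch_passes` are hypothetical names, this is not the MediaWiki JobQueue API, and the wiki IDs and the 60-second interval are just example values. Each dispatch job re-enqueues itself with a delay, which is how recurring work could be built on top of delayed jobs.

```python
import heapq


class DelayedJobQueue:
    """Toy in-memory queue with delayed jobs (NOT the MediaWiki JobQueue)."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps insertion order for equal times

    def push(self, run_at, job):
        heapq.heappush(self._heap, (run_at, self._counter, job))
        self._counter += 1

    def pop_due(self, now):
        """Remove and return all jobs whose run time has been reached."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


def run_dispatch_passes(queue, wikis, interval, until):
    """Simulate recurring dispatch on a fake clock (one tick per second).

    Every job that runs re-enqueues itself `interval` seconds later.
    """
    log = []
    for wiki in wikis:
        queue.push(0, wiki)
    for now in range(until + 1):
        for wiki in queue.pop_due(now):
            log.append((now, wiki))            # stand-in for the real dispatch
            queue.push(now + interval, wiki)   # recur via a delayed job
    return log


log = run_dispatch_passes(DelayedJobQueue(), ["enwiki", "dewiki"],
                          interval=60, until=120)
print(log)
```

In a setup like this, throughput would presumably depend on the interval and on how many job runners pick up the jobs, which is the part I'm unsure we could control well.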
-- daniel
On Tue, Mar 26, 2013 at 10:26 AM, Daniel Kinzler < daniel.kinzler@wikimedia.de> wrote:
- If the Job Queue gets support for delayed (and maybe also recurring) jobs, we could use the existing JQ infrastructure, and wouldn't need any processes for ourselves. I'm a bit unsure though how well we could control scaling in such a setup.
Support for delayed jobs was recently merged - https://gerrit.wikimedia.org/r/#/c/53315/
We're planning to switch the jobqueue backend from MySQL to Redis with the release of 1.21wmf13, and it will include delayed job support. It would be great if Wikidata used this for change propagation once it becomes available in a couple of weeks.