On Thu, Apr 21, 2016 at 4:59 PM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
On Apr 20, 2016 10:45 PM, "Brion Vibber" <bvibber@wikimedia.org> wrote:
Note that we could fire off a job queue background task to do the actual removal... But is it also safe to do that on a read-only request?
https://www.mediawiki.org/wiki/Requests_for_comment/Master_%26_slave_datacen...
seems to indicate job queueing will be safe, but would like to confirm that. :)
I think this is the preferred method. My understanding is that the jobs will get shipped to the primary DC job queue.
*nod* looks like per spec that should work with few surprises.
Similarly, in https://gerrit.wikimedia.org/r/#/c/284269/ we may wish to trigger missing transcodes to run on demand. The actual re-encoding happens in a background job, but we have to fire it off, and we have to record that we fired it off so we don't duplicate it...
[snip]
The job queue can do deduplication, although you would have to check if that is active while the job is running and not only while queued. Might help?
Part of the trick is we want to let the user know that the job has been queued; and if the job errors out, we want the user to know that the job errored out.
Currently this means we have to update a row in the 'transcode' table (TimedMediaHandler-specific info about the transcoded derivative files) when we fire off the job, then update its state again when the job actually runs.
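To make the two state transitions concrete, here's a minimal sketch in Python (not TimedMediaHandler's actual PHP code; the names `TranscodeStatus`, `transcode_table`, `queue_transcode`, and `run_transcode` are all illustrative) of recording state once when the job is queued and again when it runs:

```python
# Hypothetical sketch of the two 'transcode'-table updates described above:
# the web request marks the row "queued" when it fires off the job, and the
# job runner marks it "running" then "done" (or "error", so the user can
# see that the job errored out).

from enum import Enum

class TranscodeStatus(Enum):
    MISSING = "missing"
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    ERROR = "error"

# Stand-in for the 'transcode' table, keyed by (file, format).
transcode_table = {}

def queue_transcode(file, fmt):
    """Web-request side: record that a job was queued, then enqueue it."""
    key = (file, fmt)
    if transcode_table.get(key, TranscodeStatus.MISSING) != TranscodeStatus.MISSING:
        return False  # already queued/running/done; don't fire a duplicate
    transcode_table[key] = TranscodeStatus.QUEUED
    return True  # in MediaWiki this is where the job would be pushed

def run_transcode(file, fmt, encode):
    """Job-runner side: update state when the job starts and when it ends."""
    key = (file, fmt)
    transcode_table[key] = TranscodeStatus.RUNNING
    try:
        encode(file, fmt)
        transcode_table[key] = TranscodeStatus.DONE
    except Exception:
        transcode_table[key] = TranscodeStatus.ERROR  # surfaced to the user
```

The point of the sketch is just that the queueing write and the running write are separate, which is why a read-only web request can't do the first one directly.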
If that's split into two queues, one lightweight and one heavyweight, then this might make sense:
* N web requests hit something using File:Foobar.webm, which has a missing transcode
* they each try to queue up a job to the lightweight queue that says "start queueing this to actually transcode!"
* when the job queue runner on the lightweight queue sees the first such job, it records the status update to the database and queues up a heavyweight job to run the actual transcoding. The N-1 remaining jobs deduplicated on the same title/params either get removed, or never got stored in the first place; I forget how it works. :)
* ...time passes, during which further web requests don't yet see the updated database table state, and keep queueing in the lightweight queue.
* lightweight queue runners see some of those jobs, but they have the updated master database state and know they don't need to act.
* database replication of the updated state hits the remote DC
* ...time passes, during which further web requests see the updated database table state and don't bother queueing the lightweight job
* eventually, the heavyweight job runs, completes, and updates the state at start and end.
* eventually, the database replicates the transcode completion state to the remote DC.
* web requests start seeing the completed state, and their output includes the updated transcode information.
It all feels a bit complex, and I wonder if we could build some common classes to help with this transaction model. I'm pretty sure we can be making more use of background jobs outside of TimedMediaHandler's slow video format conversions. :D
-- brion