(added engineering@lists.wikimedia.org to recipients list)
Minor content correction: mentions of "30 days" should really have been
"31 days". Apparently I changed it in some places before I hit send, but
I didn't get them all. The 31-day upper limit comes from the
$wgSquidMaxage setting in InitialiseSettings.php [0].

[0]: https://git.wikimedia.org/blob/operations%2Fmediawiki-config/87e36518db5644f15748fbfc36c4d1bf3b2f65e8/wmf-config%2FInitialiseSettings.php#L10276

On Fri, Mar 7, 2014 at 5:27 PM, Bryan Davis <bd808@wikimedia.org> wrote:
> Every Thursday a new WMF branch is cut and deployed to the group0
> wikis (test* and mediawiki.org). On the following Tuesday it is
> promoted to the group1 wikis (all wikis except the Wikipedias).
> Finally, on the next Thursday it is promoted to group2 (the Wikipedias)
> while the group0 wikis start using another new version. At the current
> release cadence (one new branch a week), a branch is no longer used
> after 2 weeks in production. There can be minor exceptions to this due
> to major difficulties with a branch and/or holiday conflicts, but for
> the sake of this discussion those differences can be mostly ignored.
>
> A branch can't be deleted from the server cluster immediately after it
> is removed from the last wiki, however. For better or worse, each
> branch contains static assets from core (resources & skins) and
> extensions that are served by the apaches. These assets are served
> using versioned URLs such as
> https://bits.wikimedia.org/static-1.23wmf17/skins/common/images/poweredby_mediawiki_88x31.png.
> Varnish caches pages containing these URLs for anons for up to 30
> days. That means static content from the 1.23wmf17 branch could still
> be requested from the apaches for up to 30 days after that branch
> stops serving PHP-backed requests. Assuming the weekly release cadence,
> the static assets from a branch are needed on the cluster for at least
> 45 days (14 days of active branch use + 31 days of cached page use).
>
> At the moment we don't have a well-documented procedure for cleaning
> up old branches on tin and the servers that rsync with tin (directly
> and indirectly). It seems to be a process that Sam does occasionally.
> The last commits that cleaned up old branches were merged on
> 2014-02-15: https://gerrit.wikimedia.org/r/#/c/113640/ and
> https://gerrit.wikimedia.org/r/#/c/113641/. These commits cleaned up
> some truly ancient branches.
>
> A slightly different but related problem is the amount of disk space
> consumed by the l10n cache files for unused MW versions. The combined
> JSON and CDB files for the current 1.23 branches consume ~1.7G per
> version. It looks like Sam has been pruning these at some point as
> well, since the cache/l10n directories for 1.23wmf12 and earlier are
> empty.
>
> I recommend that we add two new weekly cleanup steps:
>
> * When we deploy a new branch to group0 (Thursdays), all branches
> retired 5 or more weeks ago should be removed. This should only
> include multiple branches the first time it's done, to catch up.
> After that it will be an "add a branch, kill a branch" situation. With
> the current release cadence this will keep us at 7 checked-out
> branches on tin: 2 versions in active use and 5 waiting for potential
> cache references to expire.
>
> * When we move group1 to the newest branch (Tuesdays), the cache/l10n
> directory of every non-active branch should be purged. By this point
> there is little chance that we will revert the Wikipedias to the N-2
> branch, so the l10n cache is just taking up disk space and slowing
> down rsync comparisons.
>
> Are there any objections to adding these procedures to the MW deploy
> process?
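
For illustration, the weekly cadence and the 45-day retention figure from the quoted message can be written out as a small Python sketch. It assumes an idealized train with no holiday or rollback exceptions; the offset constants and the `branch_timeline` helper are hypothetical names chosen for this example, not part of any deploy tooling.

```python
from datetime import date, timedelta
from typing import Dict

# Hypothetical offsets (days after the Thursday a branch is cut), assuming
# the idealized weekly train with no holiday or rollback exceptions.
GROUP0_DEPLOY = 0       # Thursday: branch cut and deployed to group0
GROUP1_DEPLOY = 5       # following Tuesday: promoted to group1
GROUP2_DEPLOY = 7       # next Thursday: promoted to group2
RETIRED = 14            # Thursday after that: group2 moves to a newer branch
CACHE_MAX_AGE_DAYS = 31  # upper limit for cached anon pages ($wgSquidMaxage)


def branch_timeline(cut: date) -> Dict[str, date]:
    """Return the key dates in one branch's lifetime."""
    retired = cut + timedelta(days=RETIRED)
    return {
        "group0": cut + timedelta(days=GROUP0_DEPLOY),
        "group1": cut + timedelta(days=GROUP1_DEPLOY),
        "group2": cut + timedelta(days=GROUP2_DEPLOY),
        "retired": retired,
        # Last day a cached page may still reference this branch's static assets.
        "assets_needed_until": retired + timedelta(days=CACHE_MAX_AGE_DAYS),
    }


if __name__ == "__main__":
    for name, day in branch_timeline(date(2014, 3, 6)).items():  # a Thursday
        print(f"{name:20} {day.isoformat()} ({day:%A})")
```

The last entry is always 14 + 31 = 45 days after the cut, which is the lower bound on how long a branch's static assets have to stay on the cluster.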
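
The two proposed weekly cleanup steps can likewise be sketched as pure selection logic. This is a minimal sketch under stated assumptions: the `Branch` record, the helper functions and the `branch-N` labels are hypothetical, nothing here touches tin or rsync, and a branch counts as removable once it has been retired for five weeks or more, which is the reading that yields the steady state of 7 checked-out branches.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Assumption: keep retired branches for five weeks, which is longer than the
# 31-day cache limit, so no cached anon page still points at their assets.
REMOVE_AFTER = timedelta(weeks=5)


@dataclass
class Branch:
    name: str                       # hypothetical label
    retired: Optional[date] = None  # None while some wiki group still uses it


def branches_to_remove(branches: List[Branch], today: date) -> List[str]:
    """Thursday step: branches retired at least five weeks before `today`."""
    return [b.name for b in branches
            if b.retired is not None and today - b.retired >= REMOVE_AFTER]


def l10n_caches_to_purge(branches: List[Branch]) -> List[str]:
    """Tuesday step: branches no wiki group uses any more, whose
    cache/l10n directories can be emptied."""
    return [b.name for b in branches if b.retired is not None]


def weekly_branches(today: date, count: int) -> List[Branch]:
    """Model `count` branches cut on successive Thursdays ending at `today`
    (branch-0 is the newest); each retires 14 days after it was cut."""
    branches = []
    for weeks_ago in range(count):
        cut = today - timedelta(weeks=weeks_ago)
        retired = cut + timedelta(days=14)
        branches.append(Branch(f"branch-{weeks_ago}",
                               retired if retired <= today else None))
    return branches


if __name__ == "__main__":
    today = date(2014, 3, 6)  # a Thursday
    branches = weekly_branches(today, 10)
    doomed = branches_to_remove(branches, today)
    print("remove:", doomed)
    print("checked out afterwards:", len(branches) - len(doomed))
```

With ten weekly branches the demo selects the three oldest for removal and leaves 7 checked out: 2 still active and 5 retired ones waiting for cache references to expire.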
Bryan
--
Bryan Davis Wikimedia Foundation <bd808@wikimedia.org>
[[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA
irc: bd808 v:415.839.6885 x6855
_______________________________________________
Engineering@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/engineering