Hello all,
Background and longer/more detailed discussion on this issue is in bug 44570: https://bugzilla.wikimedia.org/show_bug.cgi?id=44570
Summary: As we delete old -wmfX branches there appear to be cached pages that reference old branch URLs, e.g.: https://bits.wikimedia.org/static-1.21wmf1/skins/common/images/poweredby_med...
(that 404s because 1.21wmf1 is long gone)
If you want to see my bad ASCII art representation of our current caching layers, see this page: http://wikitech.wikimedia.org/view/Caching_overview
So possible ways forward
========================
option 1:
* reduce parsercache timeout to size of deployment window (~28 days) [0]
* Tim may have knowledge why that shouldn't happen [1]
option 2:
* change away from version numbers in URLs [2]
** maybe use slots or something else
** skins?
option 3:
* status quo
What do you think?
Greg
[0] https://bugzilla.wikimedia.org/show_bug.cgi?id=44570#c14
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=44570#c12
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=44570#c15
On Feb 22, 2013, at 6:33 PM, Greg Grossmeier greg@wikimedia.org wrote:
Hello all,
Background and longer/more detailed discussion on this issue is in bug 44570: https://bugzilla.wikimedia.org/show_bug.cgi?id=44570
Summary: As we delete old -wmfX branches there appear to be cached pages that reference old branch URLs, e.g.: https://bits.wikimedia.org/static-1.21wmf1/skins/common/images/poweredby_med...
(that 404s because 1.21wmf1 is long gone)
If you want to see my bad ASCII art representation of our current caching layers, see this page: http://wikitech.wikimedia.org/view/Caching_overview
So possible ways forward
option 1:
- reduce parsercache timeout to size of deployment window (~28 days) [0]
- Tim may have knowledge why that shouldn't happen [1]
Well, the obvious thing to do and imho what we should do, like, *right now* is extend the lifetime of the old branch to the timeout of the cache.
Simply not deleting a directory is very, very easy.
As far as I'm concerned we can agree right now not to delete any old branch from the servers until further notice (that is, until we've figured out the maximum cache age and implemented the guard in multiversion/deleteMediawiki, and then remove them when possible).
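A minimal sketch of what such a guard could check, assuming we know when a branch stopped being served and have settled on a maximum cache age. The function name, the 30-day value, and the dates are placeholders for illustration; this is not the actual multiversion/deleteMediawiki code.

```python
from datetime import datetime, timedelta

# Assumption: 30 days stands in for the real maximum cache age, which
# has not been determined yet.
CACHE_MAX_AGE = timedelta(days=30)


def safe_to_delete(retired_at: datetime, now: datetime,
                   cache_max_age: timedelta = CACHE_MAX_AGE) -> bool:
    """Return True once no cached page should still reference the branch.

    retired_at is the moment the last wiki was switched away from the
    branch; pages cached before then may still link to its static files.
    """
    return now - retired_at >= cache_max_age


if __name__ == "__main__":
    # Hypothetical dates: a branch retired 21 days ago is still protected.
    print(safe_to_delete(datetime(2013, 2, 1), datetime(2013, 2, 22)))
```

A delete script would call this first and refuse to remove the directory while it returns False.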
option 2:
- change away from version numbers in URLs [2]
** maybe use slots or something else
** skins?
My bugzilla comment doesn't suggest changing away from using these version numbers. It suggests not using these URLs directly in any code that makes it to the main HTML output.
[0] https://bugzilla.wikimedia.org/show_bug.cgi?id=44570#c14
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=44570#c12
[2] https://bugzilla.wikimedia.org/show_bug.cgi?id=44570#c15
<quote name="Krinkle" date="2013-02-22" time="22:29:00 +0100">
Well, the obvious thing to do and imho what we should do, like, *right now* is extend the lifetime of the old branch to the timeout of the cache.
Simply not deleting a directory is very, very easy.
That is definitely a good stop-gap solution until we figure out a fix to the underlying issue.
I haven't seen any objection to this suggestion either on the bug, on list, or in discussions in the office as a stop-gap type thing, so I'm comfortable with that for now.
My bugzilla comment doesn't suggest changing away from using these version numbers. It suggests not using these URLs directly in any code that makes it to the main HTML output.
My apologies for misrepresenting it. That makes sense.
I updated the wikitech page I started with a clarification (do let me know if I got it wrong still!)
How can we prevent this issue from happening in the future without having old versions lying around?
Greg
On Fri, Feb 22, 2013 at 1:29 PM, Krinkle krinklemail@gmail.com wrote:
Well, the obvious thing to do and imho what we should do, like, *right now* is extend the lifetime of the old branch to the timeout of the cache.
Simply not deleting a directory is very, very easy.
As far as I'm concerned we can agree right now not to delete any old branch from the servers until further notice (that is, until we've figured out the maximum cache age and implemented the guard in multiversion/deleteMediawiki, and then remove them when possible).
We had enough problems in the past with disk partitions filling up that this isn't as simple as it seems. While the Apaches themselves should have plenty of room now (but not infinite), there may still be other machines that builds get deployed to that have had problems with small disks and/or suboptimal partitioning. Sam has generally nuked old versions in response to something filling up (though recently, it was just the datacenter migration where we decided not to copy over old versions).
We probably shouldn't pursue this strategy unless: a) we calculate a disk space budget based on the maximum number of versions to keep around, and b) we have a commitment from Ops to provide the budgeted disk space.
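As a rough sketch of how such a budget could be worked out: the number of versions to keep follows from the cache lifetime and the deploy interval, and the budget is that count times the per-version size plus some headroom. Every number below is a placeholder, not a measurement, and the function names are made up for illustration.

```python
import math


def versions_to_keep(cache_max_age_days: int, deploy_interval_days: int,
                     live_versions: int = 2) -> int:
    """Retired versions still referenced by cached pages, plus live ones."""
    retired = math.ceil(cache_max_age_days / deploy_interval_days)
    return retired + live_versions


def disk_budget_mb(versions: int, size_per_version_mb: float,
                   headroom: float = 1.2) -> float:
    """Disk space to request, with a safety margin for growth."""
    return versions * size_per_version_mb * headroom


if __name__ == "__main__":
    # Assumed inputs: 30-day cache, a new branch every 14 days,
    # 500 MB per checked-out version.
    count = versions_to_keep(30, 14)
    print(count, disk_budget_mb(count, 500.0))
```

The per-version size would need to be measured on the machine with the smallest partition, since that is where the budget actually binds.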
Another short term solution may be to just ensure we've got symlinks in place to fake this up (e.g. pointing 1.21wmf1 to the 1.21wmf9 subdirectory).
Under the hood, it may make sense to start managing things using slots now (even still using scap rather than sartoris), which would make symlink management a lot simpler. For example, we could move the 1.21wmf9 directory to "slot0", and symlink 1.21wmf9 to that, and move 1.21wmf10 to "slot1". When it comes time to deploy 1.21wmf11, we would deploy that to slot0, and the 1.21wmf9 symlink wouldn't need to be updated.
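The slot idea can be sketched as follows, assuming version names are only ever symlinks and the slot directories hold the real files. The helper name and the temporary directory are for illustration only; this is not how scap or sartoris work today.

```python
import os
import tempfile


def point_version_at_slot(root: str, version: str, slot: str) -> None:
    """Make root/version a symlink to the slot, replacing any old link."""
    link = os.path.join(root, version)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(slot, tmp)   # relative target, resolved inside root
    os.replace(tmp, link)   # rename over the old symlink in one step


def demo() -> dict:
    """Replay the 1.21wmf9 / wmf10 / wmf11 example in a scratch directory."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "slot0"))
        os.mkdir(os.path.join(root, "slot1"))
        point_version_at_slot(root, "1.21wmf9", "slot0")
        point_version_at_slot(root, "1.21wmf10", "slot1")
        # Deploying 1.21wmf11 reuses slot0; the 1.21wmf9 link is left
        # alone, so old cached URLs keep resolving (to the newer files).
        point_version_at_slot(root, "1.21wmf11", "slot0")
        versions = ("1.21wmf9", "1.21wmf10", "1.21wmf11")
        return {v: os.readlink(os.path.join(root, v)) for v in versions}


if __name__ == "__main__":
    print(demo())
```

The trade-off is that an old URL then serves files from a newer version, which is fine for rarely changing images but would need checking for anything version-sensitive.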
Rob
Another option is to keep only the old versions of the skins folder (1.4M).
Given that CSS and JS go through ResourceLoader (RL), the few images we use and link from the HTML could probably be squeezed into a single static path, though.