On Mon, Nov 28, 2011 at 12:06 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
> On Mon, Nov 28, 2011 at 8:59 PM, Neil Harris neil@tonal.clara.co.uk wrote:
>> I hadn't thought properly about cache stampedes: since the parser cache is
>> only part of page rendering, this might also explain some of the other
>> occasional slowdowns I've seen on Wikipedia.
>> It would be really cool if there could be some sort of general mechanism
>> to prevent this for all page URLs protected by memcaching, throughout the
>> system.
> I'm not very familiar with PoolCounter, but I suspect it's a fairly generic
> system for handling this sort of thing. However, stampedes have never been a
> practical problem for anything except massive traffic combined with slow
> recaching, and that's a fairly rare case. So I don't think we want to add
> that sort of concurrency protection everywhere.
For memcache objects that can be grouped together into an "ok to use if a bit stale" bucket (such as all kinds of stats), there is also the possibility of lazy async regeneration.
Data is stored in memcache with a fuzzy expire time, i.e. { data: foo, stale: $now + 15min }, and a cache TTL of forever. When getting the key, if the timestamp inside marks the data as stale, attempt to obtain an exclusive (acq4me) lock from PoolCounter. If that immediately succeeds, launch an async job to regenerate the cache (while holding the lock) but continue the request with the stale data. In all other cases, just use the stale data. This is mainly useful when the regeneration work is hideously expensive, such that you wouldn't want clients blocking on even a single cache regen (which is the behavior of PoolCounter as deployed for the parser cache).
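
To make the control flow concrete, here is a minimal sketch in Python of that fuzzy-expiry / lazy-regeneration pattern. This is not MediaWiki code: the in-memory Cache class stands in for memcached, the add()-based try_lock() stands in for a non-blocking PoolCounter acq4me acquisition, the background thread stands in for an async job, and all the names (get_with_lazy_regen, STALE_AFTER, regenerate) are illustrative.

import threading
import time

STALE_AFTER = 15 * 60  # seconds of freshness before a value counts as stale


class Cache:
    """Dict-backed stand-in for memcached (get/set/add/delete only)."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def get(self, key):
        with self._mutex:
            return self._data.get(key)

    def set(self, key, value):
        with self._mutex:
            self._data[key] = value

    def add(self, key, value):
        # Like memcached add: succeeds only if the key is absent.
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = True
            return True

    def delete(self, key):
        with self._mutex:
            self._data.pop(key, None)


cache = Cache()


def try_lock(key):
    # Non-blocking exclusive lock; a real deployment would ask PoolCounter.
    return cache.add("lock:" + key, time.time())


def release_lock(key):
    cache.delete("lock:" + key)


def regenerate(key, compute):
    # Runs in the background while the original request returns stale data.
    try:
        cache.set(key, {"data": compute(), "stale": time.time() + STALE_AFTER})
    finally:
        release_lock(key)


def get_with_lazy_regen(key, compute):
    entry = cache.get(key)
    if entry is None:
        # Cold cache: nothing stale to serve, so compute synchronously.
        value = compute()
        cache.set(key, {"data": value, "stale": time.time() + STALE_AFTER})
        return value

    if time.time() >= entry["stale"] and try_lock(key):
        # Stale and we won the lock: regenerate asynchronously,
        # but keep serving the stale value to this request.
        threading.Thread(target=regenerate, args=(key, compute), daemon=True).start()

    # Fresh, or stale but someone else is already regenerating: serve what we have.
    return entry["data"]


if __name__ == "__main__":
    print(get_with_lazy_regen("site-stats", lambda: {"edits": 123}))

In a real deployment the lock-and-dispatch step would go through PoolCounter and the job queue rather than a memcached add() and a local thread, since a central lock service coordinates across all app servers; the sketch only illustrates the read path and the "serve stale, regenerate once" decision.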