On Mon, Nov 28, 2011 at 12:06 PM, Roan Kattouw <roan.kattouw(a)gmail.com> wrote:
On Mon, Nov 28, 2011 at 8:59 PM, Neil Harris <neil(a)tonal.clara.co.uk> wrote:
I hadn't thought properly about cache stampedes: since the parser cache is only part of page rendering, this might also explain some of the other occasional slowdowns I've seen on Wikipedia.

It would be really cool if there were some sort of general mechanism to prevent this for all page URLs protected by memcaching, throughout the system.
I'm not very familiar with PoolCounter but I suspect it's a fairly
generic system for handling this sort of thing. However, stampedes
have never been a practical problem for anything except massive
traffic combined with slow recaching, and that's a fairly rare case.
So I don't think we want to add that sort of concurrency protection
everywhere.
For memcache objects that can be grouped together into an "ok to use if a bit stale" bucket (such as all kinds of stats), there is also the possibility of lazy async regeneration.

Data is stored in memcache with a fuzzy expire time, i.e. { data: foo, stale: $now+15min } and a cache TTL of forever. When getting the key, if the timestamp inside marks the data as stale, you can attempt to obtain an exclusive (acq4me) lock from PoolCounter. If that succeeds immediately, launch an async job to regenerate the cache (while holding the lock) but continue the request with the stale data. In all other cases, just use the stale data. This is mainly useful if the regeneration work is hideously expensive, such that you wouldn't want clients blocking on even a single cache regen (as is the behavior with PoolCounter as deployed for the parser cache).
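To make the flow concrete, here is a minimal in-process sketch of that fuzzy-expiry scheme. The dict-based cache, the single global lock, and the helper names are all stand-ins for illustration: in production the store would be memcached and the exclusive lock would come from PoolCounter (which locks per key, not globally).

```python
import time
import threading

# Stand-in cache: key -> {"data": ..., "stale_at": timestamp}.
CACHE = {}
# Stand-in for a per-key PoolCounter acq4me lock (simplified to one lock).
REGEN_LOCK = threading.Lock()

def cache_set(key, data, fresh_for=15 * 60):
    # Store the fuzzy expiry timestamp *inside* the value; the real cache
    # entry itself would have a TTL of forever.
    CACHE[key] = {"data": data, "stale_at": time.time() + fresh_for}

def cache_get(key, regenerate):
    entry = CACHE.get(key)
    if entry is None:
        # True miss: no stale copy exists, so regenerate synchronously.
        data = regenerate()
        cache_set(key, data)
        return data
    if time.time() >= entry["stale_at"]:
        # Stale: try to take the exclusive lock without blocking.
        if REGEN_LOCK.acquire(blocking=False):
            def worker():
                try:
                    cache_set(key, regenerate())
                finally:
                    REGEN_LOCK.release()
            # Regenerate asynchronously while holding the lock...
            threading.Thread(target=worker).start()
        # ...but serve the stale data now, whether or not we got the lock.
    return entry["data"]
```

Only one request (the lock winner) pays anything at all, and even it only pays the cost of spawning the async job; everyone else, including the winner, is served the stale copy immediately, which is what distinguishes this from the blocking behavior of the parser-cache PoolCounter deployment.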