A few things to note:
* APC is not LRU; it just detects expired items on get() and clears everything when full (https://groups.drupal.org/node/397938)
* APC has a low max-keys config on production, so using a key per item would require that to change
* Implementing LRU groups for BagOStuff would require heavy CAS use and would definitely be bad over the wire (and not great locally either)
Just how high is the label traffic/queries? Do we profile this?
If it is super high, I'd suggest the following as a possibility (a rough sketch of c) and d) follows the list):

a) Install a tiny redis instance on each app server.

b) Have a sorted set in redis containing (label key => score) pairs, plus individual redis keys for the label strings (keyed by label key). Label keys would be like P33-en. The sorted set and the string values would use a common key prefix in redis. The sorted-set key would mention the max size.

c) The cache get() method would use the normal redis GET command. Once every 10 times it could send a Lua command to bump the label key's score in the sorted set (via ZADD) to the highest score + 1 (found via ZRANGE key -1 -1 WITHSCORES).

d) The cache set() method would be a no-op except once every 10 times. When it does anything, it would send a Lua command to remove the lowest-scored key if there is no room (ZREMRANGEBYRANK key 0 0) and in any case add the label key with a score equal to the highest score + 1. It would also store the value in the separate key for that label, with a TTL (likewise deleting that key on eviction). The sorted-set TTL would be set to max(current TTL, new value TTL).

e) Cache misses would fetch from the DB rather than the text store.
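To make c) and d) concrete, here's a minimal sketch in Python with redis-py, with the atomicity coming from the two embedded Lua scripts. The key names (wb:label:*), the max size of 10000, and the 1-in-10 sampling constant are all assumptions for illustration, not anything fixed above:

```python
import random

import redis

# All names/values below are assumptions for illustration -- the proposal
# above doesn't pin down a key scheme, max size, or sampling rate.
PREFIX = 'wb:label:'             # common prefix for per-label string keys
LRU_KEY = 'wb:label:lru:10000'   # sorted-set key, mentioning the max size
MAX_SIZE = 10000
SAMPLE = 10                      # touch the sorted set only 1 time in 10

r = redis.Redis()

# c) bump the label key's score to (highest score + 1), atomically
BUMP = r.register_script("""
local top = redis.call('ZRANGE', KEYS[1], -1, -1, 'WITHSCORES')
local score = 1
if top[2] then score = tonumber(top[2]) + 1 end
redis.call('ZADD', KEYS[1], score, ARGV[1])
""")

# d) evict the lowest-scored label if full, then insert the new label at
# (highest score + 1) and store its value under its own key with a TTL
STORE = r.register_script("""
if redis.call('ZCARD', KEYS[1]) >= tonumber(ARGV[4]) then
    local victim = redis.call('ZRANGE', KEYS[1], 0, 0)
    if victim[1] then
        redis.call('ZREMRANGEBYRANK', KEYS[1], 0, 0)
        -- building the key in Lua is fine on a single local instance
        redis.call('DEL', ARGV[5] .. victim[1])
    end
end
local top = redis.call('ZRANGE', KEYS[1], -1, -1, 'WITHSCORES')
local score = 1
if top[2] then score = tonumber(top[2]) + 1 end
redis.call('ZADD', KEYS[1], score, ARGV[1])
redis.call('SET', KEYS[2], ARGV[2], 'EX', ARGV[3])
-- sorted-set TTL = max(current TTL, new value TTL)
if redis.call('TTL', KEYS[1]) < tonumber(ARGV[3]) then
    redis.call('EXPIRE', KEYS[1], ARGV[3])
end
""")

def cache_get(label_key):              # label_key is e.g. 'P33-en'
    value = r.get(PREFIX + label_key)  # plain GET on the hot path
    if value is not None and random.randrange(SAMPLE) == 0:
        BUMP(keys=[LRU_KEY], args=[label_key])
    return value                       # None means: fetch from the DB (e)

def cache_set(label_key, value, ttl):
    if random.randrange(SAMPLE) != 0:
        return                         # set() is a no-op 9 times out of 10
    STORE(keys=[LRU_KEY, PREFIX + label_key],
          args=[label_key, value, ttl, MAX_SIZE, PREFIX])
```

Since set() usually no-ops, some extra misses falling through to the DB are expected; that's the trade for keeping the sorted-set traffic at roughly 1/10 of the label traffic.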
If high traffic causes flooding, the "10" number can be tweaked (or eliminated), or the "highest score + 1" logic could be tweaked to insert new labels with a score that's better than only 3/8 of the existing entries rather than all of them (borrowing from MySQL's midpoint insertion strategy); a sketch of that variant is below. The above method only uses O(log N) redis operations.
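For that second tweak, a hypothetical replacement for the "highest score + 1" block inside the STORE script above: new entries outrank only the bottom 3/8 of the set instead of everything, so a burst of one-off lookups can't flush the hot labels. The get()-side BUMP would keep using highest score + 1, so labels that are genuinely re-read still climb to the top:

```python
# Hypothetical variant of STORE's score computation (cf. MySQL's
# midpoint insertion): place the new member just above the entry
# sitting at rank floor(3N/8) from the bottom.
MIDPOINT = """
local n = redis.call('ZCARD', KEYS[1])
local rank = math.floor(n * 3 / 8)
local at = redis.call('ZRANGE', KEYS[1], rank, rank, 'WITHSCORES')
local score = 1
if at[2] then score = tonumber(at[2]) + 1 end
redis.call('ZADD', KEYS[1], score, ARGV[1])
"""
```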
I'd bet such a thing could be useful for at least a few other use cases as well.