We currently use memcached to share cached objects across wikis, most importantly, entity objects (like data items). Ori suggested we should look into alternatives. This is what he wrote:
[21:15] <ori> I was wondering if you think the way you use memcached is optimal (this sounds like a loaded question but I mean it sincerely). And if not, I was going to propose that you identify an optimal distributed object store, and I was also going to offer to help push for procurement and deployment of such a service on the WMF cluster.
[21:17] <ori> memcached is a bit of a black box. it is very difficult to get comprehensible metrics about how much space and bandwidth you're utilizing, especially when your data is mixed up with everything else that goes into memcached
[21:18] <ori> and the fact that you're serializing objects using php serialize() rather than simple values makes it even harder, because it means that you can only really poke around from php with wikidata code available
Just food for thought, for now... any suggestions for a shared object store?
In any case, thanks for looking into this, Ori!
On Wed, Oct 1, 2014 at 2:46 PM, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
The other major problem with memcache is that it doesn't support complex data structures like lists, queues, sets, or maps, so when you want to do things like, say, push or pop an item from a queue, you end up having to retrieve the entire collection, unserialize it, manipulate it locally, re-serialize it, and transmit it back in its entirety.
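That read-modify-write cycle can be sketched like this (a minimal illustration, not Wikibase code: a Python dict stands in for the memcached cluster, and pickle stands in for PHP's serialize()):

```python
import pickle

# A dict standing in for a memcached cluster: keys map to opaque serialized blobs.
fake_memcached = {}

def push_to_queue(cache, key, item):
    """Append one item to a 'queue' stored as a single serialized blob.

    memcached has no list type, so every push must fetch the whole
    collection, deserialize it, modify it locally, reserialize it,
    and write the entire blob back.
    """
    blob = cache.get(key)
    queue = pickle.loads(blob) if blob is not None else []
    queue.append(item)                # the only real change...
    cache[key] = pickle.dumps(queue)  # ...but the full queue travels back over the wire

def pop_from_queue(cache, key):
    """Remove and return the oldest item -- again a full round trip of the blob."""
    blob = cache.get(key)
    queue = pickle.loads(blob) if blob is not None else []
    item = queue.pop(0) if queue else None
    cache[key] = pickle.dumps(queue)
    return item

push_to_queue(fake_memcached, "jobs", "refresh-sitelinks")
push_to_queue(fake_memcached, "jobs", "purge-cache")
print(pop_from_queue(fake_memcached, "jobs"))  # prints "refresh-sitelinks"
```

A store with native list support (Redis, say) would replace each of these round trips with a single server-side push or pop.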
Just food for thought, for now... any suggestions for a shared object store?
I'm embarrassed to say that I don't know nearly enough about Wikidata to be able to make a recommendation. Where would you recommend I look if I wanted to understand the caching architecture?
Hey Ori!
On 02.10.2014 06:45, Ori Livneh wrote:
I'm embarrassed to say that I don't know nearly enough about Wikidata to be able to make a recommendation. Where would you recommend I look if I wanted to understand the caching architecture?
And I'm embarrassed to say that we have very little high-level documentation. There is no document describing the overall caching architecture.
The use case in question is accessing data Items (and other Entities, like Properties) from client wikis like Wikipedia. Entities are accessed through an EntityRevisionLookup service; CachingEntityRevisionLookup is an implementation of EntityRevisionLookup that takes an "actual" EntityRevisionLookup (e.g. a WikiPageEntityRevisionLookup) and a BagOStuff, and implements a caching layer.
We use two CachingEntityRevisionLookup instances nested inside each other: the outermost uses a HashBagOStuff to implement in-process caching; the second level uses memcached. The objects cached there are instances of EntityRevision, which is a thin wrapper around an Entity (usually an Item) plus a revision ID.
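The nesting Daniel describes can be sketched roughly like this (Python stand-ins for the PHP classes; the names mirror the Wikibase ones, but the signatures and the tuple-shaped revisions are simplified assumptions):

```python
class WikiPageEntityRevisionLookup:
    """Stand-in for the 'actual' lookup that hits the database."""
    def __init__(self, revisions):
        self._revisions = revisions
        self.db_hits = 0

    def get_entity_revision(self, entity_id):
        self.db_hits += 1
        return self._revisions.get(entity_id)

class CachingEntityRevisionLookup:
    """Decorator: consult a cache (a BagOStuff-like mapping) first,
    fall through to the wrapped lookup on a miss, then cache the result."""
    def __init__(self, lookup, cache):
        self._lookup = lookup
        self._cache = cache

    def get_entity_revision(self, entity_id):
        if entity_id in self._cache:
            return self._cache[entity_id]
        revision = self._lookup.get_entity_revision(entity_id)
        self._cache[entity_id] = revision
        return revision

db_lookup = WikiPageEntityRevisionLookup({"Q42": ("Q42", 1234)})
shared = CachingEntityRevisionLookup(db_lookup, {})  # memcached level (shared dict here)
local = CachingEntityRevisionLookup(shared, {})      # HashBagOStuff level (in-process)

local.get_entity_revision("Q42")   # miss, miss, database hit
local.get_entity_revision("Q42")   # served from the in-process cache
print(db_lookup.db_hits)           # prints 1
```

Both levels share one interface, so the nesting is invisible to callers: they just see an EntityRevisionLookup.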
Please let me know if you have further questions!
-- daniel
PS: what do you think, where should this info go? Wikibase/docs/caching.md or some such?
Hey,
We use two CachingEntityRevisionLookup instances nested inside each other: the outermost uses a HashBagOStuff to implement in-process caching; the second level uses memcached.
It is odd to have two different decorator instances for caching around the EntityRevisionLookup. I suggest having only a single decorator for caching, which writes to a caching interface. This caching interface can then have an implementation that uses multiple caches, and perhaps a decorator at that level.
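Jeroen's alternative, one caching decorator over a cache interface that itself layers several backends, might look roughly like this (hypothetical names, sketched in Python with dicts standing in for the backends):

```python
class TieredCache:
    """A single cache interface composed of several backends,
    checked fastest-first; a hit in a slow tier is copied back
    into the faster tiers so subsequent reads stay local."""
    def __init__(self, *tiers):
        self._tiers = tiers  # e.g. (in_process, memcached)

    def get(self, key):
        for i, tier in enumerate(self._tiers):
            if key in tier:
                value = tier[key]
                for faster in self._tiers[:i]:  # promote to faster tiers
                    faster[key] = value
                return value
        return None

    def set(self, key, value):
        for tier in self._tiers:
            tier[key] = value

in_process, memcached_like = {}, {}
cache = TieredCache(in_process, memcached_like)
cache.set("Q42", "revision-1234")
memcached_like["Q64"] = "revision-99"  # present only in the slower tier
cache.get("Q64")                       # hit in tier 1, promoted to tier 0
print("Q64" in in_process)             # prints True
```

With this shape, the single caching decorator never knows how many tiers exist; the layering is an implementation detail of the cache.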
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
On Thu, Oct 2, 2014 at 12:45 AM, Ori Livneh ori@wikimedia.org wrote:
On Wed, Oct 1, 2014 at 2:46 PM, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
The other major problem with memcache is that it doesn't support complex data structures like lists, queues, sets, or maps, so when you want to do things like, say, push or pop an item from a queue, you end up having to retrieve the entire collection, unserialize it, manipulate it locally, re-serialize it, and transmit it back in its entirety.
Use more Redis maybe?
On 10/02/2014 09:22 AM, Nikolas Everett wrote:
Use more Redis maybe?
Or use actual storage.
On 02.10.2014 17:55, Jeroen De Dauw wrote:
Hey,
We use two CachingEntityRevisionLookup instances nested inside each other: the outermost uses a HashBagOStuff to implement in-process caching; the second level uses memcached.
It is odd to have two different decorator instances for caching around the EntityRevisionLookup. I suggest having only a single decorator for caching, which writes to a caching interface. This caching interface can then have an implementation that uses multiple caches, and perhaps a decorator at that level.
Went that way first. Didn't work out nicely. I forget why exactly.
I don't care much either way.
wikidata-tech@lists.wikimedia.org