Roan Kattouw wrote:
I'm not sure how hard this would be to achieve (you'd have to correlate blob parts with revisions manually using the text table, and there might be gaps for deleted revisions because ES is append-only), or how much it would help (my impression is that ES is one of the slower parts of our system, so reducing the number of ES hits by a factor of 50 should help, but I may be wrong). Maybe someone with more relevant knowledge and experience can comment on that (Tim?).
Roan Kattouw (Catrope)
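To illustrate the correlation Roan describes: revisions whose text-table pointers share a cluster and blob id live in the same concatenated ES blob, so grouping pointers up front lets one fetch serve all of them. A rough sketch (in Python for brevity; the real dumper would do this in PHP against the text table, and the exact `DB://cluster/blob_id/item_id` address layout is assumed here):

```python
from collections import defaultdict

def group_by_blob(text_rows):
    """Group (rev_id, pointer) pairs by the ES blob they live in.

    Assumes pointers of the form DB://cluster/blob_id or
    DB://cluster/blob_id/item_id; revisions sharing the same
    (cluster, blob_id) pair can be served by a single blob fetch.
    """
    groups = defaultdict(list)
    for rev_id, url in text_rows:
        _scheme, rest = url.split('://', 1)
        parts = rest.split('/')
        cluster, blob_id = parts[0], parts[1]
        item_id = parts[2] if len(parts) > 2 else None
        groups[(cluster, blob_id)].append((rev_id, item_id))
    return groups

rows = [
    (1001, 'DB://cluster5/123/0'),
    (1002, 'DB://cluster5/123/1'),
    (1003, 'DB://cluster5/124/0'),
]
# One ES hit for blob 123 now serves revisions 1001 and 1002.
grouped = group_by_blob(rows)
```

Deleted revisions would simply leave some item slots of a blob unreferenced, which is why gaps are harmless to this grouping.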
ExternalStoreDB::fetchBlob() already keeps the last blob it fetched, to optimize repeated accesses to the same blob (we would probably want a bigger cache for the dumper, though). On the other hand, I don't think the dumpers should be storing text-id contents in memcached (Revision::loadText), since they fill it with entries that are useless for user queries (the dumpers have a different locality set), useless for the dumpers themselves (they traverse the full list only once), and, even assuming memcached can happily handle it and no other data is affected by it, the network delay makes it a non-free operation.
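The "bigger cache for the dumper" could be as simple as a small LRU over whole blobs instead of keeping just the last one. A sketch of the idea (Python for illustration; `fetch_fn`, the cache size, and the method names are assumptions of this sketch, not existing MediaWiki API):

```python
from collections import OrderedDict

class BlobCache:
    """LRU cache over External Store blobs, keyed by (cluster, blob_id).

    Sketch of widening ExternalStoreDB::fetchBlob()'s keep-last-one
    behaviour to the N most recently used blobs; fetch_fn stands in
    for the real ES fetch.
    """
    def __init__(self, fetch_fn, max_entries=64):
        self.fetch_fn = fetch_fn
        self.max_entries = max_entries
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def get(self, cluster, blob_id):
        key = (cluster, blob_id)
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        blob = self.fetch_fn(cluster, blob_id)
        self.cache[key] = blob
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # evict least recently used
        return blob
```

Unlike the memcached path, this cache lives in the dumper process itself, so it has the right locality, dies with the run, and costs no network round trips.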
Ariel, do you have on wikitech a step-by-step list of the actions needed to set up a WMF dump server? I always forget which scripts are used and what each of them does. Can xmldumps-phase3 be removed? I'd prefer that it use release/trunk/wmf-deployment; an old copy is a source of problems. If additional changes are needed (it seems unpatched), the appropriate hooks should be added in core.