On Fri, 17-12-2010, at 00:52 +0100, Platonides wrote:
Roan Kattouw wrote:
I'm not sure how hard this would be to achieve (you'd have to correlate blob parts with revisions manually using the text table; there might be gaps for deleted revs because ES is append-only) or how much it would help. My impression is that ES is one of the slower parts of our system, so reducing the number of ES hits by a factor of 50 should help, but I may be wrong; maybe someone with more relevant knowledge and experience can comment on that (Tim?).
Roan Kattouw (Catrope)
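For concreteness, here is a rough sketch of the correlation step Roan describes, assuming ES addresses of the form DB://cluster/id[/itemid] stored in text.old_text whenever old_flags contains 'external'; groupRowsByBlob() is a hypothetical helper, not a MediaWiki API:

// Hypothetical helper, not MediaWiki code: group a batch of text-table
// rows by the ES blob they live in, so each blob is fetched once per
// batch instead of once per revision. Assumes old_text holds an address
// like DB://cluster24/12345/67 (DB://cluster/id[/itemid]).
function groupRowsByBlob( $rows ) {
	$byBlob = array();
	foreach ( $rows as $row ) {
		$parts = explode( '/', $row->old_text );
		// Keep DB://cluster/id and drop the trailing itemid, if any.
		$blobUrl = implode( '/', array_slice( $parts, 0, 4 ) );
		$byBlob[$blobUrl][] = $row;
	}
	return $byBlob;
}

The gaps for deleted revisions would then just show up as blobs with fewer text rows pointing at them than the blob actually contains.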
ExternalStoreDB::fetchBlob() already keeps the last blob fetched, to optimize repeated accesses to the same blob (we would probably want a bigger cache for the dumper, though). On the other hand, I don't think the dumpers should be storing text-id contents in memcached (Revision::loadText), since they fill it with entries that are useless for user queries (they have a different locality set) and useless for the dumpers themselves (they traverse the full list only once); and even assuming memcached can happily handle it and no other data is affected by it, the network delay makes it a non-free operation.
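A minimal sketch of that bigger dumper-side cache, assuming we key on the full blob URL and deliberately skip the memcached write-back; BlobLruCache and $fetcher are hypothetical names, not MediaWiki APIs:

// Hypothetical sketch: a small process-local LRU for fetched ES blobs,
// as a stand-in for the one-entry cache in ExternalStoreDB::fetchBlob().
// It lives only in the dumper process, so nothing is pushed to memcached.
class BlobLruCache {
	private $maxEntries;
	private $cache = array();

	public function __construct( $maxEntries = 50 ) {
		$this->maxEntries = $maxEntries;
	}

	public function get( $url, $fetcher ) {
		if ( isset( $this->cache[$url] ) ) {
			// Re-insert to mark the entry as most recently used.
			$blob = $this->cache[$url];
			unset( $this->cache[$url] );
			$this->cache[$url] = $blob;
			return $blob;
		}
		$blob = call_user_func( $fetcher, $url ); // raw ES fetch only
		if ( count( $this->cache ) >= $this->maxEntries ) {
			// Evict the least recently used entry (the first key).
			reset( $this->cache );
			unset( $this->cache[key( $this->cache )] );
		}
		$this->cache[$url] = $blob;
		return $blob;
	}
}

Since the dumpers walk revisions in page order, adjacent revisions tend to share a blob, so even a few dozen entries should absorb most of the repeated fetches.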
Ariel, do you have on wikitech the step-by-step list of actions to set up a WMF dump server? I always forget which scripts are used and what each of them does. Can xmldumps-phase3 be removed? I'd prefer that it use release/trunk/wmf-deployment; an old copy is a source of problems. If additional changes are needed (it seems unpatched), the appropriate hooks should be added in core.
Most backups run off of trunk. The stuff I have in my branch is the parallel stuff for testing.
http://wikitech.wikimedia.org/view/Dumps details the various scripts.
No, xmldumps-phase3 can't be removed yet. I still have to make the changes I need to that code (and I won't make them in core immediately; they need to be tested thoroughly before being checked in). Once I think they're OK, I will fold them into trunk. It will be a while yet.
Ariel