On 17-12-2010 (Fri), at 00:52 +0100, Platonides wrote:
Roan Kattouw wrote:
I'm not sure how hard this would be to achieve (you'd have to
correlate blob parts with revisions manually using the text table;
there might be gaps for deleted revs because ES is append-only) or how
much it would help (my impression is that ES is one of the slower parts
of our system, and reducing the number of ES hits by a factor of 50
should help, but I may be wrong); maybe someone with more relevant
knowledge and experience can comment on that (Tim?).
Roan Kattouw (Catrope)
ExternalStoreDB::fetchBlob() already keeps the last blob fetched, to
optimize repeated accesses to the same blob (we would probably want a
bigger cache for the dumper, though).
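To illustrate the idea, here is a minimal sketch of the kind of bigger
blob cache a dumper could use: a small LRU keyed by (cluster, blob id),
so the ~50 revisions packed into one blob cost a single fetch. The
class and function names are hypothetical, not MediaWiki's actual API:

```python
from collections import OrderedDict

class BlobCache:
    """LRU cache for External Storage blobs, so sequential dump
    access to revisions sharing one blob hits the backend once."""

    def __init__(self, max_entries=100):
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def fetch(self, cluster, blob_id, load_blob):
        key = (cluster, blob_id)
        if key in self._cache:
            # Cache hit: mark as most recently used and return.
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: load from the backend (e.g. a DB query).
        blob = load_blob(cluster, blob_id)
        self._cache[key] = blob
        if len(self._cache) > self.max_entries:
            # Evict the least recently used entry.
            self._cache.popitem(last=False)
        return blob
```

Keeping only the last blob (as fetchBlob() does) is the max_entries=1
case; a dumper interleaving several wikis or streams would benefit
from a few hundred entries.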
On the other hand, I don't think the dumpers should be storing text-id
contents in memcached (Revision::loadText), since they fill it with
entries that are useless for user queries (the dumps have a different
locality set), useless for the dumpers themselves (they traverse the
full list only once), and, even assuming memcached can happily handle
it and no other data is affected by it, the network delay makes it a
non-free operation.
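The point above can be sketched as a load path with the cache bypassed
for dump runs. This is a hypothetical illustration in Python, not the
real Revision::loadText; the function and parameter names are invented:

```python
def load_text(text_id, cache, fetch_from_es, for_dump=False):
    """Load revision text by text id.

    Interactive requests consult and populate the cache. Dump runs
    go straight to external storage: each text is read exactly once,
    so caching it would only pollute memcached with entries nobody
    will read again, and the cache write itself costs a network
    round trip.
    """
    if not for_dump:
        text = cache.get(text_id)
        if text is not None:
            return text
    text = fetch_from_es(text_id)
    if not for_dump:
        # Only the interactive path writes back to the cache.
        cache.set(text_id, text)
    return text
```

With for_dump=True the cache is never touched, avoiding both the
pollution and the extra round trip per revision.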
Ariel, do you have on wikitech a step-by-step list of the actions
needed to set up a WMF dump server? I always forget which scripts are
used and what each of them does.
Can xmldumps-phase3 be removed? I'd prefer that it use
release/trunk/wmf-deployment; an old copy is a source of problems. If
additional changes are needed (it seems unpatched), the appropriate
hooks should be added in core.
Most backups run off of trunk. The stuff I have in my branch is the
parallel stuff for testing.
http://wikitech.wikimedia.org/view/Dumps details the various scripts.
No, xmldumps-phase3 can't be removed yet. I have yet to make the
changes I need to that code (and I won't make them in core immediately;
they need to be tested thoroughly before being checked in). Once I
think they are OK, I will fold them into trunk. It will be a while
yet.
Ariel