Roan Kattouw wrote:
I'm not sure how hard this would be to achieve (you'd have to
correlate blob parts with revisions manually using the text table;
there might be gaps for deleted revs because ES is append-only) or how
much it would help (my impression is that ES is one of the slower parts
of our system, and reducing the number of ES hits by a factor of 50
should help, but I may be wrong). Maybe someone with more relevant
knowledge and experience can comment on that (Tim?).
Roan Kattouw (Catrope)
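Concretely, the correlation described above would mean grouping revision IDs by the blob their text lives in, so each blob is fetched from ES once instead of once per revision. A hypothetical sketch in Python (not MediaWiki's actual PHP code), assuming external-store URLs shaped like `DB://cluster/blob_id[/item]` as found in the text table:

```python
from collections import defaultdict

def group_by_blob(text_rows):
    """Group (rev_id, es_url) pairs by blob so each blob is fetched once.

    Assumes URLs shaped like DB://cluster/blob_id[/item]; the trailing
    item part, when present, addresses one revision inside the blob.
    """
    groups = defaultdict(list)
    for rev_id, url in text_rows:
        # "DB://cluster24/1234/5" -> blob key "DB://cluster24/1234"
        parts = url.split("/")
        blob_key = "/".join(parts[:4])  # scheme + "" + cluster + blob_id
        groups[blob_key].append(rev_id)
    return groups
```

With revisions sorted this way, the dumper would hit ES once per blob and resolve each revision's item from the already-loaded blob in memory.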
ExternalStoreDB::fetchBlob() already keeps the last fetched blob around
to optimize repeated accesses to the same blob (we would probably want a
bigger cache for the dumper, though).
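A "bigger cache for the dumper" could be as simple as a small LRU keyed by blob URL instead of keeping only the last blob. A hypothetical sketch in Python (MediaWiki's real code is PHP; `fetch` here is a stand-in for the actual cluster fetch):

```python
from collections import OrderedDict

class BlobCache:
    """Tiny LRU cache for external-store blobs, keyed by blob URL."""

    def __init__(self, fetch, max_entries=100):
        self.fetch = fetch              # function: url -> blob contents
        self.max_entries = max_entries
        self.entries = OrderedDict()

    def get(self, url):
        if url in self.entries:
            self.entries.move_to_end(url)     # mark as most recently used
            return self.entries[url]
        blob = self.fetch(url)
        self.entries[url] = blob
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return blob
```

Keeping the last blob is the degenerate `max_entries=1` case; the dumper could bump the limit without changing the access pattern for interactive requests.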
On the other hand, I don't think the dumpers should be storing text-id
contents in memcached (Revision::loadText), since they fill it with
entries that are useless for users' queries (the dumpers have a
different locality set), useless for the dumpers themselves (they
traverse the full list only once), and, even assuming that memcached can
happily handle it and no other data is affected by it, the network
delay makes it a non-free operation.
Ariel, do you have on wikitech a step-by-step list of the actions needed
to set up a WMF dump server?
I always forget which scripts are being used and what each of them does.
Can xmldumps-phase3 be removed? I'd prefer that it use
release/trunk/wmf-deployment; an old copy is a source of problems. If
additional changes are needed (it seems unpatched), the appropriate
hooks should be added in core.