I've double-checked that the duplicate pages are in fact in separate stub
jobs and updated the Phabricator task accordingly. As this is something
that would be a lot of work to address with the current architecture and is
on the drawing board for the Dumps 2.0 rewrite, I'm deferring the issue
until then. In the meantime, scripts or processors that work with multi-stub
dumps should be prepared to filter out such duplicates, though they should
be rare.
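
For anyone who wants a starting point, here is a minimal sketch of such a
filter in Python. It assumes the consumer has already parsed each stub file
into (page_id, page) pairs, in whatever form its own parser produces; the
function name dedupe_pages and the sample data are made up for illustration
and are not part of the dumps tooling. It keeps the first copy of a page and
drops later ones. Whether the first or the later copy is the one to keep is
a choice each consumer has to make for itself.

```python
from itertools import chain
from typing import Hashable, Iterable, Iterator, Set, Tuple, TypeVar

T = TypeVar("T")


def dedupe_pages(pages: Iterable[Tuple[Hashable, T]]) -> Iterator[Tuple[Hashable, T]]:
    """Yield each (page_id, page) pair once, keeping the first occurrence.

    Only the page ids are held in memory, so this works as a streaming
    filter over the concatenated output of several stub files.
    """
    seen: Set[Hashable] = set()
    for page_id, page in pages:
        if page_id in seen:
            # Same page already emitted from an earlier stub file: skip it.
            continue
        seen.add(page_id)
        yield page_id, page


if __name__ == "__main__":
    # Hypothetical data: page 2 shows up in two separate stub jobs.
    stub1 = [(1, "Page A"), (2, "Page B")]
    stub2 = [(2, "Page B, second copy"), (3, "Page C")]
    for page_id, page in dedupe_pages(chain(stub1, stub2)):
        print(page_id, page)
```

Since duplicates should be rare, the set of seen ids is the only real cost;
the pages themselves pass straight through.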
Ariel
On Mon, Apr 11, 2016 at 8:05 PM, Ariel Glenn WMF <ariel(a)wikimedia.org>
wrote:
I've been trying to get the new hardware out for the monthly run. I'll be
looking at this today and tomorrow to verify that the issue is really with
separate page ranges being dumped for the same wiki without having the
database frozen across the entire time of the run. If that's indeed the
case, it's not fixable until we revisit the db backend, potentially a big
job.
Ariel
On Fri, Apr 1, 2016 at 1:27 PM, Sebastiano Vigna <vigna(a)di.unimi.it>
wrote:
On 23 Feb 2016, at 14:43, Ariel Glenn WMF <ariel(a)wikimedia.org> wrote:
I will investigate this. Tracked at
https://phabricator.wikimedia.org/T127832
I've seen no progress recently on the issue. Should we assume there will
be duplicates?
Ciao,
seba