I've double-checked that the duplicate pages are in fact in separate stub jobs, and I've updated the Phabricator task accordingly. Fixing this would take a lot of work with the current architecture, and it is already on the drawing board for the Dumps 2.0 rewrite, so I'm deferring the issue until then. In the meantime, scripts or processors that work with multi-stub dumps should be prepared to filter out such duplicates, though they should be rare.


On Mon, Apr 11, 2016 at 8:05 PM, Ariel Glenn WMF <ariel@wikimedia.org> wrote:
I've been trying to get the new hardware into service for the monthly run. I'll look at this today and tomorrow to verify that the issue really comes from separate page ranges of the same wiki being dumped without the database being frozen for the entire run. If that's indeed the case, it can't be fixed until we revisit the database backend, which is potentially a big job.


On Fri, Apr 1, 2016 at 1:27 PM, Sebastiano Vigna <vigna@di.unimi.it> wrote:

> On 23 Feb 2016, at 14:43, Ariel Glenn WMF <ariel@wikimedia.org> wrote:
> I will investigate this.  Tracked at https://phabricator.wikimedia.org/T127832

I've seen no progress recently on the issue. Should we assume there will be duplicates?