On Fri, 25-03-2011 at 21:49 +0100, Platonides wrote:
> Andrew Dunbar wrote:
> > Just a thought: wouldn't it be easier to generate dumps in parallel
> > if we did away with the assumption that the dump would be in
> > database order? The metadata in the dump provides the ordering info
> > for the people who require it.
> > Andrew Dunbar (hippietrail)
>
> I don't see how doing the dumps in a different order allows
> greater parallelism.
> You can already launch several processes at different points of the set.
> Giving each process one article out of every N would allow more balanced
> pieces, but that's not important. You would also skip the work of
> reading the old dump up to the offset, although that's reasonably fast.
> The important point of having them in this order is that the pages stay
> in the same order as in the previous dump.
I'm pretty sure there are a lot of folks out there that, like me, have
tools which rely on exactly this property (new/changed stuff shows up at
the end).
Amusingly, splitting based on some number of articles doesn't really
balance out the pieces, at least for history dumps, once the project
has been around long enough with enough activity. Splitting by number
of revisions is what we really want, and the older pages have far more
revisions than the newer ones.
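The split-by-revisions idea could be sketched roughly like this (a hypothetical illustration, not the actual dump scripts; the page list, revision counts, and function name are all made up for the example):

```python
def split_by_revisions(pages, num_chunks):
    """Cut an ordered list of (page_id, revision_count) pairs into
    contiguous chunks whose revision totals are as even as possible,
    preserving page-id order within and across chunks."""
    total_revs = sum(revs for _, revs in pages)
    target = total_revs / num_chunks
    chunks, current, current_revs = [], [], 0
    for page_id, revs in pages:
        current.append((page_id, revs))
        current_revs += revs
        # Close the chunk once it reaches the per-chunk target,
        # leaving the remainder for the final chunk.
        if current_revs >= target and len(chunks) < num_chunks - 1:
            chunks.append(current)
            current, current_revs = [], 0
    if current:
        chunks.append(current)
    return chunks

# Made-up data mimicking the skew described above: old pages carry
# most of the revisions.
pages = [(1, 900), (2, 500), (3, 50), (4, 30), (5, 20)]
chunks = split_by_revisions(pages, 2)
```

With this data, splitting at the midpoint by page count would give pieces of 1400 vs. 100 revisions, while splitting by revision count gives 900 vs. 600, which is much closer to balanced.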
Ariel