On 10/20/07, Platonides <Platonides(a)gmail.com> wrote:
> Gregory Maxwell wrote:
> > Bleh. Someone pulling increments couldn't build a point-in-time
> > snapshot; they would need to always pull the full. And we want people
> > using point-in-time versions of the site, not mangled mixes.
> They'd use the stubs version.
Okay, you didn't mention that.... but please no: I have had a hard
enough time explaining to people that the separate SQL dumps aren't
consistent with the history dumps.
I don't want to end up in a situation where the only way to get a sane
copy of the site is stitching together dozens of files on the
recipient's side... people will do it wrong, or just skip building a
point-in-time version at all, and make a big mess.
I'd rather go back to having separate metadata and text dumps than end
up with people needing to combine an old full dump, N large
incremental files, and a new stub dump through a bunch of complex
manipulation in order to arrive at a consistent copy of the site.
If we wanted to do that on the back end.. fine.
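To make the concern concrete, here is a toy sketch of the client-side "stitching" described above: a mirror operator takes the revision ids listed in a new stub dump and has to resolve each one against an old full dump plus a pile of incremental files, in the right order. All names and the dict-based "dump" representation here are illustrative assumptions, not the real dump formats or tools.

```python
def stitch_snapshot(stub_rev_ids, old_full, incrementals):
    """Resolve a point-in-time snapshot on the recipient's side.

    stub_rev_ids: revision ids defining the snapshot (from the stub dump).
    old_full, incrementals: dicts mapping rev_id -> revision text
    (a stand-in for the real dump files).
    """
    texts = dict(old_full)
    for inc in incrementals:      # later increments override earlier ones
        texts.update(inc)

    snapshot, missing = {}, []
    for rid in stub_rev_ids:
        if rid in texts:
            snapshot[rid] = texts[rid]
        else:
            missing.append(rid)   # any gap means no consistent copy
    return snapshot, missing

# One old full, two increments; the stub asks for rev 5 that no file has.
full = {1: "r1", 2: "r2"}
inc1 = {3: "r3"}
inc2 = {4: "r4", 2: "r2-fixed"}
snap, missing = stitch_snapshot([1, 2, 3, 4, 5], full, [inc1, inc2])
print(sorted(snap), missing)   # -> [1, 2, 3, 4] [5]
```

Even in this simplified form, getting a consistent result depends on applying every increment, in order, and on the stub matching the text files exactly; miss one file and you silently get a mangled mix.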
> > Also, I expect that once 7zed the increments will not be too much
> > smaller than the full, especially if partitioned by revid.
> I wasn't proposing a file per revid, but a file per N revisions, where
> N is a number which fits our needs ;-)
Partition by revid doesn't necessarily mean one rev per file... and
that's certainly not what I thought you were suggesting.
You will screw compression if you partition by revid (i.e. in groups
of revs, failing to keep all revs of a single article in one place).
If you don't want to take my word for it, try it yourself.
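The compression effect is easy to demonstrate. Successive revisions of one article are near-duplicates, so a compressor only exploits them if they sit close together in the stream. This toy uses zlib (32 KB window) as a stand-in; the real dumps use bzip2/7z with larger windows, but the same thing happens at dump scale when other pages' revisions push the previous revision of an article out of reach. The revision generator here is made up for illustration.

```python
import random
import string
import zlib

random.seed(42)

def make_revisions(n_revs, size=20000):
    """Simulate revisions of one article: an incompressible random base
    text plus a small append-only edit per revision."""
    base = "".join(random.choices(string.ascii_letters + " ", k=size))
    revs, text = [], base
    for i in range(n_revs):
        text += f" [edit {i}]"
        revs.append(text)
    return revs

article_a = make_revisions(10)
article_b = make_revisions(10)

# Grouped by page: all revisions of A, then all revisions of B.
grouped = "".join(article_a + article_b).encode()

# Partitioned by revid: A rev 0, B rev 0, A rev 1, B rev 1, ...
# Consecutive revisions of the same article are now ~40 KB apart,
# outside zlib's 32 KB window.
interleaved = "".join(
    rev for pair in zip(article_a, article_b) for rev in pair
).encode()

size_grouped = len(zlib.compress(grouped, 9))
size_interleaved = len(zlib.compress(interleaved, 9))
print(f"grouped by page: {size_grouped} bytes, "
      f"interleaved by revid: {size_interleaved} bytes")
```

With revisions grouped by page, every revision after the first compresses to back-references into its predecessor; interleaved by revid, each revision has to be stored almost from scratch, and the output is several times larger.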