Thanks Tomasz for the great work. I think at most two old backups plus the latest are more than enough. Most people use the latest, as you mentioned.

For the huge dumps a one-month window is reasonable as well, but if that could be pushed to two weeks it would be perfect. I wonder, though, whether people can really benefit from a dump every two weeks, because downloading, decompressing, and processing some of the huge dumps takes a lot of time.
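Just to give a sense of the processing cost, here is a minimal Python sketch that stream-decompresses a dump and counts pages. It assumes a bzip2-compressed XML dump; the filename is purely illustrative.

import bz2
import xml.etree.ElementTree as ET

# Illustrative filename; real dump names vary per wiki and run date.
DUMP = "enwiki-latest-pages-articles.xml.bz2"

def count_pages(path):
    """Stream-decompress the dump and count <page> elements
    without loading the whole file into memory."""
    count = 0
    with bz2.open(path, "rb") as f:
        for _, elem in ET.iterparse(f, events=("end",)):
            # MediaWiki export XML is namespaced, so match the local tag name.
            if elem.tag.rsplit("}", 1)[-1] == "page":
                count += 1
            elem.clear()  # discard element contents to keep memory low
    return count

if __name__ == "__main__":
    print(count_pages(DUMP))

Even a pass this trivial can run for hours on the biggest wikis, which is the real cost of a faster cadence.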

bilal


On Thu, May 14, 2009 at 9:02 PM, Tomasz Finc <tfinc@wikimedia.org> wrote:
Now that we are reliably generating all but the biggest of the wikis, I'd
like to start a discussion about retention of older database dumps.

If we can reliably stick to a two-week window for each wiki's dump
iteration, how many dumps back would it make sense to keep?

Most clients that I've talked to only need the latest and fall back to
an older one only if the newest dump failed a step.

If there are other retention cases then I'd love to hear them and figure
out what's feasible.

Operations-wise, I'd be thinking of keeping somewhere between 1 and 5 of
the previous dumps and then archiving a copy of each dump at 6-month
intervals for permanent storage. Doing that for all of the current dumps
takes far more space than we currently have available, but that's also
why we're working on funding for those storage servers.
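For concreteness, here's a minimal Python sketch of that scheme. It
assumes each wiki's dump runs live in per-date directories named
YYYYMMDD; the constants and the function name are just illustrative.

from datetime import datetime, timedelta
from pathlib import Path

KEEP_RECENT = 5                      # previous dumps kept online (1-5 proposed)
ARCHIVE_EVERY = timedelta(days=182)  # roughly a 6-month window

def plan_retention(dump_dirs):
    """Given one wiki's dump directories (assumed named YYYYMMDD),
    return (keep, archive, delete) sets implementing the scheme above."""
    dated = sorted((datetime.strptime(d.name, "%Y%m%d"), d) for d in dump_dirs)
    keep = {d for _, d in dated[-KEEP_RECENT:]}  # newest N stay online
    archive, last = set(), None
    for when, d in dated:
        # Pick one dump per ~6-month window for permanent storage.
        if last is None or when - last >= ARCHIVE_EVERY:
            archive.add(d)
            last = when
    delete = {d for _, d in dated} - keep - archive
    return keep, archive, delete

# Example: plan_retention(Path("/dumps/enwiki").iterdir())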

Is that overkill or simply not enough? Let me know.

--tomasz