Now that we are reliably generating all but the biggest of the wikis, I'd like to start a discussion about retention for older database dumps.
If we can reliably stick to a two-week window for each wiki's dump iteration, how many dumps back would it make sense to keep?
Most clients I've talked to only need the latest dump and simply fall back to older ones if the newest dump failed a step.
If there are other retention use cases, I'd love to hear them and figure out what's feasible.
Operations-wise, I'm thinking of keeping somewhere between 1 and 5 of the previous dumps, and archiving a copy of each dump at six-month intervals for permanent storage. Doing that for all of the current dumps would take far more space than we currently have available, but that's also why we're working on funding for those storage servers.
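To make that concrete, here's a rough sketch of what such a pruning job could look like. The paths, directory layout, and exact numbers below are placeholders, not anything that exists today; it just illustrates "keep the last few runs, and copy one run per six-month window into an archive area":

    #!/usr/bin/env python
    """Sketch of the proposed retention policy -- hypothetical paths and numbers.

    Assumes dumps live in per-wiki directories named like
    <DUMP_ROOT>/<wiki>/<YYYYMMDD>/ and that a separate archive area exists.
    """
    import os
    import shutil

    DUMP_ROOT = "/dumps/public"      # hypothetical location of current dumps
    ARCHIVE_ROOT = "/dumps/archive"  # hypothetical permanent-storage area
    KEEP_LAST = 5                    # keep between 1 and 5 previous runs

    def prune_wiki(wiki):
        wiki_dir = os.path.join(DUMP_ROOT, wiki)
        # Run directories sort chronologically because they are named YYYYMMDD.
        runs = sorted(d for d in os.listdir(wiki_dir) if d.isdigit())

        # Archive the first run falling in each six-month window (H1/H2 of a year).
        archived_windows = set()
        for run in runs:
            window = (run[:4], "H1" if run[4:6] <= "06" else "H2")
            if window not in archived_windows:
                archived_windows.add(window)
                dest = os.path.join(ARCHIVE_ROOT, wiki, run)
                if not os.path.exists(dest):
                    shutil.copytree(os.path.join(wiki_dir, run), dest)

        # Drop everything older than the last KEEP_LAST runs from the public area.
        for run in runs[:-KEEP_LAST]:
            shutil.rmtree(os.path.join(wiki_dir, run))

    if __name__ == "__main__":
        for wiki in sorted(os.listdir(DUMP_ROOT)):
            prune_wiki(wiki)
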
Is that overkill or simply not enough? Let us know.
--tomasz
Thanks, Tomasz, for the great work. I think at most 2 old dumps in addition to the latest are more than enough. Most people use the latest one, as you mentioned.
For the huge dumps, a one-month window is reasonable as well; if that could be pushed to two weeks, it would be perfect. But I wonder whether people can really benefit from a dump every two weeks, since downloading, decompressing, and processing some of the huge dumps takes a lot of time.
bilal
On Thu, May 14, 2009 at 9:02 PM, Tomasz Finc tfinc@wikimedia.org wrote: