On Tue, Sep 16, 2008 at 10:15 AM, Charlotte Webb <charlottethewebb@gmail.com> wrote:
Has anybody ever thought about doing split dumps instead?
Yes, this has been discussed to death by lots of people in various forums. It's not really clear that the benefit would be significant enough to be worth the (significant) effort.
Having spent the last 48 hours or so importing one of the smaller dump files (enwiki-20080312-page.sql.gz) into MySQL, I'd say the bigger benefit would come from producing a set of dump files that are already indexed (this could be in addition to the dumps already made), preferably in a form that can be accessed in place while still bzipped. That is actually feasible, and it's something I'm about halfway through writing myself. I spend far more time uncompressing, importing, and indexing the dumps than I do downloading them, and I just don't have the terabytes of free disk space needed to keep a full dump around uncompressed.
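Roughly what I mean by in-place access, as a minimal sketch: assume the text has been re-cut into independent ~900K bzip2 streams, and that a separately built index maps each page to the byte offset of the stream holding it. The function name, the index lookup, and the 2 MB read size below are just placeholders:

import bz2

def read_stream(dump_path, offset):
    # Decompress only the single ~900K bzip2 stream starting at `offset`.
    # The offset comes from the index (page id -> stream offset); the
    # decompressor stops at the end of that stream and ignores whatever
    # trailing bytes were read past it.
    with open(dump_path, 'rb') as f:
        f.seek(offset)
        raw = f.read(2 * 1024 * 1024)  # comfortably more than one compressed stream
    return bz2.BZ2Decompressor().decompress(raw).decode('utf-8')

The point is that you only ever pay for decompressing one ~900K chunk per lookup instead of the whole file.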
Once I have everything imported into MySQL, I can just download the new stub dumps and then fetch the new revisions one at a time. As a bonus, I won't have to worry about the history dump failing.
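Fetching an individual revision can go through api.php, along these lines (just a sketch, one revision per request; in practice you would batch the revids and be gentle on the servers, and the stub dump is still what tells you which revision IDs are new):

import json
import urllib.parse
import urllib.request

API = 'https://en.wikipedia.org/w/api.php'

def fetch_revision_text(rev_id):
    # Pull the wikitext of one revision listed in the stub dump.
    params = urllib.parse.urlencode({
        'action': 'query',
        'prop': 'revisions',
        'revids': rev_id,
        'rvprop': 'content',
        'format': 'json',
    })
    with urllib.request.urlopen(API + '?' + params) as resp:
        data = json.load(resp)
    page = next(iter(data['query']['pages'].values()))
    return page['revisions'][0]['*']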
I guess I should just pony up a few hundred dollars for a terabyte hard drive or two. It should be easy to store the text in 900K bzip chunks (which I can then index; rough sketch below), but only if I have the drive space to expand everything first and then recompress it. Anyone want to lend me a couple of terabyte hard drives for a month in exchange for a copy of anything I manage to produce?
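For what it's worth, the chunk-and-index step I have in mind is roughly this (just a sketch; it assumes the pages come out of the expanded dump as (page_id, text) pairs with integer ids, and the index only records which stream a page lives in, so a reader still scans within that one chunk):

import bz2

def write_chunked_dump(pages, dump_path, index_path, chunk_size=900 * 1024):
    # Write pages as a series of independent ~900K bzip2 streams, plus a
    # tab-separated index of page id -> byte offset of its stream.
    with open(dump_path, 'wb') as out, open(index_path, 'w') as idx:
        offset = 0
        buf, ids, size = [], [], 0

        def flush():
            nonlocal offset
            stream = bz2.compress(''.join(buf).encode('utf-8'))
            for pid in ids:
                idx.write('%d\t%d\n' % (pid, offset))
            out.write(stream)
            offset += len(stream)

        for page_id, text in pages:
            buf.append(text)
            ids.append(page_id)
            size += len(text)
            if size >= chunk_size:
                flush()
                buf, ids, size = [], [], 0
        if buf:
            flush()

Concatenated streams like this are still a valid bzip2 file, so the result stays readable by ordinary bunzip2 even without the index.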