[Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )
John
phoenixoverride at gmail.com
Thu May 17 05:22:51 UTC 2012
Anthony, the process is linear: you have a PHP script inserting X rows
per Y time frame. Yes, rebuilding the externallinks, links, and langlinks
tables will take some additional time and won't scale the same way, but I
have been working with the Toolserver since 2007 and I've lost count of
the number of times the TS has needed to re-import a cluster (s1-s7);
even enwiki can be done in a semi-reasonable timeframe. The WMF actually
compresses all text blobs, not just old revisions. A complete download
and decompression of the simple dump took only 20 minutes on my
two-year-old consumer-grade laptop over a standard home cable connection;
the same download on the Toolserver (minus decompression) took 88
seconds. Importing will take a little longer, but it shouldn't be that
big of a deal, and there will be some needed cleanup tasks afterwards.

The main point is that archiving and restoring WMF wikis isn't an issue,
and with moderately recent hardware it's no big deal. I'm putting my
money where my mouth is and getting actual, valid stats and figures.
Scaling up may not be an exact 1:1 ratio, but given the basics of how
importing a dump functions, the rate should remain close to the same.
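
To make the "X rows per Y time frame" reasoning concrete, here is a
minimal back-of-envelope sketch in Python of the extrapolation I mean.
The numbers in it are placeholders, not the actual measurements (those
are what I'm collecting now); the only point is the arithmetic of
scaling a measured insert rate up to a larger wiki:

# Back-of-envelope extrapolation of import time from a measured insert
# rate. All figures below are illustrative placeholders; plug in the
# numbers from a real simplewiki test import.

def estimate_import_time(measured_rows, measured_seconds, target_rows):
    """Extrapolate import time assuming a roughly constant rows/second rate."""
    rate = measured_rows / measured_seconds   # rows inserted per second
    return target_rows / rate                 # seconds for the target wiki

# Hypothetical example: a simplewiki test import pushes 5 million revision
# rows in 2 hours, and the target wiki has around 500 million revisions.
seconds = estimate_import_time(
    measured_rows=5_000_000,
    measured_seconds=2 * 3600,
    target_rows=500_000_000,
)
print(f"~{seconds / 86400:.1f} days at the same rate")  # ~8.3 days here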
On Thu, May 17, 2012 at 12:54 AM, Anthony <wikimail at inbox.org> wrote:
> On Thu, May 17, 2012 at 12:45 AM, John <phoenixoverride at gmail.com> wrote:
> > "Simple.wikipedia is nothing like en.wikipedia"? I care to dispute that
> > statement. All WMF wikis are set up basically the same (an odd extension
> > here or there differs, and namespace names vary at times), but for the
> > purpose of recovery simplewiki_p is a very standard example. This issue
> > isn't just about enwiki_p but *all* WMF wikis. Doing a data recovery for
> > enwiki vs. simplewiki is just a matter of time; for enwiki a 5-day
> > estimate would be fairly standard (depending on server setup), with
> > lower times for smaller databases. Typically you can express it as a
> > rate of X revisions processed per Y time unit, regardless of the
> > project, and that rate should be similar for everything given the same
> > hardware setup.
>
> Are you compressing old revisions, or not? Does the WMF database
> compress old revisions, or not?
>
> In any case, I'm sorry, a 20 gig mysql database does not scale
> linearly to a 20 terabyte mysql database.
>
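
One simple way to check whether the rate really stays flat as the table
grows is to log rows per second for successive batches. A self-contained
sketch of that measurement (it uses SQLite from the Python standard
library only so it runs anywhere without a database server; the real test
would of course be against the MySQL import itself):

import sqlite3
import time

# Log insert throughput per batch to see whether rows/second stays
# roughly flat as the table grows. SQLite (standard library) is used only
# to keep the sketch self-contained; the same timing loop would wrap the
# real MySQL inserts.

conn = sqlite3.connect("import_rate_test.db")
conn.execute("CREATE TABLE IF NOT EXISTS revision (rev_id INTEGER, rev_text TEXT)")

BATCH = 100_000
payload = "x" * 200  # stand-in for a compressed text blob

for batch_no in range(10):
    rows = [(batch_no * BATCH + i, payload) for i in range(BATCH)]
    start = time.perf_counter()
    conn.executemany("INSERT INTO revision VALUES (?, ?)", rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    print(f"batch {batch_no}: {BATCH / elapsed:,.0f} rows/s "
          f"(table now {(batch_no + 1) * BATCH:,} rows)")

conn.close()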