Anthony, the process is linear: you have a PHP script inserting X number
of rows per Y time frame. Yes, rebuilding the externallinks, links, and
langlinks tables will take some additional time and won't scale as well.
However, I have been working with the Toolserver since 2007, and I've
lost count of the number of times the TS has needed to re-import a
cluster (s1-s7); even enwiki can be done in a semi-reasonable timeframe.

The WMF actually compresses all text blobs, not just old versions. A
complete download and decompression of simple took only 20 minutes on my
two-year-old consumer-grade laptop over a standard home cable internet
connection; the same download on the Toolserver (minus decompression)
took 88 seconds. Yes, importing will take a little longer, but it
shouldn't be that big of a deal, and there will also be some cleanup
tasks needed afterwards.
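For context on what those compressed blobs mean on the restore side, here
is a minimal sketch (Python; the function name is mine, and it assumes a
row written with MediaWiki's plain gzip flag, i.e. PHP gzdeflate / raw
DEFLATE, and ignores the object and external-storage cases):

    import zlib

    def decompress_text_blob(old_text, old_flags):
        # Rows flagged 'gzip' were written with PHP gzdeflate(), which is
        # raw DEFLATE with no zlib header, hence the negative window bits.
        if 'gzip' in old_flags.split(','):
            return zlib.decompress(old_text, -zlib.MAX_WBITS)
        # Rows without a compression flag are stored as-is; 'object' and
        # 'external' rows would need extra handling not shown here.
        return old_text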
The main point, though, is that archiving and restoring WMF wikis isn't a
problem, and with moderately recent hardware it is no big deal. I'm
putting my money where my mouth is and getting actual, valid stats and
figures. It may not be exactly a 1:1 ratio when scaling up, but given the
basics of how importing a dump works, it should remain close to the same
ratio.
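To make the "X revisions processed per Y time unit" point concrete, the
estimate is just a ratio; every number below is a placeholder until I
have the measured figures:

    # Back-of-envelope only; real values to follow once the simplewiki
    # test import finishes.
    simple_revisions = 4_000_000      # hypothetical simplewiki revision count
    simple_import_hours = 2.0         # hypothetical measured import time
    rate = simple_revisions / simple_import_hours   # revisions per hour

    enwiki_revisions = 500_000_000    # hypothetical enwiki revision count
    est_days = enwiki_revisions / rate / 24
    print("enwiki estimate: %.1f days at the same rate" % est_days)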
On Thu, May 17, 2012 at 12:54 AM, Anthony <wikimail(a)inbox.org> wrote:
> On Thu, May 17, 2012 at 12:45 AM, John <phoenixoverride(a)gmail.com> wrote:
>>> Simple.wikipedia is nothing like en.wikipedia
>>
>> I care to dispute that statement. All WMF wikis are set up basically
>> the same (an odd extension here or there is different, and namespace
>> names differ at times), but for the purpose of recovery simplewiki_p is
>> a very standard example. This issue isn't just about enwiki_p but *all*
>> WMF wikis. Doing a data recovery for enwiki vs simplewiki is just a
>> matter of time; for enwiki a 5 day estimate would be fairly standard
>> (depending on server setup), with lower times for smaller databases.
>> Typically you can express it as a rate of X revisions processed per Y
>> time unit, regardless of the project, and that rate should be similar
>> for everything given the same hardware setup.
>
> Are you compressing old revisions, or not? Does the WMF database
> compress old revisions, or not?
>
> In any case, I'm sorry, a 20 gig mysql database does not scale
> linearly to a 20 terabyte mysql database.