On Thu, May 17, 2012 at 1:52 AM, Anthony <wikimail(a)inbox.org> wrote:
On Thu, May 17, 2012 at 1:22 AM, John <phoenixoverride(a)gmail.com> wrote:
Anthony, the process is linear: you have a PHP script inserting X number of rows per Y time frame.
Amazing. I need to switch all my databases to MySQL. It can insert X
rows per Y time frame, regardless of whether the database is 20
gigabytes or 20 terabytes in size, regardless of whether the average
row is 3K or 1.5K, regardless of whether I'm using a thumb drive or a
RAID array or a cluster of servers, etc.
When referring to X over Y time, it's an average of, say, 1000 revisions per minute; any X over Y period must be considered with averages in mind, or getting a count wouldn't be possible.
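(For illustration only, here is a back-of-the-envelope sketch of that linear-scaling argument in PHP; the estimateImportMinutes() helper and the revision counts are hypothetical placeholders, not measured figures:)

  <?php
  // Sketch of the linear-scaling assumption being argued here: if inserts
  // really do average a fixed number of revisions per minute, the estimated
  // import time is just a straight ratio.  Both inputs are hypothetical.
  function estimateImportMinutes( $totalRevisions, $revisionsPerMinute ) {
      return $totalRevisions / $revisionsPerMinute;
  }
  // e.g. a hypothetical 1,000,000-revision wiki at the 1000 rev/min average above:
  echo estimateImportMinutes( 1000000, 1000 ), " minutes\n"; // prints "1000 minutes"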
Yes, rebuilding the externallinks, links, and langlinks tables will take some additional time and won't scale.
And this is part of the process too, right?
That does not need to be completed prior to the site going live; it can be done after making it public. That part isn't.
However, I have been working with the toolserver since 2007 and I've lost count of the number of times that the TS has needed to re-import a cluster (s1-s7), and even enwiki can be done in a semi-reasonable timeframe.
Re-importing how? From the compressed XML full history dumps?
The WMF actually compresses all text blobs, not just old versions.
Is http://www.mediawiki.org/wiki/Manual:Text_table still accurate? Is WMF using gzip or object?
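(For anyone following along, here is a rough sketch of what those two old_flags values mean when reading the text table, going by the Manual:Text_table page; the decodeTextRow() helper below is hypothetical and glosses over the details of the history-blob classes:)

  <?php
  // Hypothetical helper: decode one row fetched with something like
  //   SELECT old_text, old_flags FROM text WHERE old_id = ...;
  // per the old_flags values described on Manual:Text_table.
  function decodeTextRow( $row ) {
      $flags = explode( ',', $row->old_flags );
      $text  = $row->old_text;
      if ( in_array( 'gzip', $flags ) ) {
          // 'gzip': the blob was stored with gzdeflate(), so inflate it
          $text = gzinflate( $text );
      }
      if ( in_array( 'object', $flags ) ) {
          // 'object': a serialized PHP object (a history blob that packs
          // several revisions together) rather than the revision text itself
          $text = unserialize( $text );
      }
      return $text;
  }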
Complete download and decompression of simple only took 20 minutes on my 2-year-old consumer-grade laptop with a standard home cable internet connection; the same download on the toolserver (minus decompression) was 88s. Yeah, importing will take a little longer but shouldn't be that big of a deal.
For the full history English Wikipedia it *is* a big deal.
If you think it isn't, stop playing with simple.wikipedia, and tell us
how long it takes to get a mirror up and running of en.wikipedia.
Do you plan to run compressOld.php? Are you going to import
everything in plain text first, and *then* start compressing? Seems
like an awful lot of wasted hard drive space.
If you set up your server/hardware correctly it will compress the text information during insertion into the database; compressOld.php is actually designed only for cases where you start with an uncompressed configuration.
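(On a stock MediaWiki install that presumably comes down to something like the following in LocalSettings.php; this is a sketch, and $wgCompressRevisions is my reading of which setting controls it, not something stated in this thread:)

  <?php
  // LocalSettings.php sketch: with this enabled before the import starts,
  // text blobs are compressed as they are inserted, so no separate
  // compression pass is needed afterwards.
  $wgCompressRevisions = true;

  // compressOld.php (under maintenance/storage/ in a stock MediaWiki tree)
  // is the retrofit path for text that was originally stored uncompressed:
  //   php maintenance/storage/compressOld.php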
There will also be some needed cleanup tasks.
However, the main point stands: archiving and restoring WMF wikis isn't an issue, and with moderately recent hardware it's no big deal.
I'm putting my money where my mouth is and getting actual valid stats and figures. Yes, it may not be an exact 1:1 ratio when scaling up; however, given the basics of how importing a dump functions, it should remain close to the same ratio.
If you want to put your money where your mouth is, import
en.wikipedia. It'll only take 5 days, right?
If I actually had a server or the disk space to do it I would, just to prove your smartass comments as stupid as they actually are. However, given my current resource limitations (fairly crappy internet connection, older laptops, and lack of HDD), I tried to select something that could give reliable benchmarks. If you're willing to foot the bill for the new hardware, I'll gladly prove my point.