[Wikimedia-l] Fire Drill Re: Wikimedia sites not easy to archive (Was Re: Knol is closing tomorrow )

John phoenixoverride at gmail.com
Thu May 17 06:06:18 UTC 2012


On Thu, May 17, 2012 at 1:52 AM, Anthony <wikimail at inbox.org> wrote:

> On Thu, May 17, 2012 at 1:22 AM, John <phoenixoverride at gmail.com> wrote:
> > Anthony, the process is linear: you have a PHP script inserting X number
> > of rows per Y time frame.
>
> Amazing.  I need to switch all my databases to MySQL.  It can insert X
> rows per Y time frame, regardless of whether the database is 20
> gigabytes or 20 terabytes in size, regardless of whether the average
> row is 3K or 1.5K, regardless of whether I'm using a thumb drive or a
> RAID array or a cluster of servers, etc.
>

When referring to X rows over Y time, it's an average: say 1000 revisions
per minute. Any figure of X over Y has to be read as an average, or quoting
a rate at all wouldn't be possible.
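
To make the arithmetic concrete, here's the linear-scaling estimate as a
rough Python sketch (the rate and revision counts are purely illustrative,
not measured figures):

    # Linear-scaling estimate: if inserts run at a steady average rate,
    # total import time is just revision count divided by that rate.
    def import_eta_days(total_revisions, revs_per_minute):
        minutes = total_revisions / revs_per_minute
        return minutes / 60 / 24

    # e.g. at the illustrative 1000 revisions/minute above:
    print(import_eta_days(1_000_000, 1000))  # ~0.69 days per million revisions

Whether the average itself holds at terabyte scale is of course the point
under dispute.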



> > Yes, rebuilding the externallinks, links, and langlinks tables
> > will take some additional time and won't scale.
>
> And this is part of the process too, right?

That does not need to be completed before the site goes live; it can be
done after making it public.
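
(For reference, in stock MediaWiki that rebuild is just the standard
maintenance scripts run after the fact, e.g. something like

    php maintenance/refreshLinks.php

per wiki; exact options depend on the setup, so take the invocation as a
sketch rather than gospel.)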

> > However I have been working
> > with the toolserver since 2007 and I've lost count of the number of times
> > that the TS has needed to re-import a cluster (s1-s7), and even enwiki
> > can be done in a semi-reasonable timeframe.
>
> Re-importing how?  From the compressed XML full history dumps?


> > The WMF actually compresses all text
> > blobs, not just old versions.
>
> Is http://www.mediawiki.org/wiki/Manual:Text_table still accurate?  Is
> WMF using gzip or object?
>
> > complete download and decompression of simple
> > only took 20 minutes on my 2 year old consumer grade laptop with a
> > standard home cable internet connection; the same download on the
> > toolserver (minus decompression) was 88s. Yeah, importing will take a
> > little longer but shouldn't be that big of a deal.
>
> For the full history English Wikipedia it *is* a big deal.
>
> If you think it isn't, stop playing with simple.wikipedia, and tell us
> how long it takes to get a mirror up and running of en.wikipedia.
>
> Do you plan to run compressOld.php?  Are you going to import
> everything in plain text first, and *then* start compressing?  Seems
> like an awful lot of wasted hard drive space.
>

If you set up your server/hardware correctly, it will compress the text
during insertion into the database; compressOld.php is actually designed
only for cases where you start with an uncompressed configuration.
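
For anyone curious what that looks like on the wire: with
$wgCompressRevisions enabled, MediaWiki deflates each new text blob at
insert time and marks it with the 'gzip' flag. A minimal Python sketch of
the equivalent transform (assuming the flag means PHP's gzdeflate, i.e.
raw DEFLATE with no header, and utf-8 text):

    import zlib

    def compress_blob(text):
        # Raw DEFLATE, matching PHP's gzdeflate(): negative wbits
        # suppresses the zlib/gzip header and trailer.
        co = zlib.compressobj(level=9, wbits=-zlib.MAX_WBITS)
        return co.compress(text.encode("utf-8")) + co.flush()

    def decompress_blob(blob):
        return zlib.decompress(blob, wbits=-zlib.MAX_WBITS).decode("utf-8")

    blob = compress_blob("Lorem ipsum " * 1000)
    assert decompress_blob(blob).startswith("Lorem ipsum")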


> > There will also be some needed cleanup tasks.
> > However the main issue, archiving and restoring WMF wikis, isn't an
> > issue, and with moderately recent hardware is no big deal. I'm putting
> > my money where my mouth is, and getting actual valid stats and figures.
> > Yes, it may not be an exactly 1:1 ratio when scaling up; however, given
> > the basics of how importing a dump functions, it should remain close
> > to the same ratio.
>
> If you want to put your money where your mouth is, import
> en.wikipedia.  It'll only take 5 days, right?
>

If I actually had a server or the disk space to do it I would, just to
prove your smartass comments are as stupid as they actually are. However,
given my current resource limitations (a fairly crappy internet connection,
older laptops, and a lack of HDD space) I tried to select something that
could give reliable benchmarks. If you're willing to foot the bill for the
new hardware, I'll gladly prove my point.
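
For the record, the benchmark itself is nothing exotic; roughly this kind
of harness, in Python (dump file name illustrative, and it assumes a
working local MediaWiki install so importDump.php is on hand):

    import subprocess, time

    DUMP = "simplewiki-pages-meta-history.xml"  # illustrative file name

    start = time.time()
    # importDump.php is MediaWiki's stock XML importer; it pushes the dump
    # through the normal page/revision/text insertion path, reading stdin.
    with open(DUMP, "rb") as dump:
        subprocess.run(["php", "maintenance/importDump.php"],
                       stdin=dump, check=True)
    print("wall time: %.1f min" % ((time.time() - start) / 60))

Divide the wall time by the revision count in the dump and you get exactly
the X-per-Y average I was talking about above.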

