On Fri, 15 Oct 2004, Andre Engels wrote:
On Wed, 13 Oct 2004 23:15:59 +0200, andyr@wizzy.com wrote:
On Wed, 13 Oct 2004, Tim Starling wrote:
I've added the image tarball generation to the backup script, so new tarballs will be generated every week from now on.
14Gig - Owww.
In a few short months it has grown from 3Gig.
Sorry - the cron jobs had not run and I was looking at the full db archive.
Now I see the pictures - 8Gig - not quite so bad :-)
I assume the cause of that is the new image syntax: it used to be that if you had a large image, you'd make it smaller (which also decreased its file size). Now the large version is put on the site and shown smaller to the reader with the '000px' markup, which means there are many more large (sometimes huge) image files.
It will take me a week or so to get a good look at these - but a question for the developers: am I right to accept only files matching ./en/[0-9a-f]/../* from the archive?
Presumably uploads are just hashed into these dirs?
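Something like the Python sketch below is what I have in mind for the filter - untested, and it assumes the second path level really is the two-hex-digit subdir (MediaWiki-style hashed upload dirs), with 'upload.tar' just a placeholder name:

    import re
    import tarfile

    # Assumption: hashed upload paths look like en/a/ab/Some_file.jpg,
    # where the two components are hex digits of the filename hash.
    UPLOAD_RE = re.compile(r'^(\./)?en/[0-9a-f]/[0-9a-f]{2}/[^/]+$')

    def wanted(member):
        # Keep only regular files that sit in the hashed upload dirs;
        # this drops /thumb/*, /archive/* and anything in the root dir.
        return member.isfile() and bool(UPLOAD_RE.match(member.name))

    with tarfile.open('upload.tar') as tar:   # placeholder filename
        keep = [m for m in tar.getmembers() if wanted(m)]
        tar.extractall(members=keep)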
There are a few pics that come with the MediaWiki software that I would, naturally, leave alone.
In the first (Jun) archive /thumb/* was about 700Meg, and /archive/* was similar. There were also a lot of encyclopedia pics in the root dir - I threw them all away without noticing anything untoward.
I might run a script over the archive and convert large images to ones of the same dimensions but, say, 70% JPEG quality. I imagine I could easily halve the archive size that way.
If there are other regexes that would catch files resized by the server, I would be very grateful for the hint.
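For the recompression, roughly what I have in mind is the sketch below (using Pillow) - the 70 and the 'en' root are just my guesses, it only touches JPEGs, and a real run should keep the original whenever the re-encoded file somehow comes out bigger:

    import os
    from PIL import Image

    def recompress(path, quality=70):
        # Re-save a JPEG in place at reduced quality; leave everything
        # else alone.
        img = Image.open(path)
        if img.format != 'JPEG':
            return
        img.load()                    # pull pixel data before overwriting
        img.save(path, 'JPEG', quality=quality)

    for root, dirs, files in os.walk('en'):   # 'en' is a placeholder root
        for name in files:
            if name.lower().endswith(('.jpg', '.jpeg')):
                recompress(os.path.join(root, name))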
Currently I am getting the archive onto a US server, unpacking it, throwing away what I don't need, and then rsyncing it down to a friendly server in South Africa.
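The weekly cycle on the US box is basically the three steps below - hostnames, URLs and the script name are all placeholders, and 'filter_and_recompress.py' stands for the two sketches above:

    import subprocess

    STEPS = [
        # 1. fetch the weekly tarball (URL is a placeholder)
        ['wget', '-c', 'http://example.org/upload.tar'],
        # 2. unpack only the hashed upload dirs and recompress the JPEGs
        ['python', 'filter_and_recompress.py', 'upload.tar'],
        # 3. push the pruned tree down to the friendly ZA mirror (placeholder host)
        ['rsync', '-av', '--delete', 'en/', 'za-mirror:/data/wikipedia/en/'],
    ]

    for cmd in STEPS:
        subprocess.check_call(cmd)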
Cheers, Andy!