Andy Rabagliati wrote:
> It will take me a week or so to get a good look at these - but - a question for the developers - am I right to only accept files matching ./en/[0-9a-f]/../* from the archive ?
> Presumably uploads are just hashed into these dirs ?
Yes, that's correct. The directory name is derived from the MD5 hash of the filename.
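For what it's worth, the scheme Tim describes can be sketched like this. It's a guess at the exact layout, inferred from the ./en/[0-9a-f]/../* pattern above: first one hex digit of the MD5, then the first two. The space-to-underscore normalisation is also an assumption.

```python
import hashlib

def hashed_path(filename, root="./en"):
    """Return the hashed upload path for a filename, assuming a
    MediaWiki-style layout: <root>/<h>/<hh>/<filename>, where h and
    hh are the first one and two hex digits of md5(filename)."""
    # Assumption: the filename is hashed with spaces as underscores.
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "%s/%s/%s/%s" % (root, digest[0], digest[:2], name)

print(hashed_path("Example.jpg"))
```

If that guess is right, the ./en/[0-9a-f]/../* pattern in the question matches exactly the paths this function produces.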
> There are a few pics that come with the mediawiki software that I would, naturally, leave alone.
> In the first (Jun) archive /thumb/* was about 700 MB, and /archive/* was similar. There were also a lot of encyclopedia pics in the root dir - I threw them all away without noticing anything untoward.
In the real root directory there are symlinks to images in the other directories, apparently left there to avoid breaking URLs from an earlier version of the software. Tar will have converted them from symlinks into duplicate files, so you can delete them.
> I might run a script over the archive and convert large images to ones with the same dimensions but, say, 70% JPEG quality. I imagine I could easily halve the archive size that way.
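A dry run along those lines might look like the following. It assumes ImageMagick's convert, a hypothetical 100 KB threshold, and JPEG-only uploads; it only prints the commands, so nothing is overwritten until the leading echo is dropped.

```shell
#!/bin/sh
# Print, but do not run, a recompress command for every JPEG over 100 KB
# in the hashed upload tree. Remove the "echo" once the list looks sane.
find ./en -type f \( -name '*.jpg' -o -name '*.jpeg' \) -size +100k 2>/dev/null |
while read -r f; do
    echo convert -quality 70 "$f" "$f"
done
```

Running it against a copy of the tree first would be prudent, since convert with the same input and output path rewrites files in place.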
Quite likely.
> If there are other regexes that would catch files resized by the server, I would be very grateful for the hint.
The thumb directory contains all the images resized automatically, although the ./en/[0-9a-f] directories will contain some duplicate images resized by hand.
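Under those rules, a path-based classifier might look like this sketch. The backreference check (the two-character directory begins with the one-character directory's digit) is an assumption based on the usual MD5 layout; everything under ./thumb/ is treated as server-resized per the description above.

```python
import re

# Originals live in the hashed upload tree; everything under ./thumb/
# is a server-generated resize. The hashed-tree pattern assumes the
# second directory is the first two hex digits of the MD5, so it must
# start with the same digit as the first directory.
ORIGINAL = re.compile(r"^\./en/([0-9a-f])/\1[0-9a-f]/[^/]+$")
SERVER_RESIZED = re.compile(r"^\./thumb/")

def classify(path):
    if SERVER_RESIZED.match(path):
        return "server-resized"
    if ORIGINAL.match(path):
        return "original"
    return "other"
```

As noted, hand-resized duplicates sit in the hashed tree alongside real originals, so no path regex can pick those out; that would take comparing image dimensions or checksums.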
-- Tim Starling