On 8/10/07, Minute Electron <minuteelectron(a)googlemail.com> wrote:
A tool has been created called Wikix, it downloads all images from a wiki.
I'm not aware of any method to check the validity of non-deleted*
files after downloading them via HTTP, beyond checking the size and
hoping the file isn't corrupted, or downloading each file more than
once and comparing the copies.
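
For illustration, the size check amounts to roughly the following
Python sketch (a single plain HTTP GET using the standard library; the
function name and paths are mine, not from Wikix or any other tool):

    import urllib.request

    def fetch_and_check_size(url, dest_path):
        # Fetch the file and remember the length the server claimed.
        with urllib.request.urlopen(url) as resp:
            expected = resp.headers.get("Content-Length")
            data = resp.read()
        with open(dest_path, "wb") as out:
            out.write(data)
        # A matching size only proves the transfer wasn't truncated;
        # it says nothing about corruption that preserves the length.
        return expected is None or int(expected) == len(data)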
When you're talking about nearly 1TB of images fetched via 1.75
million HTTP requests (the current size of commons), corruption is a
real issue if you care about getting a good copy. Errors that leave
the size intact are quite possible, and fetching every file twice
isn't really a sane option for that much data.
I'm not aware of any efforts to download all of commons via HTTP.
Previously Jeff Merkey downloaded the images that English Wikipedia
uses, but that's only part of a much larger collection.
I don't believe that moving that much data is really a major issue in
itself, at least for the sort of people who have the storage around to
handle it. Back when I downloaded the old commons image dump (about
300GB) that we had posted, the transfer took four days, which I don't
consider a big deal at all.
*deleted files are renamed to the SHA1 of their content, so it's easy
to check their transfer validity. I wish non-deleted images behaved in
the same way.
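
To sketch why that naming helps (assuming the filename is just the hex
SHA1 of the file's bytes, which may not match the actual on-disk
layout):

    import hashlib, os

    def looks_intact(path):
        # Recompute the SHA1 of what was actually received and compare
        # it to the name the file was served under.
        expected = os.path.splitext(os.path.basename(path))[0].lower()
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest() == expected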