William Pietri wrote:
> On 01/07/2010 01:40 AM, Jamie Morken wrote:
>> I have a suggestion for Wikipedia!! I think that the database dumps,
>> including the image files, should be made available via a Wikipedia
>> BitTorrent tracker, so that people would be able to download the
>> Wikipedia backups including the images (which currently they can't do),
>> and also so that Wikipedia's bandwidth costs would be reduced. [...]
> Is the bandwidth used really a big problem? Bandwidth is pretty cheap
> these days, and given Wikipedia's total draw, I suspect the occasional
> dump download isn't much of a problem.
No, bandwidth is not really the problem here. I think the core issue is
to have bulk access to images.
There have been a number of such requests in the past, and after some
back and forth it has usually turned out that a smaller subset of the
data works just as well.
A good example of this was the Deutsche Fotothek archive made available
late last year:
http://download.wikipedia.org/images/Deutsche_Fotothek.tar (11 GB)
This provided an easily retrievable, high-quality subset of our image
data that researchers could use.
Now, if we were to snapshot the image data for each particular project,
the amount of duplicated image data would become significant. That's
because we re-use a ton of image data between projects, and rightly so.
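To get a rough sense of how much duplication per-project snapshots would
introduce, one could hash file contents and count every copy beyond the
first across the snapshot trees. A minimal sketch (the directory layout
and function names here are hypothetical, not anything we actually run):

```python
import hashlib
import os
from collections import defaultdict

def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks to avoid loading it whole."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def duplicate_bytes(snapshot_dirs):
    """Total bytes stored more than once if each snapshot directory
    were packaged as an independent tarball."""
    seen = defaultdict(list)  # content hash -> list of file sizes
    for root_dir in snapshot_dirs:
        for dirpath, _, filenames in os.walk(root_dir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                seen[sha1_of_file(path)].append(os.path.getsize(path))
    # Every copy of a given content hash beyond the first is waste.
    return sum((len(sizes) - 1) * sizes[0] for sizes in seen.values())
```

Running something like this over a couple of per-project image trees
would put a concrete number on the overhead of the snapshot approach.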
If instead we package all of Commons into a single tarball, we get
roughly 6 TB of image data, which, after numerous conversations, has
proven to be a bit more than most people want to process.
So what does everyone think of going down the collections route?
If we provide enough different and up-to-date collections, we could
easily give people a large but manageable amount of data to work with.
If there is already a page for this, please feel free to point me to it;
otherwise I'll create one.
--tomasz