Hello,

> I could be wrong, but all it may take is one server (for whatever
> reason) deciding that the download is problematic for the whole file
> download to fail.

Our download servers support resume.

> 5) Is there some type of timeout command lying somewhere which might
> instruct the wikipedia server to quit a particular attempt to download
> a large file if it is taking too long?

No.

> It also seems like a good idea to split large files up using a file
> splitter (whichever one takes your fancy) as larger file downloads
> would seem to be problematic for most people who have access to
> networks with only a limited connection speed.

Our download servers support range requests, which are used by proper
download clients to resume the downloads.
Every modern HTTP client should support download resume and large
files. People are not running FAT16 any more (you know, that doesn't
support files over 2 GB either), so why would network tools and
delivery be just as ancient?
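
For what it's worth, here is a minimal sketch of what a resuming client
does under the hood (the URL and the 1 MiB offset below are
placeholders, not a real dump location):

  # Suppose exactly 1048576 bytes of the file are already on disk;
  # ask the server only for the remainder and append it locally.
  # Placeholder URL - substitute the dump file you are actually fetching.
  curl -H "Range: bytes=1048576-" https://example.org/dump.xml.bz2 >> dump.xml.bz2
  # A server that honours range requests replies "206 Partial Content"
  # and sends only the missing tail; a plain "200 OK" would mean it
  # ignored the header and resent the whole file.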

> It occurs to me that, given the randomness of this problem, this
> response might also be correspondingly random. Still, how long might
> it take to organise something in the way of a (perhaps unix script
> automated?) file splitting for the larger wikipedia database download
> files?

There is no need - we're using standards released 10 years ago to do
the work properly.

> already the case – but, from what I gather, once an incomplete
> database dump is downloaded – it is pretty useless, unless someone can
> correct me).

Use the HTTP resume functionality, for example (with a placeholder URL):

  wget --continue <dump-url>
  curl --continue-at - -O <dump-url>
BR,
--
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]