Hi,
I encountered a problem when trying to download the English Wikipedia image dumps. First of all, the web page states that the size of the dump is 16.7 GB. However, when I tried to download it using wget, only about 700 MB was downloaded. When I tried to extract the tar file, extraction stopped abruptly with an error message saying that the end of the file is corrupted.
So I wonder whether the image dumps are actually corrupted, or whether my method of downloading is wrong and certain steps are required to download the image dump. Please help! Thank you in advance for your kind assistance.
Regards, Darwin Sadeli
On Thu, Aug 04, 2005 at 03:43:25PM +0800, Darwin Sadeli wrote:
> I encountered a problem when trying to download the English Wikipedia image dumps. [...] When I tried to download it using wget, only about 700 MB was downloaded.
IIRC, the stable wget has a problem with large files: it sees 16.7 GB (4 * 4 GB + 700 MB) as 700 MB, because that is all that fits in a 32-bit value.
Try some other program (or maybe an unstable wget).
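To illustrate the wraparound (the byte count below is only a rough estimate of a 16.7 GiB file, not the real dump size):

  $ echo $(( 17930000000 % 4294967296 ))   # file size modulo 2^32
  750130816                                # about 715 MiB, i.e. the ~700 MB wget reports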
On Thu, 4 Aug 2005, Tomasz Wegrzanowski wrote:
> IIRC, the stable wget has a problem with large files: it sees 16.7 GB (4 * 4 GB + 700 MB) as 700 MB, because that is all that fits in a 32-bit value.
> Try some other program (or maybe an unstable wget).
Recent versions of curl have been tested and are able to download large files without problems. Note that curl needs an explicit option ("-C -", I believe) to continue from a partially downloaded file if the transfer is interrupted (which can happen in a download that takes several hours); otherwise it will start from scratch.
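Something along these lines should work (the URL below is only a placeholder; substitute the real dump URL from the download page):

  $ curl -C - -O http://example.org/enwiki-images.tar

"-O" saves the file under its remote name, and "-C -" tells curl to work out the resume offset from whatever part of the file is already on disk, so re-running the same command after an interruption picks up where it left off.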
Alfio