http://en.wikipedia.org/wiki/Wikipedia_database has some information
on how to deal with the large files
henna
On Fri, Apr 10, 2009 at 21:43, Daniel Kinzler <daniel(a)brightbyte.de> wrote:
David Gerard schrieb:
2009/4/10 Jameson Scanlon
<jameson.scanlon(a)googlemail.com>:
Does anyone on the wikitech mailing list happen to know whether it
would be possible for some of the larger Wikipedia database downloads
(which are, say, 16GB or so in size) to be split into parts so that
they can be downloaded more easily? For whatever reason, whenever I have
attempted to download the ~14GB files (say, from
http://static.wikipedia.org/downloads/2008-06/en/ ), I have found that
only 2GB (presumably the first 2GB) of what I sought to download
has actually been downloaded. Is there any way around this? Could
anyone suggest what the reasons for this difficulty in downloading
the material might be?
Downloading to a filesystem that only supports files up to 2GB?
Also, several HTTP clients don't like files over 2GB: a Content-Length larger
than 2^31 - 1 bytes overflows a signed 32-bit integer (2GB is the 31-bit
limit). wget likes to die with a segmentation fault on those; I found that
curl works.
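The overflow described above can be sketched quickly. This is an illustration, not the actual wget code: it assumes a client that parses the Content-Length header into a signed 32-bit integer, and shows what value such a client would see for a ~14GB dump.

```python
import ctypes

# Size of a hypothetical ~14GB dump file, in bytes.
length = 14 * 1024**3

# What a client storing Content-Length in a signed 32-bit int would see:
# the value wraps around and becomes negative.
as_seen_by_32bit_client = ctypes.c_int32(length).value

print(length)                  # 15032385536
print(as_seen_by_32bit_client) # a negative number, hence the broken downloads
```

A client that treats a negative length as "nothing to read" (or crashes on it) explains both the truncated transfers and the segfaults.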
But of course, the file system also has to support very large files, as Gerard said.
Finally: yes, it would be nice to have such dumps available in pieces of perhaps
1GB in size.
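Offering the dumps in pieces needs nothing more than standard Unix tools, since recipients can simply concatenate the parts back together. A minimal sketch, using a tiny stand-in file and 4-byte pieces (the filenames and sizes are hypothetical; for a real dump one would use something like `split -b 1024m`):

```shell
# Stand-in for a large dump file.
printf '0123456789' > dump.bin

# Split into fixed-size pieces (4 bytes here; 1 GiB in practice
# would be `split -b 1024m`). Parts are named dump.bin.part-aa,
# dump.bin.part-ab, ... in lexicographic order.
split -b 4 dump.bin dump.bin.part-

# Recipients reassemble by concatenating the parts in name order.
cat dump.bin.part-* > dump.rejoined

# Verify the reassembled file matches the original.
cmp dump.bin dump.rejoined && echo OK
```

Publishing a checksum (e.g. md5sum) alongside the parts would let downloaders verify each piece independently before reassembly.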
-- daniel
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
"Maybe you knew early on that your track went from point A to B, but
unlike you I wasn't given a map at birth!" Alyssa, "Chasing Amy"