at least, they will be when the dump is complete, which will take a few more hours yet. once done, image dumps will be found at:
http://dumps.wikimedia.org/images/
please read the readme files. (note: don't download dumps without an "upload.tar" symlink, because that means the dump is still in progress and the file will be incomplete!)
please let me know about any problems with these files, particularly if they don't extract correctly.
kate.
On Sun, 29 May 2005, Kate Turner wrote:
please let me know about any problems with these files, particularly if they don't extract correctly.
Test using http://dumps.wikimedia.org/images/wikipedia/fi/20050530_upload.tar
GNU tar on Cygwin extracts all files correctly, except for the last ones (outside any subdirectory), where I get lots of "tar: Skipping to next header" errors, and a bunch of invalid gif and png files (button_bold.gif, button_bold.png and so on).
pax doesn't seem to be available for Cygwin. To my surprise, there is one included with Windows, but it doesn't work:
pax: - : This doesn't look like a tar archive
pax: - : Skipping to next file...
only the first 832 files are extracted, out of 3000.
Alfio
Alfio Puglisi wrote:
GNU tar on Cygwin extracts all files correctly, except for the last ones [...]. pax doesn't seem to be available for Cygwin. I found with surprise that there's one included with windows, but it doesn't work.
Have you tried Winzip? I have always extracted [.tar].gz files with Winzip, and it always worked fine. (It doesn't do bzip2 though, unfortunately.)
On Thu, 2 Jun 2005, Timwi wrote:
Have you tried Winzip? I have always extracted [.tar].gz files with Winzip, and it always worked fine. (It doesn't do bzip2 though, unfortunately.)
Winzip gives an error:
"Error reading header after processing 2872 entries"
and doesn't extract any file.
Alfio
it would be great if a Windows-compatible 'unzipper' could be used by Windows users; I guess it's a case of finding one.
Alfio Puglisi wrote in gmane.science.linguistics.wikipedia.technical:
Test using http://dumps.wikimedia.org/images/wikipedia/fi/20050530_upload.tar
GNU tar on Cygwin extracts all files correctly, except for the last ones (outside any subdirectory), where I get lots of "tar: Skipping to next header" errors, and a bunch of invalid gif and png files (button_bold.gif, button_bold.png and so on).
thanks.
these files are symlinks in the image directory. there seems to be a bug where symlinks are not handled correctly when creating the tar file. nonetheless, since the desired behaviour is that they are not included at all, this shouldn't be an issue (assuming the rest of the files are extracted okay). i'll try to fix this for the next dump.
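As an illustration of the desired behaviour described above (symlinks left out, everything else extracted), here is a minimal Python sketch using the standard `tarfile` module. The function name `extract_regular_files` and its return value are hypothetical, chosen for this example; it is not the tool used to create or read the dumps, and it does not attempt to repair damaged headers.

```python
import os
import shutil
import tarfile


def extract_regular_files(archive_path, dest_dir):
    """Extract only regular files from an uncompressed tar archive.

    Symlinks, hard links and other special members are skipped.
    Returns (extracted_names, skipped_names).
    """
    extracted, skipped = [], []
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, mode="r:") as tar:
        for member in tar:
            if member.isdir():
                continue  # directories are created on demand below
            if not member.isfile():
                skipped.append(member.name)  # symlink, hard link, device, ...
                continue
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Refuse member names that would land outside dest_dir.
            if os.path.commonpath([dest_root, target]) != dest_root:
                skipped.append(member.name)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(member.name)
    return extracted, skipped
```

Calling `extract_regular_files("20050530_upload.tar", "images")` would write the regular files under `images/` and report the skipped link entries, so they can be compared against the names that produced errors.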
pax doesn't seem to be available for Cygwin. I found with surprise that there's one included with windows, but it doesn't work:
pax: - : This doesn't look like a tar archive
pax: - : Skipping to next file...
it's possible this pax doesn't understand the newer POSIX format. do you know where it comes from?
kate.
On Fri, 3 Jun 2005, Kate Turner wrote:
it's possible this pax doesn't understand the newer POSIX format. do you know where it comes from?
Who knows; it's included in the Windows directory (Windows 2000 SP4). The help file says:
"The following formats are supported.
cpio - The extended CPIO interchange format specified in Extended CPIO Format in IEEE Std. 1003.1-1988.
ustar - The extended TAR interchange format specified in Extended TAR Format in IEEE Std. 1003.1-1988. This is the default archive format."
Both of them seem quite old :-)
Alfio
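One way to check whether an archive uses header types that an older ustar-only tool may not understand is to look at the first 512-byte header block. The sketch below is a hedged illustration in Python: the function name `describe_tar_header` and the returned labels are made up for this example. It relies only on the documented header layout: the type flag at offset 156, and the magic and version fields at offsets 257 to 264.

```python
def describe_tar_header(block: bytes) -> str:
    """Classify a 512-byte tar header block by its magic and type flag."""
    if len(block) < 512:
        raise ValueError("a tar header block is 512 bytes long")
    typeflag = block[156:157]
    magic_and_version = block[257:265]  # 6-byte magic + 2-byte version
    if magic_and_version == b"ustar\x0000":
        # 'x' and 'g' are the per-file and global extended headers
        # added by the newer POSIX (pax) format.
        if typeflag in (b"x", b"g"):
            return "pax extended header (POSIX.1-2001)"
        return "ustar (POSIX.1-1988)"
    if magic_and_version == b"ustar  \x00":
        return "GNU tar"
    return "unknown or pre-POSIX (v7)"


def describe_tar_file(path: str) -> str:
    """Read the first header block of a file and classify it."""
    with open(path, "rb") as f:
        return describe_tar_header(f.read(512))
```

Only the first block is inspected here. A pax archive can start with a plain ustar header and use extended headers later, so a full check would walk every header in the file.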
wikitech-l@lists.wikimedia.org