I downloaded:
http://dumps.wikimedia.your.org/other/static_html_dumps/2008-06/en/wikipedia...
using wget and it seems to be fine:
$ _IFL="wikipedia-en-html.tar.7z"
$ ls -l "${_IFL}"
-rw-r--r-- 1 niggahme niggahme 15363543213 Jun 21 2008 wikipedia-en-html.tar.7z
$ file "${_IFL}"
wikipedia-en-html.tar.7z: 7-zip archive data, version 0.2
$ md5sum -b "${_IFL}"
03ce695cbf32a3f8636fa8d3f9c7d12e *wikipedia-en-html.tar.7z
$ sha256sum -b "${_IFL}"
c2794b6371a05017f03e2a345730fd763b1052872290b5c78763978a0b43c747 *wikipedia-en-html.tar.7z
$ sha512sum -b "${_IFL}"
d52a737ceca25ef18272ba70a4a56000a7a0bff92653fb462674333a0855f397c892b8aeb2e11206d391ba4cca48d46f5814d92db4d2096467519de38c5a189c *wikipedia-en-html.tar.7z
$ 7z l "${_IFL}"

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Pentium(R) CPU B940 @ 2.00GHz (206A7),ASM)

Scanning the drive for archives:
1 file, 15363543213 bytes (15 GiB)

Listing archive: wikipedia-en-html.tar.7z

--
Path = wikipedia-en-html.tar.7z
Type = 7z
Physical Size = 15363543213
Headers Size = 100
Method = LZMA:22
Solid = -
Blocks = 1

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2008-06-18 13:02:15 ..... 223674511360  15363543113  wikipedia-en-html.tar
------------------- ----- ------------ ------------  ------------------------
2008-06-18 13:02:15       223674511360  15363543113  1 files
$
But I can't get the name of the compressed (contained) file, even though both ark and 7z show it. Here is my simple piece of code:
import java.io.File;
import java.io.IOException;
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;

String aIFl = "wikipedia-en-html.tar.7z";
File I7ZKFl = new File(aIFl);
if (I7ZKFl.exists()) {
    // try-with-resources closes the archive once the loop is done
    try (SevenZFile SvnZFl = new SevenZFile(I7ZKFl)) {
        SevenZArchiveEntry entry;
        int iIx = 0;
        // walk every entry in the archive and print its metadata
        while ((entry = SvnZFl.getNextEntry()) != null) {
            System.out.println("// __ [" + iIx + "]: |" + entry + "|");
            System.out.println("// __ .getName() |" + entry.getName() + "|");
            System.out.println("// __ .getSize() |" + entry.getSize() + "|");
            System.out.println("// __ .getLastModifiedDate() |" + entry.getLastModifiedDate() + "|");
            ++iIx;
        }
    } catch (IOException IOX) {
        IOX.printStackTrace(System.err);
    }
}
Its output, faithful except for the name, was:
// __ [0]: |org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry@179d3b25|
// __ .getName() |null|
// __ .getSize() |223674511360|
// __ .getLastModifiedDate() |Wed Jun 18 14:02:15 EDT 2008|
Why is it that I can't get the file name?
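One possible explanation, stated as an assumption rather than anything confirmed here: a 7z archive that holds a single compressed file may store no entry name at all, and tools such as 7z then display the archive's own file name with the last extension removed. The sketch below mimics that fallback with a hypothetical helper, defaultName, using only the Java standard library. Newer Commons Compress releases appear to offer a similar SevenZFile.getDefaultName(), which would be worth checking against the version in use.

```java
import java.io.File;

// Hypothetical helper, not part of any library: mimics the assumed fallback
// of naming a nameless 7z entry after the archive file itself.
final class DefaultNameSketch {

    // Returns the archive's file name without its last extension,
    // e.g. "a.tar.7z" -> "a.tar"; appends "~" when there is no extension.
    static String defaultName(String archivePath) {
        if (archivePath == null || archivePath.isEmpty()) {
            return null;
        }
        // keep only the last path segment, dropping any directories
        String lastSegment = new File(archivePath).getName();
        int dotPos = lastSegment.lastIndexOf('.');
        if (dotPos > 0) {
            return lastSegment.substring(0, dotPos);
        }
        return lastSegment + "~";
    }

    public static void main(String[] args) {
        // prints "wikipedia-en-html.tar"
        System.out.println(defaultName("wikipedia-en-html.tar.7z"));
    }
}
```

With a fallback like this, the loop above could print entry.getName() when it is non-null and the derived name otherwise.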
Also, if OO works as expected, I should be able to access and process the contained file by addressing it with an exclamation mark, like this:
wikipedia-en-html.tar.7z!wikipedia-en-html.tar
So, at this point I should be able to write:
String aIFl = "wikipedia-en-html.tar.7z!wikipedia-en-html.tar";
FileInputStream FISTarK = new FileInputStream(new File(aIFl));
TarArchiveInputStream tarInput = new TarArchiveInputStream(FISTarK);
TarArchiveEntry tArKEnt;
while ((tArKEnt = tarInput.getNextTarEntry()) != null) {
    ...
}
right?
lbrtchx
xmldatadumps-l@lists.wikimedia.org