On Fri, Jan 5, 2024 at 5:03 PM Wurgl <heisewurgl@gmail.com> wrote:
>
> Hello!
>
> I am having some unexpected messages, so I tried the following:
>
> curl -s https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-articles-multistream.xml.bz2 | bzip2 -d | tail
>
> and got this:
>
> bzip2: Compressed file ends unexpectedly;
> perhaps it is corrupted? *Possible* reason follows.
> bzip2: Inappropriate ioctl for device
> Input file = (stdin), output file = (stdout)
>
> It is possible that the compressed file(s) have become corrupted.
The file I received was fine, and its sha1sum matches that of
wikidatawiki-20240101-pages-articles-multistream.xml.bz2 mentioned in
the posting of Xabriel Collazo Mojica:
--- 8< ---
$ sha1sum wikidatawiki-latest-pages-articles-multistream.xml.bz2
1be753ba90e0390c8b65f9b80b08015922da12f1  wikidatawiki-latest-pages-articles-multistream.xml.bz2
--- >8 ---
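For anyone who wants to repeat this comparison, here is a minimal sketch
of the check as a shell function. The name verify_sha1 is made up for
illustration, and the expected digest has to come from a source you
trust, such as the value shown above.

```shell
# Hypothetical helper (not part of any tool): compare a file's SHA-1
# digest with an expected hex string.
verify_sha1() {
    file=$1
    expected=$2
    # sha1sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha1sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}
```

Called with the file name and the digest above as its two arguments, it
should print OK for a complete download and return a non-zero status
otherwise.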
bunzip2 did not report any problem. However, my first attempt to
decompress ended with a full disk after more than 2.3 TB of XML.
The second attempt
--- 8< ---
$ bunzip2 -cv wikidatawiki-latest-pages-articles-multistream.xml.bz2 \
    | tail -n 10000 \
    > wikidatawiki-latest-pages-articles-multistream_tail_-n_10000.xml
wikidatawiki-latest-pages-articles-multistream.xml.bz2: done
--- >8 ---
resulted in a nice XML fragment, which ends with
--- 8< ---
<page>
<title>Q124069752</title>
<ns>0</ns>
<id>118244259</id>
<revision>
<id>2042727399</id>
<parentid>2042727216</parentid>
<timestamp>2024-01-01T20:37:28Z</timestamp>
<contributor>
<username>Kalepom</username>
<id>1900170</id>
</contributor>
<comment>/* wbsetclaim-create:2||1 */ [[Property:P2789]]: [[Q16506931]]</comment>
<model>wikibase-item</model>
<format>application/json</format>
<text bytes="2535" xml:space="preserve">...</text>
<sha1>9gw926vh84k1b5h6wnuvlvnd2zc3a9b</sha1>
</revision>
</page>
</mediawiki>
--- >8 ---
So I assume your curl did not return the full 142 GB of
wikidatawiki-latest-pages-articles-multistream.xml.bz2.
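One way to rule this out is to download to disk first and test the
archive before decompressing it. A minimal sketch, assuming curl and
bzip2 are installed; fetch_and_check is a made-up helper name, not an
existing tool:

```shell
# Hypothetical helper: download a .bz2 file, then test its integrity.
fetch_and_check() {
    url=$1
    out=$2
    # -f: fail on HTTP errors, -sS: quiet but still show errors,
    # -L: follow redirects, -o: write to the given file.
    curl -fsSL -o "$out" "$url" || return 1
    # -t decompresses in memory only and reports truncation or corruption.
    bzip2 -t "$out"
}
```

Unlike the pipe into "bzip2 -d | tail", this returns a non-zero status
both when the transfer fails and when the file on disk is truncated.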
P.S.: I'll start a new bunzip2 run to a larger scratch disk just to find
out how big this XML file really is.
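Alternatively, if only the size is of interest, the output can be
counted instead of stored, so no scratch disk is needed. A sketch, with
decompressed_size as a made-up helper name:

```shell
# Hypothetical helper: print the decompressed size of a .bz2 file in
# bytes without writing the decompressed data to disk.
decompressed_size() {
    # -d: decompress, -c: write to stdout; wc -c counts the bytes.
    # tr strips the padding some wc implementations add.
    bzip2 -dc -- "$1" | wc -c | tr -d ' '
}
```

This still needs a full decompression pass, so it takes about as long
as writing the file would.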
regards, Gerhard
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-leave@lists.wikimedia.org