On Wed, Jan 10, 2024 at 6:19 PM Wurgl <heisewurgl(a)gmail.com> wrote:
The relevant line is this one:
curl -s https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-a… | bzip2 -d | php ~/dumps/wikidata_sitelinks.php
Yes, I double-checked it on my machine at home and the same type of error happened.
Well, we now know that the xml.bz2 file itself is ok. The usual way
to debug this would be to perform each step of the above pipe in
isolation, which I more or less did. The xml.bz2 file arrived ok, but
I used wget for that and that job alone ran for about 12 hours to
retrieve the ~150 GB file. bunzip2 also worked for me, as mentioned
in an earlier posting, and I found the expected closing tag
"</mediawiki>" in the last line. So at least my bunzip2
(Version 1.0.6, 6-Sept-2010) seems to be ok, or at least ok with that file.
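For reference, the two checks on the already-downloaded file can be sketched roughly as below. This is only a sketch of what I did by hand: the function name check_dump and its return codes are made up for illustration, and it assumes the dump has been saved to a local file first.

```shell
#!/bin/sh
# check_dump FILE  (hypothetical helper, not part of any existing tool)
# Checks a locally saved .xml.bz2 dump in two isolated steps.
check_dump() {
    dump=$1

    # Step 1: test the compressed stream for integrity; writes no output file.
    bzip2 -t "$dump" || { echo "bzip2 integrity check failed" >&2; return 1; }

    # Step 2: decompress to stdout and keep only the last line.
    # On the full dump this reads the whole file and takes a long time.
    last_line=$(bzip2 -dc "$dump" | tail -n 1)

    # The dump should end with the closing root tag.
    case $last_line in
        *"</mediawiki>"*) echo "ok" ;;
        *) echo "closing tag missing" >&2; return 2 ;;
    esac
}
```

Called as "check_dump somefile.xml.bz2", it prints "ok" only if both the bzip2 test and the closing-tag check pass.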
As I already mentioned, judging from the messages in your original mail, I can
only venture a guess here: your curl -s simply did not retrieve
the full file. Try omitting the -s for a test, so that curl shows its
progress meter and any error messages.
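If you would rather keep the output quiet, one possible variant is to download to a file first and let curl report failures through its exit status. The following is a sketch under assumptions, not something I have run against the dump server: fetch_dump is a made-up name, and the URL and output path are whatever you pass in. The options used are -s (silent), -S (still show error messages), --fail (non-zero exit on HTTP errors) and -o (write to file).

```shell
#!/bin/sh
# fetch_dump URL OUTFILE  (hypothetical helper for illustration)
# Downloads to a file so the transfer can be checked on its own,
# separately from the bzip2 and php steps.
fetch_dump() {
    url=$1
    out=$2

    # -s hides the progress meter, -S keeps error messages visible,
    # --fail turns HTTP errors into a non-zero exit status.
    if curl -sS --fail -o "$out" "$url"; then
        echo "download finished"
    else
        status=$?
        echo "curl exited with status $status" >&2
        return "$status"
    fi
}
```

Only if this reports success would you then run bzip2 -d and the PHP script on the saved file.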
regards, Gerhard