This looks to be what happened. When I downloaded the file, it was the
"latest" version at the time. I've since decompressed it, so I don't have
the original md5sum, but the timestamp on my copy predates the
Phabricator issue, so there's a good chance this is one of the "bad"
dumps.
Thanks much for your help!
On Sun, May 1, 2016 at 10:07 PM, gnosygnu <gnosygnu(a)gmail.com> wrote:
I checked my local copy of enwiki-20160407-pages-articles.xml.bz2 and
all those articles are present. For reference, my copy was downloaded
on 4-27, has a size of 12,878,552,649 bytes and an md5 of
cff68321a17392fbb3322b34b61b0402. Also, the following grep command
found the page:
grep "<title>Litecoin</title>" enwiki-latest-pages-articles.xml
<title>Litecoin</title>
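For anyone repeating this check, here is a minimal Python sketch (not something posted in the thread) that compares a downloaded file against the size and md5 quoted above. The function names `md5_of_file` and `check_dump` are hypothetical; the expected values are the ones reported for the 20160407 .bz2 and only apply to that exact file.

```python
import hashlib
import os

# Values reported in this thread for a good copy of
# enwiki-20160407-pages-articles.xml.bz2 (not valid for any other dump).
EXPECTED_SIZE = 12_878_552_649
EXPECTED_MD5 = "cff68321a17392fbb3322b34b61b0402"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks so a 12 GB
    dump never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_dump(path, expected_size=EXPECTED_SIZE, expected_md5=EXPECTED_MD5):
    """Compare a file against a known size and MD5.

    The size check is cheap, so it runs first: a truncated ~10.8 GB copy
    fails it without hashing the whole file.
    """
    size = os.path.getsize(path)
    if size != expected_size:
        return False, f"size {size:,} != expected {expected_size:,}"
    actual = md5_of_file(path)
    if actual != expected_md5:
        return False, f"md5 {actual} != expected {expected_md5}"
    return True, "size and md5 match"
```

Usage: `check_dump("enwiki-20160407-pages-articles.xml.bz2")` returns a `(bool, message)` pair; running `md5sum` on the file from a shell gives the same hash.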
Did you download your dump earlier last month? There was a known bad
version that was truncated by about 2 GB (roughly 10.8 GB instead of
about 12.9 GB).
It was missing most of the pages in the Module namespace, and may have
been missing these as well. See:
https://lists.wikimedia.org/pipermail/xmldatadumps-l/2016-April/001301.html
If yours is 10.8 GB, then you should redownload the latest version.
The file's modified time hasn't changed, but the file itself has been
updated. See:
https://phabricator.wikimedia.org/T133416#2252100
Hope this helps.
On Sun, May 1, 2016 at 2:00 AM, Marcus Truscello
<marcus.truscello(a)gmail.com> wrote:
Actually, it seems quite a few cryptocurrency-related articles are
missing: Litecoin, Ethereum, Namecoin, Dogecoin, CryptoNote, etc.
Most of these articles transclude Template:Cryptocurrencies, but I'm not
sure why they wouldn't appear in the XML dump.
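To check the whole list at once instead of running one grep per title, a sketch like the following streams the dump (compressed or not) and reports which of the given titles it contains. It assumes each `<title>` element sits on its own line, as in the grep output earlier in the thread; `find_titles` is a hypothetical helper, not part of any dump tooling.

```python
import bz2
import re

TITLE_RE = re.compile(rb"<title>(.*?)</title>")


def find_titles(path, wanted):
    """Return the subset of `wanted` page titles that appear in a dump.

    Reads line by line, like grep, from either the .bz2 or the
    decompressed .xml, so nothing has to be unpacked to disk first.
    Titles are matched as they appear in the XML, i.e. XML-escaped
    (a title containing "&" would appear as "&amp;").
    """
    remaining = {t.encode("utf-8") for t in wanted}
    found = set()
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as f:
        for line in f:
            m = TITLE_RE.search(line)
            if m and m.group(1) in remaining:
                found.add(m.group(1).decode("utf-8"))
                remaining.discard(m.group(1))
                if not remaining:
                    break  # stop early once every title has been seen
    return found
```

Usage: with `wanted = ["Litecoin", "Ethereum", "Namecoin", "Dogecoin", "CryptoNote"]`, `set(wanted) - find_titles("enwiki-20160407-pages-articles.xml.bz2", wanted)` gives the titles missing from that file.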
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l