Thanks for noticing and sharing. Another known issue with the HTML dumps
is that categories and templates are apparently not always extracted:
https://phabricator.wikimedia.org/T300124
How are these dumps actually produced, and where is the code hosted? If the
process is easily reproducible "at home", maybe it would be possible to help
with the bug hunting. But I suspect the scripts need direct access to the
production databases.
– J