Thanks for noticing and sharing. Another known issue with the HTML dumps is that categories and templates do not always seem to be extracted: https://phabricator.wikimedia.org/T300124
How are these dumps actually produced? Where is the code hosted? If the process is easily reproducible "at home", maybe it would be possible to help with the bug hunting. But I suspect the scripts need direct access to the production databases.
– J