Hi!
Thanks for noticing and sharing. Another known issue with the HTML dumps is that categories and templates are apparently not always extracted: https://phabricator.wikimedia.org/T300124
Mitar
On Tue, Apr 5, 2022 at 12:59 PM Jan Berkel <jan@berkel.fr> wrote:
Hello,
Just a heads-up for anyone using the HTML dumps: apart from the missing namespaces issue already mentioned on this list, entire pages also seem to be missing, and some of the included page data is outdated and does not contain the latest changes. I have no idea how many pages are affected.
Phabricator ticket with more details: https://phabricator.wikimedia.org/T305407
– Jan