Hello.
I have more details. The problematic change was introduced after the dump from 2024-05-01; I checked this with the grep command quoted below.
An obvious sign of the problem is that the (unpacked) dump size dropped by 6.5% between 2024-05-01 and 2024-05-20.
Hope that helps.
Sven
I wrote:
Hello.
The dump dewiki-20240520-pages-articles.xml contains many empty articles (96069 in ns 0). The first one has <id>15</id>, the last one <id>13102212</id>. For ns 0 this is a new phenomenon (introduced after 2024-03-01). Across all namespaces, the number of affected pages grew a lot:
# grep -c ' <text bytes="[0-9]*" />' dewiki-20240520-pages-articles.xml
101259
# grep -c ' <text bytes="[0-9]*" />' dewiki-20240301-pages-articles.xml
129
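The grep count covers every namespace at once, while the 96069 figure is for ns 0 only. A per-namespace breakdown could be produced with a sketch like the one below. It assumes that each <page> has its <ns> element on its own line before its <text> element, as in the pages-articles dumps. The function name count_empty_by_ns is hypothetical, and the awk pattern mirrors the grep pattern above rather than any official definition of an "empty" revision.

```shell
# Hypothetical helper: count empty <text bytes="N" /> elements per namespace.
# Assumption: <ns>...</ns> appears on its own line before the page's <text> line.
count_empty_by_ns() {
  awk '
    /<ns>[0-9-]+<\/ns>/ {
      ns = $0
      gsub(/.*<ns>|<\/ns>.*/, "", ns)   # keep only the namespace number
    }
    /<text bytes="[0-9]*" \/>/ { count[ns]++ }   # same shape as the grep pattern
    END { for (n in count) print n, count[n] }
  ' "$1" | sort -n   # one "namespace count" line per namespace, sorted
}
```

Usage: count_empty_by_ns dewiki-20240520-pages-articles.xml. If the reasoning above holds, the line for namespace 0 should account for most of the grep total.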
Greetings Sven