Hello.
I have more details.
The problematic change was introduced after the dump from 2024-05-01; I checked
that with the grep command listed below.
An obvious indication of the problem is that the (unpacked) dump size dropped by 6.5%
from 2024-05-01 to 2024-05-20.
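
The per-namespace count (such as the ns 0 figure quoted below) could be
reproduced along the following lines. This is only a minimal sketch, not the
command that was actually used: it reuses the pattern from the grep command
quoted below, and it assumes that each page's <ns> line appears before its
<text> line and that each element sits on its own line. The function name
count_empty_by_ns and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Same pattern as the grep command: a self-closing <text> element
# that carries only a bytes attribute.
EMPTY_TEXT = re.compile(r' <text bytes="[0-9]*" />')
# Namespace tag of the current page (assumed to precede its <text> line).
NS_TAG = re.compile(r'<ns>(-?\d+)</ns>')


def count_empty_by_ns(lines):
    """Count empty <text> elements per namespace in an iterable of dump lines."""
    counts = Counter()
    current_ns = None
    for line in lines:
        ns_match = NS_TAG.search(line)
        if ns_match:
            current_ns = int(ns_match.group(1))
        elif EMPTY_TEXT.search(line):
            counts[current_ns] += 1
    return counts


# Hypothetical sample lines, shaped like a pages-articles dump.
sample = [
    '  <page>',
    '    <ns>0</ns>',
    '    <id>15</id>',
    '      <text bytes="1234" />',
    '  </page>',
    '  <page>',
    '    <ns>0</ns>',
    '      <text bytes="12" xml:space="preserve">Hello</text>',
    '  </page>',
    '  <page>',
    '    <ns>10</ns>',
    '      <text bytes="99" />',
    '  </page>',
]

for ns, count in sorted(count_empty_by_ns(sample).items()):
    print(f"ns {ns}: {count} empty")
```

For a real dump, an open file object can be passed in place of the sample list,
so the file is read line by line and never loaded into memory as a whole.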
Hope that helps ...
Sven
I wrote:
> Hello.
>
> The dump dewiki-20240520-pages-articles.xml contains many (96069 for ns 0) empty articles.
> The first one is for <id>15</id>, the last one for <id>13102212</id>
> For ns=0, this is a new phenomenon (introduced after 2024-03-01).
> For all articles, the number of affected articles grew a lot:
>
> # grep -c ' <text bytes="[0-9]*" />' dewiki-20240520-pages-articles.xml
> 101259
>
> # grep -c ' <text bytes="[0-9]*" />' dewiki-20240301-pages-articles.xml
> 129
>
> Greetings
> Sven
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-leave@lists.wikimedia.org