Hi, this is very likely because of https://phabricator.wikimedia.org/T365155

Once that's fixed, it should get back to normal.


On Wed, 22 May 2024 at 11:18, Sven Hartrumpf via Xmldatadumps-l <xmldatadumps-l@lists.wikimedia.org> wrote:

I have more details.
The problematic change was introduced after the dump from 2024-05-01; I checked this
with the grep command listed below.

An obvious indication of the problem is that the (unpacked) dump size dropped by 6.5%
from 2024-05-01 to 2024-05-20.

Hope that helps ...

I wrote:

> Hello.
> The dump dewiki-20240520-pages-articles.xml contains many empty articles (96069 in ns 0).
> The first one is <id>15</id>, the last one is <id>13102212</id>.
> For ns=0, this is a new phenomenon (introduced after 2024-03-01).
> Across all namespaces, the number of affected articles grew a lot:
>   # grep -c '  <text bytes="[0-9]*" />' dewiki-20240520-pages-articles.xml
> 101259
>   # grep -c '  <text bytes="[0-9]*" />' dewiki-20240301-pages-articles.xml
> 129
> Greetings
> Sven
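
The grep counts above cover all namespaces at once. To break them down per namespace (for example, to reproduce the ns 0 figure), something like the following Python sketch might work. It is only an illustration under assumptions: it reuses the grep pattern from the commands above, and it assumes each page's <ns> line appears before its <text> line and that both sit on their own lines, as is usual in pages-articles dumps. The function name count_empty_by_ns is made up for this example.

```python
import re
from collections import Counter
from typing import Iterable

# Same pattern as in the grep commands: a self-closing <text> element.
EMPTY_TEXT = re.compile(r'  <text bytes="[0-9]*" />')
# Namespace tag of the page currently being read (assumed to precede <text>).
NS_TAG = re.compile(r'<ns>(-?[0-9]+)</ns>')


def count_empty_by_ns(lines: Iterable[str]) -> Counter:
    """Count self-closing <text> elements, keyed by the enclosing page's namespace."""
    counts: Counter = Counter()
    current_ns = None  # stays None until the first <ns> line is seen
    for line in lines:
        ns_match = NS_TAG.search(line)
        if ns_match:
            current_ns = int(ns_match.group(1))
        elif EMPTY_TEXT.search(line):
            counts[current_ns] += 1
    return counts
```

Passing an open file works because the function reads line by line and never holds the whole dump in memory, e.g. count_empty_by_ns(open("dewiki-20240520-pages-articles.xml", encoding="utf-8")). The sum over all namespaces should match the grep -c output for the same file.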

Amir (he/him)