Once that's fixed, it should get back to normal.
Best
On Wed, 22 May 2024 at 11:18, Sven Hartrumpf via Xmldatadumps-l
<xmldatadumps-l(a)lists.wikimedia.org> wrote:
Hello.
I have more details.
The problematic change was introduced after the dump from 2024-05-01; I
checked with the grep command listed below.
An obvious indication of a problem is that the (unpacked) dump size
dropped by 6.5% from 2024-05-01 to 2024-05-20.
Hope that helps ...
Sven
I wrote:
Hello.
The dump dewiki-20240520-pages-articles.xml contains many empty articles
(96069 in ns 0).
The first one is <id>15</id>, the last one is <id>13102212</id>.
For ns=0, this is a new phenomenon (introduced after 2024-03-01).
Across all namespaces, the number of affected pages grew a lot:
# grep -c ' <text bytes="[0-9]*" />' dewiki-20240520-pages-articles.xml
101259
# grep -c ' <text bytes="[0-9]*" />' dewiki-20240301-pages-articles.xml
129
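
To break the same count down by namespace, something like the following Python could be used. It is only a minimal sketch: it assumes the dump keeps each <ns> and <text> element on its own line (as the grep pattern above already does), and count_empty_texts is a hypothetical helper name, not part of any dump tooling.

```python
import re
from collections import Counter

# Same pattern as the grep call: a self-closing <text .../> element,
# i.e. a revision that carries a byte count but no content.
EMPTY_TEXT = re.compile(r'<text bytes="[0-9]*" />')
# Namespace of the page currently being read.
NS = re.compile(r'<ns>(-?[0-9]+)</ns>')


def count_empty_texts(lines):
    """Count empty <text/> elements, grouped by the page's namespace."""
    counts = Counter()
    current_ns = None
    for line in lines:
        ns_match = NS.search(line)
        if ns_match:
            current_ns = int(ns_match.group(1))
        elif EMPTY_TEXT.search(line):
            counts[current_ns] += 1
    return counts


# Tiny made-up sample (not real dump content) to show the behaviour.
sample = [
    '    <ns>0</ns>',
    '    <id>15</id>',
    '      <text bytes="1234" />',
    '    <ns>0</ns>',
    '      <text bytes="5" xml:space="preserve">Hallo</text>',
    '    <ns>10</ns>',
    '      <text bytes="77" />',
]
print(dict(count_empty_texts(sample)))  # {0: 1, 10: 1}
```

For a real dump, the open file object can be passed directly, e.g. count_empty_texts(open("dewiki-20240520-pages-articles.xml", encoding="utf-8")), which reads it line by line without loading the whole file into memory.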
Greetings
Sven
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l(a)lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-leave(a)lists.wikimedia.org