Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20240101 full revision history content run.
We are currently dumping 982 projects in total.
---------------------
Stats for tnwiktionary on date 20240101
Total size of page content dump files for articles, current content only:
2,013,030
Total size of page content dump files for all pages, current content only:
2,691,260
Total size of page content dump files for all pages, all revisions:
12,375,755
---------------------
Stats for enwiki on date 20240101
Total size of page content dump files for articles, current content only:
97,866,837,150
Total size of page content dump files for all pages, current content only:
201,781,306,071
Total size of page content dump files for all pages, all revisions:
28,002,734,610,233
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
(*apologies for cross-posting*)
Hello,
This is a breaking change announcement relevant to those working with
Lexeme dumps.
In Lexeme dumps, "senses" and "forms" values, when not empty, are shown as
arrays. When these lists are empty, they are currently displayed as
objects. For example, values with content are displayed in array
format: "senses":[{"id":"L4-S1",...}]
but empty values are treated as objects: "senses":{}
However, empty lists should be presented as arrays as well: "senses":[]
In this change, empty lists of forms and senses will be switched from
objects to arrays. This adjustment makes the dumps more consistent and
matches how non-empty values are presented. We will roll this change
out on February 8th.
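For anyone parsing Lexeme dumps across the switchover, a minimal Python sketch (my own helper, not official Wikidata tooling; the function name normalize_lexeme is an assumption) that tolerates both serializations by coercing empty {} values to []:

```python
import json

def normalize_lexeme(entity):
    """Coerce empty "senses"/"forms" objects ({}) to arrays ([]).

    Before the announced change, empty lists serialize as {};
    afterwards they serialize as []. Normalizing on read keeps a
    parser working with dumps from either side of the switch.
    """
    for key in ("senses", "forms"):
        if entity.get(key) == {}:
            entity[key] = []
    return entity

# The two serializations of the same (empty) lists:
old_style = json.loads('{"id": "L4", "senses": {}, "forms": {}}')
new_style = json.loads('{"id": "L4", "senses": [], "forms": []}')
assert normalize_lexeme(old_style) == new_style
```

Non-empty arrays pass through untouched, so the helper is safe to apply unconditionally to every entity record.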
We anticipate the impact of this change to be minimal and harmless for most
use cases. Therefore, we haven't generated a test dump, as it would demand
substantial resources and time. If you have any questions or concerns about
this change, please don’t hesitate to reach out to us in this ticket (
T305660 <https://phabricator.wikimedia.org/T305660>).
Cheers,
--
Mohammed S. Abdulai
*Community Communications Manager, Wikidata*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0) 30 577 116 2466
https://wikimedia.de
Grab a spot in my calendar for a chat: calendly.com/masssly.
A lot is happening around Wikidata - keep up to date!
<https://www.wikidata.org/wiki/Wikidata:Status_updates>
Current news and exciting stories about Wikimedia, Wikipedia and Free
Knowledge in our newsletter (in German): Subscribe now
<https://www.wikimedia.de/newsletter/>.
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us to achieve our vision!
https://spenden.wikimedia.de
Wikimedia Deutschland — Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations at Amtsgericht
Charlottenburg, VR 23855 B. Recognized as a charitable organization by
the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
Executive Directors: Franziska Heine, Dr. Christian Humborg
Hello!
I am getting some unexpected messages, so I tried the following:
curl -s
https://dumps.wikimedia.org/wikidatawiki/latest/wikidatawiki-latest-pages-a…
| bzip2 -d | tail
and got this:
bzip2: Compressed file ends unexpectedly;
perhaps it is corrupted? *Possible* reason follows.
bzip2: Inappropriate ioctl for device
Input file = (stdin), output file = (stdout)
It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.
You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
<parentid>1227967782</parentid>
<timestamp>2023-12-07T00:22:05Z</timestamp>
<contributor>
<username>Renamerr</username>
<id>2883061</id>
</contributor>
<comment>/* wbsetdescription-add:1|uk */ бактеріальний білок, наявний
у Listeria monocytogenes EGD-e,
[[:toollabs:quickstatements/#/batch/218434|batch #218434]]</comment>
<model>wikibase-item</model>
<format>application/json</format>
The first part is an error message, which I could also see when running
my PHP script from within the toolserver-cloud (PHP 7.4, because class
XMLReader simply core dumps with the installed PHP 8.2, see T352886).
The second part is the output from the "tail" command.
Just as a crosscheck: I have no such problem with
curl -s
https://dumps.wikimedia.org/dewiki/latest/dewiki-latest-pages-meta-current.…
| bzip2 -d | tail
No error, and the last line is "</mediawiki>".
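The same crosscheck can be done programmatically. A hedged Python sketch (my own helper name, assuming the .bz2 dump has been downloaded to a local file rather than piped from curl) that streams the decompression and surfaces truncation the same way bzip2 does:

```python
import bz2

def bz2_tail(path, keep=200):
    """Stream-decompress a .bz2 dump and return its last `keep` bytes.

    bz2.open transparently handles multi-stream files. If the file is
    truncated, reading raises EOFError, the library's analogue of
    bzip2's "Compressed file ends unexpectedly" message.
    """
    tail = b""
    with bz2.open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            tail = (tail + chunk)[-keep:]
    return tail

# A complete XML dump should close its root element, so for an intact
# file (hypothetical local filename) one would expect:
# bz2_tail("dewiki-latest-pages-meta-current.xml.bz2").rstrip() \
#     .endswith(b"</mediawiki>")
```

A truncated download raises EOFError here, while an intact dump yields a tail ending in </mediawiki>, which matches the behavior seen with the two curl pipelines above.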
Cheers,
Wolfgang