Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20240401 full revision history content run.
We are currently dumping 982 projects in total.
---------------------
Stats for dkwikimedia on date 20240401
Total size of page content dump files for articles, current content only:
1,197,640 bytes
Total size of page content dump files for all pages, current content only:
2,454,891 bytes
Total size of page content dump files for all pages, all revisions:
106,011,011 bytes
---------------------
Stats for enwiki on date 20240401
Total size of page content dump files for articles, current content only:
99,343,047,221 bytes
Total size of page content dump files for all pages, current content only:
205,053,944,117 bytes
Total size of page content dump files for all pages, all revisions:
28,539,050,275,897 bytes
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
---------------------
Hello, I'm working on a little project to perform cryptographically sound
timestamping of Wikipedia snapshots. I'm using the opentimestamps.org
service, which by default uses the SHA-256 hash. To get the SHA-256
digest for the timestamp, I need to download each file and compute the
hash myself.
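For reference, this is roughly the computation I have in mind (a minimal
Python sketch; the dump filename at the bottom is only a placeholder):

    import hashlib

    def sha256_of_file(path, chunk_size=1 << 20):
        # Stream the file in 1 MiB chunks so multi-GB dumps
        # never have to fit in memory at once.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # Placeholder filename, for illustration only.
    print(sha256_of_file("enwiki-20240401-pages-articles.xml.bz2"))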
Currently the xml data dumps provide only MD5 and SHA-1 hashes. Both of
these hash functions are obsolete for this purpose: practical collision
attacks have been demonstrated against each of them. I'm wondering: would
the maintainers of this service be willing to add SHA-256 digests to the
dumpstatus and checksum files going forward? SHA-256 remains
cryptographically sound and would let me verify that I am timestamping
the correct files.
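In the meantime I check each download against the published SHA-1
checksum file before timestamping anything. A sketch of that check,
assuming the usual "<hex digest>  <filename>" line format of the
*-sha1sums.txt files:

    import hashlib
    import os

    def verify_sha1(path, sums_path):
        # Compute the SHA-1 of the downloaded file, streaming in chunks.
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        # Parse the checksum file: one "<digest>  <filename>" pair per line.
        with open(sums_path) as f:
            wanted = {fn: dg for dg, fn in
                      (line.split() for line in f if line.strip())}
        return wanted.get(os.path.basename(path)) == h.hexdigest()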
Thanks in advance!
Best regards,
Arthur
---------------------
I was reading about incremental xml dumps on this page: https://dumps.wikimedia.org/other/incr/
While I understand this service is experimental and may stop working at any time, I was curious how frequently the incremental dumps run when the system is working properly. Also, how often do the incremental dumps stop working?