Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20211101 full revision history content run.
We are currently dumping 945 projects in total.
---------------------
Stats for igwiki on date 20211101
Total size of page content dump files for articles, current content only:
14,294,974
Total size of page content dump files for all pages, current content only:
21,955,636
Total size of page content dump files for all pages, all revisions:
322,127,219
---------------------
Stats for enwiki on date 20211101
Total size of page content dump files for articles, current content only:
86,097,776,919
Total size of page content dump files for all pages, current content only:
189,730,768,977
Total size of page content dump files for all pages, all revisions:
23,676,250,697,261
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
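Each report puts a label on one line and a comma-grouped figure on the next. Below is a minimal sketch for turning one "Stats for …" block into integers and readable sizes. The report does not state the unit, so `human()` *assumes* the figures are byte counts. The names `parse_stats` and `human` are hypothetical helpers, not part of any Wikimedia tool.

```python
import re

# A label line ending in ":" followed by a comma-grouped figure on the next line.
STAT_RE = re.compile(
    r"^(Total size of page content dump files for [^\n]+?):[ \t]*\n([\d,]+)[ \t]*$",
    re.MULTILINE,
)

def parse_stats(block: str) -> dict[str, int]:
    """Map each 'Total size ...' label in a report block to its integer figure."""
    return {label: int(figure.replace(",", "")) for label, figure in STAT_RE.findall(block)}

def human(n: int) -> str:
    """Render n with binary prefixes, ASSUMING n is a byte count."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} PiB"

if __name__ == "__main__":
    # Figure copied from the enwiki 20211101 report above.
    report = (
        "Stats for enwiki on date 20211101\n"
        "Total size of page content dump files for all pages, all revisions:\n"
        "23,676,250,697,261\n"
    )
    for label, size in parse_stats(report).items():
        print(f"{label}: {size:,} ({human(size)})")
```

If the figures are bytes, the enwiki all-revisions total comes to roughly 21.5 TiB.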
Hello!
Could you adjust the directory listing a little?
In https://dumps.wikimedia.org/wikidatawiki/latest/ many file names are
too long and get cut off:
wikidatawiki-latest-pages-articles-multistream-..> 23-Nov-2021 20:53  329566492
wikidatawiki-latest-pages-articles-multistream-..> 24-Nov-2021 16:37        877
wikidatawiki-latest-pages-articles-multistream-..> 23-Nov-2021 07:17    1471303
...
Thanks!
Wolfgang
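Until the listing changes, one workaround is available. The page looks like a web-server autoindex, which shortens long names in the link *text* with "..>", while the `href` attribute of each link normally keeps the full file name. The sketch below collects those `href` values with Python's standard `html.parser`. It assumes the page uses ordinary `<a href="...">` links. `LinkCollector` and `full_names` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags from a directory-index page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip the parent-directory link; keep everything else, URL-decoded.
            if name == "href" and value and not value.startswith("../"):
                self.links.append(unquote(value))

def full_names(index_html: str) -> list[str]:
    """Return the untruncated file names linked from an index page (assumes href = file name)."""
    parser = LinkCollector()
    parser.feed(index_html)
    return parser.links
```

Against the live page you could pass `urllib.request.urlopen("https://dumps.wikimedia.org/wikidatawiki/latest/").read().decode("utf-8")` to `full_names`.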
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20211001 full revision history content run.
We are currently dumping 942 projects in total.
---------------------
Stats for frwikibooks on date 20211001
Total size of page content dump files for articles, current content only:
139,654,749
Total size of page content dump files for all pages, current content only:
193,441,591
Total size of page content dump files for all pages, all revisions:
9,942,075,801
---------------------
Stats for enwiki on date 20211001
Total size of page content dump files for articles, current content only:
85,634,741,423
Total size of page content dump files for all pages, current content only:
188,778,524,710
Total size of page content dump files for all pages, all revisions:
23,511,675,704,710
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
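With the 20211001 and 20211101 enwiki reports both in this digest, the month-over-month growth follows from simple subtraction. The short sketch below uses only figures copied from those two reports. The `growth` helper is a hypothetical name.

```python
# enwiki figures (20211001, 20211101) copied from the two reports above.
ENWIKI = {
    "articles, current content only": (85_634_741_423, 86_097_776_919),
    "all pages, current content only": (188_778_524_710, 189_730_768_977),
    "all pages, all revisions": (23_511_675_704_710, 23_676_250_697_261),
}

def growth(old: int, new: int) -> tuple[int, float]:
    """Return the absolute and percentage change from old to new."""
    delta = new - old
    return delta, 100.0 * delta / old

for label, (oct_size, nov_size) in ENWIKI.items():
    delta, pct = growth(oct_size, nov_size)
    print(f"{label}: +{delta:,} ({pct:.2f}%)")
```

On these numbers the full-history total grew by 164,574,992,551 (about 0.70%) in one month, and the articles-only current dump grew by about 0.54%.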
I am pleased to announce that Wikimedia Enterprise's HTML dumps [1] for
October 17-18th are available for public download; see
https://dumps.wikimedia.org/other/enterprise_html/ for more information. We
expect to make updated versions of these files available around the 1st/2nd
of the month and the 20th/21st of the month, following the cadence of the
standard SQL/XML dumps.
This is still an experimental service, so there may be hiccups from time to
time. Please be patient and report issues as you find them. Thanks!
Ariel "Dumps Wrangler" Glenn
[1] See https://www.mediawiki.org/wiki/Wikimedia_Enterprise for much more
about Wikimedia Enterprise and its API.
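The announcement does not describe the file layout. The sketch below is one way to stream records from a downloaded file. It *assumes* each download is a gzip-compressed tar archive whose members hold newline-delimited JSON, one object per line. Check the notes at https://dumps.wikimedia.org/other/enterprise_html/ before relying on that assumption. `iter_records` is a hypothetical helper.

```python
import json
import tarfile
from typing import Iterator

def iter_records(path: str) -> Iterator[dict]:
    """Yield one JSON object per non-empty line of every file inside a .tar.gz.

    ASSUMPTION: archive members are newline-delimited JSON.
    """
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            with handle:
                for raw in handle:  # raw is a bytes line
                    line = raw.strip()
                    if line:
                        yield json.loads(line)
```

Because the function is a generator, it reads one line at a time rather than loading a whole archive into memory.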
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20210901 full revision history content run.
We are currently dumping 942 projects in total.
---------------------
Stats for iswikiquote on date 20210901
Total size of page content dump files for articles, current content only:
1,406,306
Total size of page content dump files for all pages, current content only:
2,239,225
Total size of page content dump files for all pages, all revisions:
57,520,269
---------------------
Stats for enwiki on date 20210901
Total size of page content dump files for articles, current content only:
85,212,251,874
Total size of page content dump files for all pages, current content only:
187,857,277,782
Total size of page content dump files for all pages, all revisions:
23,348,582,776,634
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
Strangely, the enwiki dump is complete, but many of the much smaller
wikis (such as enwiktionary) are not.
Usually they finish within a few days.
Is something going wrong with the dump script?
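To see which steps of a run are still pending, you can inspect the run's status file. This sketch *assumes* each run directory publishes a `dumpstatus.json` shaped like `{"jobs": {"<job>": {"status": "<state>"}}}`, with `"done"` marking a finished job. Treat both the file name and the layout as assumptions to verify. `unfinished_jobs` is a hypothetical helper.

```python
import json

def unfinished_jobs(status_json: str) -> dict[str, str]:
    """Return {job_name: status} for every job whose status is not 'done'.

    ASSUMPTION: layout is {"jobs": {name: {"status": ...}, ...}}.
    """
    jobs = json.loads(status_json).get("jobs", {})
    return {
        name: info.get("status", "unknown")
        for name, info in jobs.items()
        if info.get("status") != "done"
    }
```

You could feed it the text of, for example, `https://dumps.wikimedia.org/enwiktionary/<run date>/dumpstatus.json`, fetched with `urllib.request.urlopen`, if that file exists in that form.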
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20210801 full revision history content run.
We are currently dumping 942 projects in total.
---------------------
Stats for ilowiki on date 20210801
Total size of page content dump files for articles, current content only:
110,426,279
Total size of page content dump files for all pages, current content only:
120,468,480
Total size of page content dump files for all pages, all revisions:
1,661,587,793
---------------------
Stats for enwiki on date 20210801
Total size of page content dump files for articles, current content only:
84,775,627,212
Total size of page content dump files for all pages, current content only:
186,886,040,561
Total size of page content dump files for all pages, all revisions:
23,185,464,570,213
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
I've been using the monthly page view summaries from pagecounts-ez. Now on https://dumps.wikimedia.org/other/pagecounts-ez/ it says:
"NOTE: This dataset has had some problems and we are no longer generating new data, since September 2020. We are phasing it out in favor of Pageviews Complete... When it's finished we will announce it widely and explain how to migrate."
Are the announcement and the migration instructions available somewhere? I'm having problems because:
1. The "totals" files, such as https://dumps.wikimedia.org/other/pagecounts-ez/merged/pagecounts-2020-08-v…, which are on the order of 500 MB per month, seem to have no equivalents in the new pageview complete dump archives. The monthly files at https://dumps.wikimedia.org/other/pageview_complete/monthly/2020/2020-08/ are 10x larger, and I can't find any description of what the "automated", "user", and "spider" files represent (although I can guess).
2. If I download (say) https://dumps.wikimedia.org/other/pageview_complete/monthly/2020/2020-08/pa…, and peek at the file using bzless, it seems to contain lots of binary characters: it's not clear to me what the format is, or how to decode it. Is there any information online to help me?
Thanks for any pointers that might help.
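First, check whether the content is truly binary or just a compact text encoding. Decompressing the first few lines in Python will show this. The decoder below handles one *assumed* compact encoding: an uppercase letter marks a time slot (A is the first slot, B the second, and so on), followed by that slot's count, as in `A3C10`. Whether the pageview complete files use this encoding is not confirmed here, so compare against the dataset's own documentation. `head` and `decode_counts` are hypothetical helpers.

```python
import bz2
import re
from typing import Iterator

# ASSUMED encoding: an uppercase slot letter (A = slot 0) followed by a decimal count.
SLOT_RE = re.compile(r"([A-Z])(\d+)")

def decode_counts(field: str) -> dict[int, int]:
    """Decode a compact field such as 'A3C10' into {0: 3, 2: 10}."""
    return {ord(letter) - ord("A"): int(count) for letter, count in SLOT_RE.findall(field)}

def head(path: str, n: int = 5) -> Iterator[str]:
    """Yield the first n decompressed lines of a .bz2 file as text."""
    with bz2.open(path, mode="rt", encoding="utf-8", errors="replace") as fh:
        for i, line in enumerate(fh):
            if i >= n:
                break
            yield line.rstrip("\n")
```

If `head` prints readable text whose last field looks like `A3C10`, that field is probably a compact encoding rather than binary data.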
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20210701 full revision history content run.
We are currently dumping 941 projects in total.
---------------------
Stats for azwikisource on date 20210701
Total size of page content dump files for articles, current content only:
54,945,209
Total size of page content dump files for all pages, current content only:
62,037,845
Total size of page content dump files for all pages, all revisions:
349,972,631
---------------------
Stats for enwiki on date 20210701
Total size of page content dump files for articles, current content only:
84,219,557,413
Total size of page content dump files for all pages, current content only:
185,886,341,450
Total size of page content dump files for all pages, all revisions:
23,026,883,007,314
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20210601 full revision history content run.
We are currently dumping 938 projects in total.
---------------------
Stats for bowikibooks on date 20210601
Total size of page content dump files for articles, current content only:
21,939
Total size of page content dump files for all pages, current content only:
120,566
Total size of page content dump files for all pages, all revisions:
239,642
---------------------
Stats for enwiki on date 20210601
Total size of page content dump files for articles, current content only:
83,761,691,097
Total size of page content dump files for all pages, current content only:
184,983,464,749
Total size of page content dump files for all pages, all revisions:
22,873,288,105,970
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector