Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20191001 full revision history content run.
We are currently dumping 922 projects in total.
---------------------
Stats for cowikimedia on date 20191001
Total size of page content dump files for articles, current content only:
2144131
Total size of page content dump files for all pages, current content only:
3355092
Total size of page content dump files for all pages, all revisions:
80294987
---------------------
Stats for enwiki on date 20191001
Total size of page content dump files for articles, current content only:
73972175645
Total size of page content dump files for all pages, current content only:
164881705920
Total size of page content dump files for all pages, all revisions:
19682086841974
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
Currently, the abstracts dump for Wikidata consists of 62 million entries,
all of which contain <abstract 'not-applicable' /> rather than any real
abstract. I am considering producing, in place of these, abstract files
that would contain only the mediawiki header and footer and the usual
siteinfo contents. What do people think about this?
Rationale:
- It takes 36 hours to produce these useless files.
- It places an extra burden on the db servers for no good reason.
- Downloading and processing these useless files takes more bandwidth than
  a file with no entries would.
- Wikidata will only ever have Q-entities or other entities in the main
  namespace; these are not text or wikitext and so are not suitable for
  abstracts.
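
For anyone who wants to check the claim about placeholder entries against a
downloaded file, here is a minimal sketch. It rests on assumptions that are
not stated in this message: that each entry is a <doc> element, and that the
placeholder appears in the file as a well-formed <abstract> element carrying
a not-applicable attribute. The function name count_abstracts and the SAMPLE
data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_abstracts(stream):
    """Return (total entries, entries whose abstract is a placeholder).

    Assumption: each entry is a <doc> element, and a placeholder abstract
    is an <abstract> child with a 'not-applicable' attribute.
    """
    total = 0
    placeholders = 0
    # iterparse streams the file, so memory use stays small on large dumps.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "doc":
            total += 1
            abstract = elem.find("abstract")
            if abstract is not None and "not-applicable" in abstract.attrib:
                placeholders += 1
            elem.clear()  # drop the children of the finished entry
    return total, placeholders


# Hypothetical two-entry sample, not taken from a real dump file.
SAMPLE = b"""<feed>
<doc><title>Wikidata: Q1</title><abstract not-applicable="" /></doc>
<doc><title>Wikidata: Q2</title><abstract not-applicable="" /></doc>
</feed>"""

if __name__ == "__main__":
    print(count_abstracts(io.BytesIO(SAMPLE)))
```

For a real file, open it in binary mode (decompressing first if needed) and
pass the file object in place of the BytesIO sample. If both counts match,
every entry in the file is a placeholder.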
Please comment here or on the task:
https://phabricator.wikimedia.org/T236006
If there are no comments or blockers after a week, I'll start implementing
this, and it will likely go into effect for the November 20th run.
Your faithful dumps wrangler,
Ariel Glenn
Greetings XML Dump users and contributors!
This is your automatic monthly Dumps FAQ update email. This update
contains figures for the 20190901 full revision history content run.
We are currently dumping 920 projects in total.
---------------------
Stats for bawiki on date 20190901
Total size of page content dump files for articles, current content only:
507004601
Total size of page content dump files for all pages, current content only:
533549451
Total size of page content dump files for all pages, all revisions:
11862864063
---------------------
Stats for enwiki on date 20190901
Total size of page content dump files for articles, current content only:
73556940996
Total size of page content dump files for all pages, current content only:
164031720789
Total size of page content dump files for all pages, all revisions:
19534783920279
---------------------
Sincerely,
Your friendly Wikimedia Dump Info Collector
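
The raw figures in these reports are hard to compare by eye. The sketch
below shows one way to print them in readable units and to compute the
change between the 20190901 and 20191001 runs for enwiki. It assumes the
figures are byte counts, which the reports do not state explicitly. The
helper names human_size and growth are made up for illustration.

```python
def human_size(num_bytes):
    """Format a byte count using binary units (assumes the input is bytes)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024


def growth(old, new):
    """Return (absolute change, percentage change) from old to new."""
    delta = new - old
    return delta, 100.0 * delta / old


# enwiki figures copied from the two reports above.
ENWIKI = {
    "articles, current content only": (73556940996, 73972175645),
    "all pages, current content only": (164031720789, 164881705920),
    "all pages, all revisions": (19534783920279, 19682086841974),
}

if __name__ == "__main__":
    for label, (sep, octo) in ENWIKI.items():
        delta, pct = growth(sep, octo)
        print(f"{label}: {human_size(sep)} -> {human_size(octo)} "
              f"(+{human_size(delta)}, {pct:.2f}%)")
```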