Currently, the abstracts dump for Wikidata consists of 62 million entries, all of which contain <abstract 'not-applicable' /> instead of any real abstract. Instead, I am considering producing abstract files that contain only the mediawiki header and footer and the usual siteinfo contents. What do people think about this?
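For concreteness, here is a rough sketch of what one of these slimmed-down abstract files would contain, assuming the standard XML export wrapper; the exact siteinfo fields would simply be whatever the dump scripts already write for Wikidata:

    <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" xml:lang="en">
      <siteinfo>
        <sitename>Wikidata</sitename>
        <dbname>wikidatawiki</dbname>
        <base>...</base>
        <generator>...</generator>
        <case>...</case>
        <namespaces>...</namespaces>
      </siteinfo>
    </mediawiki>

In other words: header, siteinfo, footer, and no entries at all.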
Rationale:
- It takes 36 hours to produce these files, which contain nothing useful.
- It places an extra burden on the db servers for no good reason.
- They require more bandwidth to download, and more time to process, than files with no entries would.
- Wikidata's main namespace will only ever contain Q-entities and other entities, which are not text or wikitext and so are not suitable for abstracts.
Please comment here or on the task: https://phabricator.wikimedia.org/T236006
If there are no comments or blockers after a week, I'll start implementing this, and it will likely go into effect for the November 20th run.
Your faithful dumps wrangler,
Ariel Glenn