Currently, the abstracts dump for Wikidata consists of 62 million entries,
all of which contain <abstract 'not-applicable' /> instead of any real
abstract. Instead, I am considering producing abstract files that contain
only the MediaWiki header and footer and the usual siteinfo contents. What
do people think about this? The case for the change:
* It takes 36 hours to produce these useless files.
* It places an extra burden on the db servers for no good reason.
* It requires more bandwidth to download and process these files than a
  file with no entries would.
Wikidata will only ever have Q-entities or other entities in the main
namespace that are not text or wikitext and so are not suitable for
abstract extraction.
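For anyone who wants to check this against a downloaded abstracts file, here is a rough sketch. The `<feed>`/`<doc>` markup, element names, and the attribute-only `<abstract/>` form are assumptions based on the entry format described above, not a definitive description of the dump schema:

```python
import xml.etree.ElementTree as ET

def count_abstracts(xml_text):
    """Return (total_entries, entries_with_real_abstract) for an abstracts feed."""
    root = ET.fromstring(xml_text)
    total = 0
    real = 0
    for doc in root.iter("doc"):
        total += 1
        abstract = doc.find("abstract")
        # An empty or attribute-only <abstract/> element counts as "no abstract".
        if abstract is not None and (abstract.text or "").strip():
            real += 1
    return total, real

# Tiny illustrative feed (the markup here is an assumption):
sample = """<feed>
  <doc><title>Q1</title><abstract not-applicable="" /></doc>
  <doc><title>Q2</title><abstract not-applicable="" /></doc>
</feed>"""

print(count_abstracts(sample))  # -> (2, 0): two entries, zero real abstracts
```

For a real multi-gigabyte dump you would want `ET.iterparse` over a file handle rather than `fromstring`, but the counting logic is the same; the expectation for the current Wikidata abstracts dump is millions of entries and zero real abstracts.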
Please comment here or on the task:
If there are no comments or blockers after a week, I'll start implementing
this, and it will likely go into effect for the November 20th run.
Your faithful dumps wrangler,