Hi All.
Does anyone have an estimate of the current size of the full (all pages, all revisions) dump for the English Wikipedia? My understanding was that the full dump was about 10 terabytes as of May 2016, but when I grabbed one of the 7zip files and decompressed it, it came to a full 72 gigabytes. That, times the 500-odd 7zip files, would seem to indicate a dump of 30 TB or more! Does anyone know the actual size of the decompressed dump?
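(Back of the envelope, assuming all ~500 archives decompress to roughly the same 72 GB as the one sampled:)

$ echo "72 * 500" | bc
36000

i.e. 36,000 GB, or about 36 TB.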
- Andy Famiglietti
  New Media Scholar, Intermittent Wikipedian, Vaguely Humanoid
Andy Famiglietti, 26/09/2018 22:37:
> Does anyone know the actual size of the decompressed dump?
Silly grep solution from a WMF Labs machine:
$ find /public/dumps/public/enwiki/20180901 -name "enwiki*pages-meta-history*7z" -print0 \
    | xargs -0 -n1 -I§ sh -c "7z l § | tail -n 1 | grep -Eo '^ +[0-9]+'" \
    | sed --regexp-extended 's,^ +,+,g' | paste -s -d " " | sed 's,^+,,' | bc
17959415517241
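The same count with the summing moved into awk may be easier to read (a sketch, resting on the same assumption as the grep above: that the last line of 7z l's listing starts with the archive's total uncompressed size in bytes):

$ find /public/dumps/public/enwiki/20180901 -name "enwiki*pages-meta-history*7z" -print0 \
    | xargs -0 -n1 -I§ sh -c "7z l § | tail -n 1 | grep -Eo '^ +[0-9]+'" \
    | awk '{ total += $1 } END { printf "%.0f\n", total }'

Run against the same 20180901 files, it should print the same 17959415517241.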
So, 16 TiB?
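For the record, the exact conversion of that byte count:

$ echo "scale=2; 17959415517241 / 1024^4" | bc
16.33

so about 16.3 TiB, strictly speaking.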
Federico