Our pageview dumps were in the middle of a refactor when our team went through a lot of changes. We haven't been able to finish it, but we do have a well-compressed version that we just haven't properly launched as a new dataset. I'm working on prioritizing that.
On Sun, Sep 4, 2022 at 02:58 Gergő Tisza gtisza@gmail.com wrote:
I'd imagine the current format is optimized for being able to output hourly dumps (and thus reducing data latency and data processing costs), not so much for storage space.