Hi Xabriel,

Thanks! I opened a task: https://phabricator.wikimedia.org/T363184

Best regards,
Arthur

On Tue, Apr 23, 2024 at 7:25 AM Xabriel Collazo Mojica <xcollazo@wikimedia.org> wrote:
Arthur,

The current Dumps infrastructure is in maintenance mode, but it would definitely be nice to consider SHA-256 for Dumps 2.0.

What Federico mentions seems like the best choice if you need this now. Dumps 2.0 will take a long while, but do feel free to open a task at https://phabricator.wikimedia.org/ and tag it with "Dumps 2.0". Please describe your use case there as well.

Thanks,
-xabriel

On Mon, Apr 22, 2024 at 10:34 PM Arthur D. Edelstein <arthuredelstein@gmail.com> wrote:
Many thanks, Federico! I am taking this approach.

On Sat, Apr 20, 2024 at 10:41 AM Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Adding new checksum files may or may not be a big deal: if the snapshot
hosts have enough memory to keep the files in cache a bit longer, so
they don't need to be read back from disk, computing the extra
checksums can be very fast.

https://wikitech.wikimedia.org/wiki/Dumps has more information on the setup.

On 20/04/24 at 03:40, Arthur D. Edelstein wrote:
> In order to get the
> SHA-256 for the timestamp, I need to download each file and compute the
> hash.
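
A minimal sketch of that download-and-hash approach, assuming the usual
public dumps server layout; the wiki name, run date, and file name below
are placeholder values:

    import hashlib
    import requests

    # Public dumps server; wiki, run date, and file name are placeholders.
    URL = ("https://dumps.wikimedia.org/enwiki/20240401/"
           "enwiki-20240401-pages-articles-multistream-index.txt.bz2")

    def sha256_of_url(url, chunk_size=1024 * 1024):
        """Stream the download and hash it on the fly, so nothing hits disk."""
        digest = hashlib.sha256()
        with requests.get(url, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            for chunk in resp.iter_content(chunk_size):
                digest.update(chunk)
        return digest.hexdigest()

    print(sha256_of_url(URL))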

I understand it's suboptimal, but if you're in a rush you can also use
Toolforge and create a tool, a bit like
https://dump-torrents.toolforge.org/, to run sha256sum on the
appropriate files (which are mounted even on the bastion host). I/O
there tends to be rather slow, but it may still be faster than pulling
everything over your own network connection.
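
A minimal sketch of such a tool, assuming the dumps are mounted
read-only at /public/dumps/public (the usual Toolforge mount point;
adjust the path, wiki, and run date to your case):

    import hashlib
    import os

    # Assumed Toolforge mount point for the public dumps; adjust if different.
    DUMPS_ROOT = "/public/dumps/public"

    def sha256_of_file(path, chunk_size=1024 * 1024):
        """Hash the file in chunks so large dumps never need to fit in memory."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def checksum_dump(wiki, timestamp):
        """Print sha256sum-style lines for every file in one dump run."""
        dump_dir = os.path.join(DUMPS_ROOT, wiki, timestamp)
        for name in sorted(os.listdir(dump_dir)):
            path = os.path.join(dump_dir, name)
            if os.path.isfile(path):
                print(f"{sha256_of_file(path)}  {name}")

    if __name__ == "__main__":
        # Example: checksum one enwiki run (placeholder wiki and date).
        checksum_dump("enwiki", "20240401")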

Best,
        Federico


--
Xabriel J. Collazo Mojica (he/him)
Sr Software Engineer
Wikimedia Foundation