Hi Xabriel,
Thanks! I opened a task: https://phabricator.wikimedia.org/T363184
Best regards, Arthur
On Tue, Apr 23, 2024 at 7:25 AM Xabriel Collazo Mojica <xcollazo@wikimedia.org> wrote:
Arthur,
The current Dumps infrastructure is in maintenance mode, but it would definitely be nice to consider SHA-256 for Dumps 2.0.
What Federico suggests seems like the best option if you need this now. Dumps 2.0 will take a long while, but do feel free to open a task at https://phabricator.wikimedia.org/ and tag it with "Dumps 2.0". Please describe your use case there as well.
Thanks, -xabriel
On Mon, Apr 22, 2024 at 10:34 PM Arthur D. Edelstein <arthuredelstein@gmail.com> wrote:
Many thanks, Federico! I am taking this approach.
On Sat, Apr 20, 2024 at 10:41 AM Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
Adding new checksum files may or may not be a big deal. If the snapshot hosts have enough memory to keep the files in cache a bit longer, so they don't need to be read back from disk, running new checksums may be very fast.
https://wikitech.wikimedia.org/wiki/Dumps has more information on the setup.
On 20/04/24 03:40, Arthur D. Edelstein wrote:
In order to get the SHA-256 for the timestamp, I need to download each file and compute the hash.
I understand it's suboptimal, but if you're in a rush you can also use Toolforge and create a tool, a bit like https://dump-torrents.toolforge.org/ , to run sha256sum on the appropriate files (which are mounted even on the bastion host). I/O there tends to be rather slow, but it may still be faster than downloading the files over your network connection.
Best, Federico
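For reference, here is a minimal Python sketch of the hashing step, assuming the dump files are readable from a local or mounted directory. It reads each file in fixed-size chunks, so even multi-gigabyte files are hashed without loading them into memory, and it emits lines in the layout that `sha256sum` prints. The function names, the chunk size and the `suffix` filter are illustrative assumptions, not part of any Dumps or Toolforge tooling:

```python
import hashlib
import os
from typing import BinaryIO, List

# Assumed chunk size: 1 MiB reads keep memory use flat for multi-GB dump files.
CHUNK_SIZE = 1 << 20


def sha256_of_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the hex SHA-256 digest of a binary stream, reading it in chunks.

    Works for any object with a binary read(n) method: an open file, or, for
    example, an HTTP response, so a downloaded file need not be stored on
    disk just to be hashed.
    """
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


def sha256sums_for_dir(directory: str, suffix: str = "") -> List[str]:
    """Hash every regular file in `directory` whose name ends with `suffix`.

    Each line has the "<hex digest>  <file name>" layout that sha256sum uses
    in its default text mode, so the result can be verified later with
    `sha256sum -c` from inside the same directory. Unlike sha256sum, names
    containing newlines or backslashes are not escaped here.
    """
    lines = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.endswith(suffix):
            with open(path, "rb") as f:
                lines.append(f"{sha256_of_stream(f)}  {name}")
    return lines
```

For instance, `sha256sums_for_dir("/path/to/dump/dir", ".bz2")` (a placeholder path) would return one line per `.bz2` file in that directory, and writing those lines to a file gives a checksum list in the usual sha256sum format. On Python 3.11+, `hashlib.file_digest(f, "sha256").hexdigest()` can replace the manual loop for open files.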
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe, send an email to xmldatadumps-l-leave@lists.wikimedia.org
--
Xabriel J. Collazo Mojica (he/him, pronunciation: https://commons.wikimedia.org/wiki/File:Xabriel_Collazo_Mojica_-_pronunciation.ogg )
Sr Software Engineer
Wikimedia Foundation