[Cloud] NFS bandwidth to VPS nodes?

19 Apr 2021

I'm exploring various ways of working with the XML data dumps in
/public/dumps/public/enwiki.  I've got a process which runs through all of the
enwiki-20210301-pages-articles[123456789]*.xml* files in about 6 hours.  If I've done
the math right, that's just about 18 GB of data, or 3 GB/h, or roughly 0.8 MB/s that
I'm slurping off NFS.

If I were to spin up 8 VPS nodes and run 8 jobs in parallel, in theory I could process
about 6.7 MB/s (roughly 53 Mb/s).  Is that realistic?  Or am I just going to beat the
hell out of the poor NFS server, or peg some backbone network link, or hit some other
rate-limiting bottleneck long before I run out of CPU?  I'm less worried about hitting
a bottleneck than about trashing a shared resource by doing something stupid to it.
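
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It starts from the two measured figures (18 GB read in about 6 hours) and assumes decimal units (1 GB = 1000 MB, 1 byte = 8 bits) and perfectly linear scaling across 8 jobs, which is the best case rather than a prediction. The function names are made up for illustration.

```python
# Back-of-the-envelope NFS read rate, from total data volume and wall-clock time.
# Assumptions: decimal units (1 GB = 1000 MB) and perfectly linear scaling.

def mb_per_second(total_gb: float, hours: float) -> float:
    """Average read rate in MB/s for total_gb read over the given hours."""
    return total_gb * 1000 / (hours * 3600)


def mbit_per_second(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second."""
    return mb_per_s * 8


single_job = mb_per_second(18, 6)   # one job: 18 GB in 6 hours
eight_jobs = single_job * 8         # best case: 8 jobs, no contention

print(f"one job:    {single_job:.2f} MB/s")
print(f"eight jobs: {eight_jobs:.2f} MB/s ({mbit_per_second(eight_jobs):.1f} Mb/s)")
# one job:    0.83 MB/s
# eight jobs: 6.67 MB/s (53.3 Mb/s)
```

This says nothing about what the NFS server or the network can actually sustain; it only turns the measured single-job figures into a per-second rate.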

Putting it another way, would trying this be a bad idea?
