Hi Brian,
On 11-02-2022 15:49, Brian King wrote:
Hello Wikitech,
I’m a new SRE on the Search team. As Ryan and I are in the middle of relocating many large shards on our Elastic cluster, I wanted to ask for your thoughts on using jumbo frames and/or LACP for our physical Elastic nodes. We’re also moving to new hardware with 10 Gbps networking, so it seems like a good time to start optimizing our network settings.
Let us know what you think (and please feel free to suggest any other optimizations).
Unexpected question on this list. Link aggregation (LAG) is an easy way to add more bandwidth without changing the logical setup. In core networks I'm used to using only LAGs (even with a single member) to allow for future growth. LACP is the protocol of choice because it's open and widely supported. Do use fast mode (LACPDUs sent every 1 second instead of every 30 seconds). For critical links you can also use micro-BFD, but that seems a bit overkill in your case. If you have switches that support MC-LAG, you can connect to different switches and only have reduced bandwidth when one of the switches is unavailable.
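To put rough numbers on the fast-mode advice, here is a minimal sketch. It assumes the standard LACP behaviour where a partner is declared timed out after three missed LACPDUs (3 seconds in fast mode, 90 seconds in slow mode). The function names and the 2 x 10 Gbps MC-LAG example are mine, purely for illustration.

```python
# LACPDU transmit interval in seconds for each LACP mode.
LACP_INTERVAL_S = {"fast": 1, "slow": 30}

# A partner is considered gone after this many missed LACPDUs.
MISSED_PDUS_BEFORE_TIMEOUT = 3


def lacp_timeout_s(mode: str) -> int:
    """Worst-case time before LACP notices a silent partner."""
    return LACP_INTERVAL_S[mode] * MISSED_PDUS_BEFORE_TIMEOUT


def lag_bandwidth_gbps(member_speed_gbps: float, members_up: int) -> float:
    """Aggregate LAG capacity with the given number of working members."""
    return member_speed_gbps * members_up


if __name__ == "__main__":
    for mode in ("fast", "slow"):
        print(f"{mode} mode: failure detected within {lacp_timeout_s(mode)} s")
    # Hypothetical MC-LAG: 2 x 10 Gbps, one member per switch.
    print(f"both switches up: {lag_bandwidth_gbps(10, 2)} Gbps")
    print(f"one switch down:  {lag_bandwidth_gbps(10, 1)} Gbps")
```

Note that a LAG balances per flow, so a single TCP stream between two nodes will still top out at the speed of one member.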
Every network should be built for jumbo frames, with a physical MTU around 9216 (different vendors calculate it in different ways) and an IP MTU of 9000. This gives better performance for bulk traffic like backups, probably less in other cases. It's usually no problem with standard switches and routers, but firewalls and load balancers might be problematic.
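As a back-of-the-envelope illustration of the bulk-traffic gain, the sketch below compares an IP MTU of 1500 with 9000. It assumes untagged Ethernet (no VLAN tag), IPv4 and TCP without options; the function names are made up for this example.

```python
# Per-frame overhead on the wire, in bytes (untagged Ethernet).
ETH_HEADER = 14
ETH_FCS = 4
PREAMBLE_SFD = 8
INTERFRAME_GAP = 12

# L3/L4 headers carried inside the IP MTU (no IP or TCP options assumed).
IPV4_HEADER = 20
TCP_HEADER = 20


def wire_bytes(ip_mtu: int) -> int:
    """Bytes a full-size frame occupies on the wire, including the gap."""
    return ip_mtu + ETH_HEADER + ETH_FCS + PREAMBLE_SFD + INTERFRAME_GAP


def tcp_payload(ip_mtu: int) -> int:
    """TCP payload bytes carried by one full-size packet."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


def efficiency(ip_mtu: int) -> float:
    """Fraction of the line rate that ends up as TCP payload."""
    return tcp_payload(ip_mtu) / wire_bytes(ip_mtu)


def frames_per_second(ip_mtu: int, link_bps: float = 10e9) -> float:
    """Full-size frames per second needed to fill the link."""
    return link_bps / (wire_bytes(ip_mtu) * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: efficiency {efficiency(mtu):.2%}, "
              f"{frames_per_second(mtu):,.0f} frames/s at 10 Gbps")
```

Under these assumptions the payload efficiency only moves from about 94.9% to about 99.1%. The bigger difference is the packet rate: filling 10 Gbps takes roughly 813,000 frames per second at MTU 1500 against roughly 138,000 at MTU 9000, so hosts have far fewer packets to process.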
Do buy network interfaces that support the proper offloading, because otherwise the work will hit your CPU. I recall some cases in the past where the slightly cheaper cards didn't support VXLAN offloading and the like. Not sure if that's still a thing these days.
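If you want to check what a card actually offloads, something like the following sketch could help. It assumes a Linux host with `ethtool` installed and parses the `name: on/off` lines printed by `ethtool -k`. The sample text is illustrative, not taken from a real machine, and the exact feature names vary per driver and kernel.

```python
import subprocess


def parse_ethtool_features(text: str) -> dict:
    """Turn `ethtool -k` style output into {feature_name: enabled}."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip the "Features for ...:" header and blank lines
        # The value looks like "on", "off" or "off [fixed]".
        features[name.strip()] = value.split()[0] == "on"
    return features


def read_features(iface: str) -> dict:
    """Run `ethtool -k <iface>` and parse the result (needs ethtool)."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_features(result.stdout)


# Illustrative sample only; real output is much longer.
SAMPLE = """Features for eth0:
rx-checksumming: on
tcp-segmentation-offload: on
generic-receive-offload: on
tx-udp_tnl-segmentation: off [fixed]
"""

if __name__ == "__main__":
    for name, enabled in parse_ethtool_features(SAMPLE).items():
        print(f"{name}: {'on' if enabled else 'off'}")
```

A feature shown as `off [fixed]` cannot be enabled on that hardware or driver, which is the case to watch for with tunnel offloads such as `tx-udp_tnl-segmentation`.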
Oh, and if you're not using 10GBASE-T: do buy the right optics. 10GBASE-SR (multimode, 850 nm, usually black) and 10GBASE-LR (single-mode, 1310 nm, usually blue) don't mix :-)
Maarten