
Complete network performance optimization for the Proxmox cluster.

Symptoms:

  • TCP Speed: ~2.5 Mbit/s (should be 25+ Gbit/s)
  • UDP Speed: ~14 Gbit/s (working)
  • Jumbo Frames not passing

Root causes (each can be confirmed with the quick checks below):

  1. Flow Control enabled (RX/TX on), killing TCP performance
  2. TCP buffer sizes too small (default ~212KB instead of 128MB)
  3. Jumbo Frames not enabled on the switches (only on the edge firewall)
  4. Default TCP congestion control (cubic instead of BBR)
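A minimal one-shot diagnostic sketch, assuming the NIC name (nic5) and storage IP (10.20.20.12) used throughout this guide; it checks each of the four root causes in a single pass:

# Quick checks for the four root causes (run on a cluster node)
ethtool -a nic5                              # 1. Flow Control: RX/TX should be "off"
sysctl net.core.rmem_max net.core.wmem_max   # 2. TCP buffers: should be 134217728
ping -M do -s 8972 -c 3 10.20.20.12          # 3. Jumbo Frames: must pass unfragmented
sysctl net.ipv4.tcp_congestion_control       # 4. Congestion control: should be "bbr"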
# Test standard MTU (1500 bytes)
ping -M do -s 1472 -c 3 10.20.20.12
# -M do = Don't fragment
# -s 1472 = Payload (1472 + 28 header = 1500)
# Should work ✓
# Test medium MTU (4000 bytes)
ping -M do -s 3972 -c 3 10.20.20.12
# 3972 + 28 = 4000
# Test large MTU (8000 bytes)
ping -M do -s 7972 -c 3 10.20.20.12
# 7972 + 28 = 8000
# Test Jumbo Frame (9000 bytes)
ping -M do -s 8972 -c 3 10.20.20.12
# 8972 + 28 = 9000
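If one of the larger sizes fails, a small sweep narrows down the largest payload that still passes unfragmented; a sketch against the same target host:

# Sweep payload sizes; payload + 28 bytes of headers = MTU
for size in 1472 3972 7972 8972; do
  printf 'MTU %s: ' "$((size + 28))"
  ping -M do -s "$size" -c 1 -W 1 10.20.20.12 >/dev/null 2>&1 && echo OK || echo FAIL
done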

UniFi requires Jumbo Frames enabled on EACH switch individually:

Devices → sw-core-zrh-01 → Settings
└─ Advanced → Jumbo Frames ☑
Devices → sw-dist-zrh-01 → Settings
└─ Advanced → Jumbo Frames ☑

Important: Enabling Jumbo Frames only on fw-edge-zrh-01 is NOT enough!

root@srv-pve-zrh-01:~# ping -M do -s 8972 -c 3 10.20.20.12
PING 10.20.20.12 (10.20.20.12) 8972(9000) bytes of data.
8980 bytes from 10.20.20.12: icmp_seq=1 ttl=64 time=0.245 ms
8980 bytes from 10.20.20.12: icmp_seq=2 ttl=64 time=0.237 ms
8980 bytes from 10.20.20.12: icmp_seq=3 ttl=64 time=0.239 ms
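As an additional cross-check, tracepath (from iputils) reports the discovered path MTU hop by hop:

# Path MTU towards the storage peer; should report pmtu 9000
tracepath -n 10.20.20.12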
Check flow control on the storage NICs:
ethtool -a nic5
# Output showing problem:
# Pause parameters for nic5:
# Autonegotiate: off
# RX: on ← PROBLEM!
# TX: on ← PROBLEM!
# Disable Flow Control on ALL NICs
ethtool -A nic1 rx off tx off
ethtool -A nic2 rx off tx off
ethtool -A nic3 rx off tx off
ethtool -A nic4 rx off tx off
ethtool -A nic5 rx off tx off
ethtool -A nic6 rx off tx off
Flow Control ON:
├─ Causes Buffer-Bloat
├─ Adds latency
├─ Reduces throughput
└─ Bad for modern high-speed networks
Flow Control OFF:
├─ Better latency
├─ Higher throughput
├─ Enterprise Best Practice
└─ Required for 10G+ networks
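A compact variant of the commands above, assuming the NICs are named nic1 through nic6 as elsewhere in this guide; it disables flow control and immediately verifies the result:

# Disable flow control on every NIC and confirm RX/TX pause are off
for nic in nic1 nic2 nic3 nic4 nic5 nic6; do
  ethtool -A "$nic" rx off tx off
  echo "== $nic =="
  ethtool -a "$nic" | grep -E '^(RX|TX):'
done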
Current defaults before tuning:

net.core.rmem_max = ~212KB
net.core.wmem_max = ~212KB
tcp_congestion_control = cubic

Create /etc/sysctl.d/99-network-tuning.conf:

# High-Speed Network Tuning for 25Gbit Storage
# Applied to: srv-pve-zrh-01, srv-pve-zrh-02
# TCP Buffer Sizes (128 MB max)
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
# TCP Auto-Tuning (min, default, max in bytes)
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# Network Backlog (for many simultaneous connections)
net.core.netdev_max_backlog = 50000
# TCP Congestion Control (Google BBR)
net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq
# MTU Probing (finds optimal packet size)
net.ipv4.tcp_mtu_probing = 1
# TCP Options
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
Apply the settings:

sysctl -p /etc/sysctl.d/99-network-tuning.conf
Verify:
sysctl net.core.rmem_max
# Expected: 134217728
sysctl net.ipv4.tcp_congestion_control
# Expected: bbr
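If bbr is not reported, the tcp_bbr kernel module may simply not be loaded; a quick check, with an optional persistent module load (the file name bbr.conf is just an example):

# Which congestion control algorithms does the kernel currently offer?
sysctl net.ipv4.tcp_available_congestion_control
# Load BBR now and on every boot
modprobe tcp_bbr
echo tcp_bbr > /etc/modules-load.d/bbr.conf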
In /etc/network/interfaces:
# Bond 1 (Storage 25G)
auto bond1
iface bond1 inet static
address 10.20.20.11/28
bond-slaves nic5 nic6
bond-mode 802.3ad
bond-miimon 100
bond-xmit-hash-policy layer3+4
bond-lacp-rate fast
mtu 9000
# Disable Flow Control
post-up ethtool -A nic5 rx off tx off
post-up ethtool -A nic6 rx off tx off

Add similar post-up lines for all bonds.
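To activate the change without rebooting, the interfaces file can be reloaded (Proxmox uses ifupdown2, which provides ifreload) and the bond re-checked; a sketch:

# Reload /etc/network/interfaces and verify the storage bond
ifreload -a
grep -E 'Bonding Mode|MII Status' /proc/net/bonding/bond1
ip link show bond1 | grep mtu        # expect mtu 9000
ethtool -a nic5                      # pause RX/TX should stay off via post-up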

Before optimization:

TCP (8 streams): 2.5 Mbit/s ❌
UDP (single): 14 Gbit/s
Jumbo Frames: Failed

After optimization:

TCP (8 streams): 24.8 Gbit/s ✅
TCP (100 streams): 48.7 Gbit/s ✅
UDP (single): 14 Gbit/s
Jumbo Frames: 0.24ms latency ✅
Benchmark commands:
# Start server on Node 02
iperf3 -s -B 10.20.20.12
# TCP Test (8 parallel streams)
iperf3 -c 10.20.20.12 -t 10 -P 8
# TCP Test (100 streams for max throughput)
iperf3 -c 10.20.20.12 -t 10 -P 100
# UDP Test
iperf3 -c 10.20.20.12 -u -b 50G -t 10

Why 25 Gbit/s instead of 50 Gbit/s with 8 streams?

LACP Load-Balancing:
├─ Each TCP stream uses ONLY 1 link!
├─ 8 parallel streams distribute:
│ ├─ 4-5 streams on Link 1 (nic5)
│ └─ 3-4 streams on Link 2 (nic6)
└─ Result: ~25 Gbit/s total
To reach 50 Gbit/s:
└─ Need MANY parallel connections
(e.g., Ceph with 100+ streams)
Expected real-world throughput:

Single VM Migration: ~20-25 Gbit/s
Ceph (many OSDs): ~45-50 Gbit/s ✅
ZFS Replication: ~20-25 Gbit/s
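The active hashing behaviour can be read straight from the bond status file; a quick check on the 25G storage bond:

# How does bond1 distribute flows across its member links?
grep -i 'hash policy' /proc/net/bonding/bond1
# Expected: Transmit Hash Policy: layer3+4 (1)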
Network                  MTU    Reason
Storage (VLAN 20)        9000   Maximum performance for Ceph/Storage
VM Traffic (VLAN 30+)    1500   VMs use default, simpler management
Management (VLAN 10)     1500   Standard, compatibility
VMs with MTU 9000:
├─ Every VM must configure MTU 9000
├─ Windows Default = 1500
├─ Linux Default = 1500
├─ Error-prone (forgotten config)
└─ Complexity increases
VMs with MTU 1500:
├─ Works out-of-the-box ✅
├─ No VM config needed
├─ Internet-compatible
└─ Performance difference minimal (<5%)
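From inside a guest this can be confirmed without touching any configuration; a sketch, with eth0 standing in for whatever the guest's interface is actually called:

# Inside a VM: the virtual NIC should report the default MTU of 1500
ip link show eth0 | grep mtu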

Impact on 1Gbit NICs? No negative impact - the settings are MAXIMUM values; 1Gbit NICs continue using small buffers.

Impact on ZFS? Positive! ZFS benefits from larger buffers:

  • Faster replication
  • Better VM migration
  • Higher throughput

Impact on Ceph? Very positive! Ceph over 25Gbit NEEDS these settings for:

  • OSD-to-OSD traffic (replication)
  • Client-to-OSD (VM reads/writes)
  • Recovery/rebalancing

Impact on RAM usage? Minimal:

  • Before: ~400KB per connection (send + receive buffers combined)
  • After: Up to 128MB per connection (if needed)
  • Kernel frees RAM when not needed
  • With 256GB RAM: not a problem (runtime usage can be checked as shown below)
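Actual buffer consumption can be watched at runtime to confirm the kernel only grows buffers when a connection needs them; a sketch using standard tools:

# Kernel-wide TCP memory usage (mem is counted in pages, typically 4 KB)
grep '^TCP' /proc/net/sockstat
# Per-socket send/receive buffer sizes of established connections
ss -tm state established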

FILE: 07-troubleshooting/performance-issues.md

Check bond status:
cat /proc/net/bonding/bond0
cat /proc/net/bonding/bond1
cat /proc/net/bonding/bond2
Check flow control:
ethtool -a nic5
# RX/TX should be "off"
Check MTU:
ip link show bond1 | grep mtu
# Should show "mtu 9000" for storage
Check sysctl settings:
sysctl net.core.rmem_max
# Should be 134217728
sysctl net.ipv4.tcp_congestion_control
# Should be bbr
Check Jumbo Frames end-to-end:
ping -M do -s 8972 -c 3 10.20.20.12

Problem: TCP throughput is far too low (Mbit/s instead of Gbit/s)

Cause: Flow Control enabled

Solution:

ethtool -A nic5 rx off tx off
ethtool -A nic6 rx off tx off

Problem: Jumbo Frame ping (ping -M do -s 8972) fails

Cause: Jumbo Frames not enabled on the switches

Solution: Enable on EACH switch in UniFi:

Devices → Switch → Settings → Advanced → Jumbo Frames ☑

Problem: iperf3 reaches only ~25 Gbit/s on the 2x25G bond

Cause: LACP hashing - single streams use a single link

Solution: Use more parallel streams

iperf3 -c 10.20.20.12 -P 100

Problem: Settings are lost after a reboot

Cause: Temporary settings not persisted

Solution:

  1. sysctl settings: persisted in /etc/sysctl.d/99-network-tuning.conf
  2. Flow control: persisted via post-up ethtool lines in /etc/network/interfaces (see the check below)
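After the next reboot, both mechanisms can be verified in one pass; a short sketch reusing the checks from above:

# Run once after a reboot - both must survive without manual intervention
sysctl net.ipv4.tcp_congestion_control   # should be bbr
ethtool -a nic5                          # RX/TX pause should be off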