Complete network performance optimization for the Proxmox cluster.
Problem: Slow Network Performance

Initial Symptoms

- TCP Speed: ~2.5 Mbit/s (should be 25+ Gbit/s)
- UDP Speed: ~14 Gbit/s (working)
- Jumbo Frames not passing
Root Causes Found

- Flow Control enabled (RX/TX ON) - killing TCP performance
- TCP Buffer sizes too small (default ~212KB instead of 128MB)
- Jumbo Frames not enabled on switches (only on edge firewall)
- Default TCP congestion control (cubic instead of BBR)
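The following short script is a minimal sketch for re-checking all four root causes at once; it assumes the NIC names (nic5/nic6) and storage peer IP (10.20.20.12) used in this cluster:

```bash
#!/bin/bash
# Quick triage for the four root causes listed above

PEER=10.20.20.12   # storage address of the other node

echo "== Flow Control (RX/TX should be off)"
for nic in nic5 nic6; do
    ethtool -a "$nic"
done

echo "== TCP buffers and congestion control"
sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_congestion_control

echo "== Jumbo frame path to $PEER (8972 + 28 byte header = 9000)"
ping -M do -s 8972 -c 3 "$PEER"
```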
Jumbo Frames Troubleshooting
MTU Test Commands

```bash
# Test standard MTU (1500 bytes)
ping -M do -s 1472 -c 3 10.20.20.12
# -M do   = Don't fragment
# -s 1472 = Payload (1472 + 28 byte header = 1500)
# Should work ✓

# Test medium MTU (4000 bytes)
ping -M do -s 3972 -c 3 10.20.20.12
# 3972 + 28 = 4000

# Test large MTU (8000 bytes)
ping -M do -s 7972 -c 3 10.20.20.12
# 7972 + 28 = 8000

# Test Jumbo Frame (9000 bytes)
ping -M do -s 8972 -c 3 10.20.20.12
# 8972 + 28 = 9000
```
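If one of the larger sizes fails, tracepath (from iputils) can show at which hop the path MTU is being clamped; a quick sketch:

```bash
# Show the discovered path MTU hop by hop (-n skips DNS lookups)
tracepath -n 10.20.20.12
```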
Solution: Enable Jumbo Frames on Switches

UniFi requires Jumbo Frames to be enabled on EACH switch individually:

```
Devices → sw-core-zrh-01 → Settings
└─ Advanced → Jumbo Frames ☑

Devices → sw-dist-zrh-01 → Settings
└─ Advanced → Jumbo Frames ☑
```

Important: Enabling Jumbo Frames on fw-edge-zrh-01 only is NOT enough!
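The Proxmox side must also run MTU 9000 on the storage bond (made persistent in the configuration further down); as a quick, non-persistent test it can be set at runtime:

```bash
# Temporary MTU change for testing; persist it via /etc/network/interfaces
ip link set dev bond1 mtu 9000
```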
Successful MTU Test
```
root@srv-pve-zrh-01:~# ping -M do -s 8972 -c 3 10.20.20.12
PING 10.20.20.12 (10.20.20.12) 8972(9000) bytes of data.
8980 bytes from 10.20.20.12: icmp_seq=1 ttl=64 time=0.245 ms
8980 bytes from 10.20.20.12: icmp_seq=2 ttl=64 time=0.237 ms
8980 bytes from 10.20.20.12: icmp_seq=3 ttl=64 time=0.239 ms
```
Flow Control Issue

Detection

```bash
ethtool -a nic5
# Output showing the problem:
# Pause parameters for nic5:
# Autonegotiate: off
# RX: on   ← PROBLEM!
# TX: on   ← PROBLEM!
```

Solution

```bash
# Disable Flow Control on ALL NICs
ethtool -A nic1 rx off tx off
ethtool -A nic2 rx off tx off
ethtool -A nic3 rx off tx off
ethtool -A nic4 rx off tx off
ethtool -A nic5 rx off tx off
ethtool -A nic6 rx off tx off
```
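To confirm the change took effect everywhere, the pause settings can be read back in a loop (NIC names as used in this setup):

```bash
# Pause frames should report "off" for RX and TX on every NIC
for nic in nic1 nic2 nic3 nic4 nic5 nic6; do
    echo "== $nic"
    ethtool -a "$nic" | grep -E "RX|TX"
done
```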
Why Flow Control OFF?

```
Flow Control ON:
├─ Causes Buffer-Bloat
├─ Adds latency
├─ Reduces throughput
└─ Bad for modern high-speed networks

Flow Control OFF:
├─ Better latency
├─ Higher throughput
├─ Enterprise Best Practice
└─ Required for 10G+ networks
```
TCP Buffer Tuning

Default Values (too small)

```
net.core.rmem_max = ~212KB
net.core.wmem_max = ~212KB
tcp_congestion_control = cubic
```

Optimized Values

Create /etc/sysctl.d/99-network-tuning.conf:
```
# High-Speed Network Tuning for 25Gbit Storage
# Applied to: srv-pve-zrh-01, srv-pve-zrh-02

# TCP Buffer Sizes (128 MB max)
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728

# TCP Auto-Tuning (min, default, max in bytes)
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

# Network Backlog (for many simultaneous connections)
net.core.netdev_max_backlog = 50000

# TCP Congestion Control (Google BBR)
net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq

# MTU Probing (finds optimal packet size)
net.ipv4.tcp_mtu_probing = 1

# TCP Options
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
```
Activate

```bash
sysctl -p /etc/sysctl.d/99-network-tuning.conf
```

Verify

```bash
sysctl net.core.rmem_max
# Expected: 134217728

sysctl net.ipv4.tcp_congestion_control
# Expected: bbr
```
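If the congestion control still reports cubic, the tcp_bbr kernel module may not be loaded (Debian/Proxmox kernels typically ship BBR as a module); a quick check:

```bash
# BBR must appear in the list of available algorithms
sysctl net.ipv4.tcp_available_congestion_control

# If it is missing, load the module and re-apply the sysctl file
modprobe tcp_bbr
sysctl -p /etc/sysctl.d/99-network-tuning.conf
```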
Persistent Configuration

/etc/network/interfaces (excerpt)

```
# Bond 1 (Storage 25G)
auto bond1
iface bond1 inet static
    address 10.20.20.11/28
    bond-slaves nic5 nic6
    bond-mode 802.3ad
    bond-miimon 100
    bond-xmit-hash-policy layer3+4
    bond-lacp-rate fast
    mtu 9000
    # Disable Flow Control
    post-up ethtool -A nic5 rx off tx off
    post-up ethtool -A nic6 rx off tx off
```

Add similar post-up lines for all bonds.
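After editing, the configuration can be applied without a reboot, assuming ifupdown2 (the default on current Proxmox VE) is in use:

```bash
# Reload the interfaces configuration
ifreload -a

# Confirm the bond picked up MTU and flow-control settings
ip link show bond1 | grep mtu
ethtool -a nic5
```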
Performance Test Results
Before Tuning

```
TCP (8 streams):   2.5 Mbit/s ❌
UDP (single):      14 Gbit/s
Jumbo Frames:      Failed
```

After Tuning

```
TCP (8 streams):    24.8 Gbit/s ✅
TCP (100 streams):  48.7 Gbit/s ✅
UDP (single):       14 Gbit/s
Jumbo Frames:       0.24 ms latency ✅
```
Test Commands

```bash
# Start server on Node 02
iperf3 -s -B 10.20.20.12

# TCP test (8 parallel streams)
iperf3 -c 10.20.20.12 -t 10 -P 8

# TCP test (100 streams for max throughput)
iperf3 -c 10.20.20.12 -t 10 -P 100

# UDP test
iperf3 -c 10.20.20.12 -u -b 50G -t 10
```
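It can also be worth repeating the TCP test in the reverse direction, since buffer or flow-control problems sometimes only show up on one side; the -R flag makes the server send and the client receive:

```bash
# Reverse-direction TCP test (server sends, client receives)
iperf3 -c 10.20.20.12 -t 10 -P 8 -R
```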
LACP Load Balancing Explained

Why 25 Gbit/s instead of 50 Gbit/s with 8 streams?

```
LACP Load-Balancing:
├─ Each TCP stream uses ONLY 1 link!
├─ 8 parallel streams distribute:
│   ├─ 4-5 streams on Link 1 (nic5)
│   └─ 3-4 streams on Link 2 (nic6)
└─ Result: ~25 Gbit/s total

To reach 50 Gbit/s:
└─ Need MANY parallel connections (e.g., Ceph with 100+ streams)
```
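To see how the layer3+4 hash actually spreads streams across the two links, the per-slave TX counters can be watched while an iperf3 test runs (a sketch using the NIC names from this setup):

```bash
# Compare TX counters of both bond slaves during a test;
# uneven growth shows how the hash distributed the streams
watch -n 1 'ip -s link show nic5 | grep -A1 "TX:"; ip -s link show nic6 | grep -A1 "TX:"'
```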
Real-World Performance

```
Single VM Migration: ~20-25 Gbit/s
Ceph (many OSDs):    ~45-50 Gbit/s ✅
ZFS Replication:     ~20-25 Gbit/s
```
MTU Recommendations

| Network | MTU | Reason |
|---|---|---|
| Storage (VLAN 20) | 9000 | Maximum performance for Ceph/Storage |
| VM Traffic (VLAN 30+) | 1500 | VMs use default, simpler management |
| Management (VLAN 10) | 1500 | Standard, compatibility |
Why NOT Jumbo Frames for VMs?
```
VMs with MTU 9000:
├─ Every VM must configure MTU 9000
├─ Windows Default = 1500
├─ Linux Default = 1500
├─ Error-prone (forgotten config)
└─ Complexity increases

VMs with MTU 1500:
├─ Works out-of-the-box ✅
├─ No VM config needed
├─ Internet-compatible
└─ Performance difference minimal (<5%)
```
Impact Analysis

On 1Gbit NICs

No negative impact - the settings are MAXIMUM values, so 1Gbit NICs continue to use small buffers.
On ZFS
Positive! ZFS benefits from larger buffers:
- Faster replication
- Better VM migration
- Higher throughput
On Ceph (planned)
Very positive! Ceph over 25Gbit needs these settings for:
- OSD-to-OSD traffic (replication)
- Client-to-OSD (VM reads/writes)
- Recovery/rebalancing
On RAM
Minimal impact:
- Before: ~400KB per connection
- After: Up to 128MB per connection (if needed)
- Kernel frees RAM when not needed
- With 256GB RAM: not a problem
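Actual TCP buffer memory can be inspected at runtime; /proc/net/sockstat reports system-wide usage (the mem value is in pages), and ss shows per-socket buffers for connections on the storage subnet:

```bash
# System-wide TCP memory usage ("mem" is counted in pages, typically 4 KB each)
grep TCP /proc/net/sockstat

# Per-socket buffer details for storage-network connections
ss -tmi dst 10.20.20.0/28
```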
FILE: 07-troubleshooting/performance-issues.md
Network Performance Issues
Quick Diagnostics

Check Bond Status

```bash
cat /proc/net/bonding/bond0
cat /proc/net/bonding/bond1
cat /proc/net/bonding/bond2
```
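The lines that matter most in the bond status output are the MII link state and, for the LACP bonds, the aggregator and partner details; a quick filter as a sketch:

```bash
# Key health indicators for an 802.3ad (LACP) bond
grep -E "MII Status|Aggregator ID|Partner Mac" /proc/net/bonding/bond1
```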
Check Flow Control

```bash
ethtool -a nic5
# RX/TX should be "off"
```

Check MTU

```bash
ip link show bond1 | grep mtu
# Should show "mtu 9000" for storage
```

Check sysctl Settings

```bash
sysctl net.core.rmem_max
# Should be 134217728

sysctl net.ipv4.tcp_congestion_control
# Should be bbr
```

MTU Path Test

```bash
ping -M do -s 8972 -c 3 10.20.20.12
```
Common Problems

Problem: TCP extremely slow, UDP works

Cause: Flow Control enabled
Solution:
```bash
ethtool -A nic5 rx off tx off
ethtool -A nic6 rx off tx off
```

Problem: Jumbo frames not working
Cause: Jumbo Frames not enabled on the switches
Solution: Enable on EACH switch in UniFi:
```
Devices → Switch → Settings → Advanced → Jumbo Frames ☑
```

Problem: Only 25Gbit instead of 50Gbit
Cause: LACP hashing - a single stream only ever uses a single link
Solution: Use more parallel streams
```bash
iperf3 -c 10.20.20.12 -P 100
```

Problem: Settings lost after reboot
Cause: Temporary settings not persisted
Solution:
- sysctl settings in /etc/sysctl.d/99-network-tuning.conf
- Flow Control via post-up ethtool in /etc/network/interfaces
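After the next reboot, a quick check confirms that both mechanisms took effect:

```bash
# Verify the tuning survived the reboot
sysctl net.core.rmem_max net.ipv4.tcp_congestion_control
ethtool -a nic5
ip link show bond1 | grep mtu
```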