performance tuning for a 50 Gbps NIC

Specs

I have 10 servers, each with a 50 Gbps NIC, 2 TB of RAM, and 128 CPU threads. Here is the iperf3 result:

[SUM]   0.00-10.00  sec  57.6 GBytes  49.5 Gbits/sec                  receiver
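As a sanity check on that line, the two figures can be reconciled by hand. This is a minimal sketch, assuming iperf3 reports the transfer in binary units (1 GByte = 2**30 bytes) and the bitrate in decimal units (1 Gbit = 10**9 bits); the function name is made up for illustration.

```python
# Assumption: iperf3 prints "GBytes" as 2**30 bytes and "Gbits" as 10**9 bits.
GIB = 2**30


def bitrate_gbps(transfer_gbytes: float, seconds: float) -> float:
    """Convert an iperf3-style transfer figure into decimal Gbit/s."""
    bits = transfer_gbytes * GIB * 8
    return bits / seconds / 1e9


# Figures taken from the [SUM] line: 57.6 GBytes in 10.00 seconds.
rate = bitrate_gbps(57.6, 10.0)
print(f"{rate:.1f} Gbit/s, {rate / 50 * 100:.0f}% of a 50 Gbps link")
```

Under that assumption the transfer and bitrate columns agree, so the raw network path is essentially saturating the link.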

What I tried

Before I started using syncthing, I used to synchronize directories using a nushell script that just ran rsync in parallel, which was close to max speed. I don’t have the exact results, but the time taken to transfer 10 gigabytes of random bytes was roughly as follows:

30sec (rsync) vs. 2m30sec (syncthing).

The file was created using dd if=/dev/random of=test.dat bs=1G count=10.
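To put those two timings on the same scale as the iperf3 number, they can be converted to an average bitrate. This is only a rough sketch: the timings are the approximate ones quoted above, and it assumes dd's bs=1G means 2**30 bytes, so the file is 10 GiB. The averages cover the whole wall-clock time, including any scanning or hashing, not just time on the wire.

```python
GIB = 2**30


def throughput_gbps(size_bytes: int, seconds: float) -> float:
    """Average throughput in decimal Gbit/s for a transfer of size_bytes."""
    return size_bytes * 8 / seconds / 1e9


size = 10 * GIB  # assumption: dd bs=1G count=10 writes 10 * 2**30 bytes
timings = {"rsync": 30.0, "syncthing": 150.0}  # rough timings from the post

for tool, seconds in timings.items():
    print(f"{tool}: {throughput_gbps(size, seconds):.2f} Gbit/s average")
```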

I have tried changing the following global options as described in the documentation:

globalAnnounceEnabled = false;
relaysEnabled = false;
setLowPriority = false;
databaseTuning = "large";

maxConcurrentIncomingRequestKiB = 1024 * 1024 * 5;
pullerMaxPendingKiB = 1024 * 512;
blockPullOrder = "inOrder";
copyRangeMethod = "copy_file_range";
numConnections = 10;

progressUpdateIntervalS = -1;
maxFolderConcurrency = -1;
fsWatcherEnabled = true;
copiers = 128;
hashers = 128;
scanProgressIntervalS = -1;
weakHashThresholdPct = 101;
maxConcurrentWrites = 64;
disableFsync = true;

I also set these environment variables:

GOMAXPROCS = "128";
GOMEMLIMIT = "100GiB";
GOGC = "200";

The above is Nix code (I’m using NixOS); the latest raw config looks like this:

cfg.xml (14.6 KB)

But nothing works; I can’t get past 1.5 Gbps.

Syncthing was running as a systemd service:

[Unit]
After=network.target
Description=Syncthing service

[Service]
Environment="GOGC=200"
Environment="GOMAXPROCS=128"
Environment="GOMEMLIMIT=100GiB"
Environment="LOCALE_ARCHIVE=/nix/store/9dij3pl4dkdxmxhasjw1pa9hzqv4rjlp-glibc-locales-2.39-52/lib/locale/locale-archive"
Environment="PATH=/nix/store/ysqx2xfzygv2rxl7nxnw48276z5ckppn-coreutils-9.5/bin:/nix/store/36rvynxwln7iz0qq3k1v3r1mna8bma8s-findutils-4.9.0/bin:/nix/store/d9xr7s3z0r8rf0ba22q6ilqv68agymdb-gnugrep-3.11/bin:/nix/store/7xwbkzfrs6flyvjyvd23m8r2mlnycinq-gnused-4.9/bin:/nix/store/d9ff8aqv537mlhpinncx6dwc7a5ky6gk-systemd-255.6/bin:/nix/store/ysqx2xfzygv2rxl7nxnw48276z5ckppn-coreutils-9.5/sbin:/nix/store/36rvynxwln7iz0qq3k1v3r1mna8bma8s-findutils-4.9.0/sbin:/nix/store/d9xr7s3z0r8rf0ba22q6ilqv68agymdb-gnugrep-3.11/sbin:/nix/store/7xwbkzfrs6flyvjyvd23m8r2mlnycinq-gnused-4.9/sbin:/nix/store/d9ff8aqv537mlhpinncx6dwc7a5ky6gk-systemd-255.6/sbin"
Environment="STNORESTART=yes"
Environment="STNOUPGRADE=yes"
Environment="TZDIR=/nix/store/jyh52p2cxrjn8r4ywdv2am5pjkj1xcqa-tzdata-2024a/share/zoneinfo"
CapabilityBoundingSet=~CAP_SYS_PTRACE
CapabilityBoundingSet=~CAP_SYS_ADMIN
CapabilityBoundingSet=~CAP_SETGID
CapabilityBoundingSet=~CAP_SETUID
CapabilityBoundingSet=~CAP_SETPCAP
CapabilityBoundingSet=~CAP_SYS_TIME
CapabilityBoundingSet=~CAP_KILL
ExecStart=/nix/store/n45s8j9ngvh1ws8c1m4v404nnw2kjcd0-syncthing-1.27.7/bin/syncthing \
  -no-browser \
  -gui-address=127.0.0.1:8384 \
  -config=/home/user/.config/syncthing \
  -data=/home/user/.config/syncthing \


Group=users
MemoryDenyWriteExecute=true
NoNewPrivileges=true
PrivateDevices=true
PrivateMounts=true
PrivateTmp=true
PrivateUsers=true
ProtectControlGroups=true
ProtectHostname=true
ProtectKernelModules=true
ProtectKernelTunables=true
Restart=on-failure
RestartForceExitStatus=3 4
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true
SuccessExitStatus=3 4
User=user

[Install]
WantedBy=multi-user.target

The Question

So, my question is: is it even possible to fully load the bandwidth using Syncthing? Judging by this issue, it seems it’s not possible at all.

This thread is exactly how I feel right now, but there is no answer.

I also read a few other threads but didn’t find anything interesting:

Did you check what resources you are constrained by?

I.e., iotop for storage utilisation, top for cpu?

There is quite a lot going on there:

So perhaps running it under a profiler would help. Note that the clients have to do cryptographic verification of the data and might also be part of the bottleneck.

I don’t think the problem could be system resources. 2x64 cores and 2 TB of RAM should be enough. Plus, the same files are copied much faster with rsync. If resources were the problem, rsync would be about as slow, not 5x faster.

p.s. CPU utilization is 30% of one core

Rsync and Syncthing do different things. One just copies files; the other uses cryptographic hashes to verify content, maintains and looks up state in a database, etc. You are not comparing apples with apples.

I guess my question was not “do you have a powerful computer”. My question was “did you check iostats, cpu stats, context switch counts, memory paging stats” and all other metrics that might suggest what the bottleneck is.
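For the context-switch and paging part of that list, one option is to sample the kernel counters directly while a sync is running. This is a minimal, Linux-only sketch: it assumes the usual `ctxt` line in /proc/stat and the `pgpgin`, `pgpgout`, `pswpin`, `pswpout` and `pgmajfault` counters in /proc/vmstat are present, and the helper names are made up for illustration.

```python
import time
from pathlib import Path


def parse_counters(text: str) -> dict:
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Lines with more than two fields (e.g. per-CPU rows) are skipped.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def snapshot() -> dict:
    """Read context-switch and paging counters from /proc (Linux only)."""
    stat = parse_counters(Path("/proc/stat").read_text())
    vmstat = parse_counters(Path("/proc/vmstat").read_text())
    keys = ("pgpgin", "pgpgout", "pswpin", "pswpout", "pgmajfault")
    snap = {key: vmstat.get(key, 0) for key in keys}
    snap["ctxt"] = stat.get("ctxt", 0)
    return snap


def rates(before: dict, after: dict, seconds: float) -> dict:
    """Per-second change of every counter between two snapshots."""
    return {key: (after[key] - before[key]) / seconds for key in before}


if __name__ == "__main__" and Path("/proc/vmstat").exists():
    interval = 1.0  # seconds between the two samples
    first = snapshot()
    time.sleep(interval)
    second = snapshot()
    for name, per_second in rates(first, second, interval).items():
        print(f"{name:>12}: {per_second:,.0f}/s")
```

Running it once while idle and once during a sync gives two sets of rates to compare; a large jump in one counter is a hint, not proof, of where the bottleneck sits.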

I generate a single 10GB file, does syncthing split it into parts and hash each one, or only once per file? If only once, shouldn’t the speed be comparable to rsync after the hash is calculated?

And yes, I checked iotop (disk usage), htop (CPU usage), and bmon (network bandwidth). None of them showed values close to what they were when using rsync. What else can I check?

Yes. It does.

I think the point is to stop comparing it to rsync. They do different things.

The bottleneck question is relevant to actually UNDERSTANDING why the performance isn’t as fast as you expect, so that the experts here can either provide an explanation or possibly suggest some things to relieve that bottleneck and improve the performance.

And in the end, if rsync works better for you, just use it.

Yes, dividing into blocks is a central concept to how Syncthing works: Understanding Synchronization — Syncthing documentation
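The block idea can be illustrated with a toy example. This is a sketch only, not Syncthing's implementation: it assumes fixed-size blocks hashed with SHA-256, uses a tiny 256-byte block size purely for demonstration, and compares blocks by offset, which is a simplification.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, List, Tuple

Block = Tuple[int, int, str]  # (offset, length, sha256 hex digest)


def block_hashes(stream: BinaryIO, block_size: int) -> Iterator[Block]:
    """Yield one (offset, length, digest) entry per block of the stream."""
    offset = 0
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        yield offset, len(chunk), hashlib.sha256(chunk).hexdigest()
        offset += len(chunk)


def changed_blocks(old: List[Block], new: List[Block]) -> List[int]:
    """Offsets in `new` whose digest differs from, or is missing in, `old`."""
    old_by_offset = {offset: digest for offset, _, digest in old}
    return [off for off, _, digest in new if old_by_offset.get(off) != digest]


# Two 1000-byte buffers that differ in a single byte at position 700.
data_a = bytes(1000)
data_b = bytearray(data_a)
data_b[700] = 1

old = list(block_hashes(io.BytesIO(data_a), 256))
new = list(block_hashes(io.BytesIO(bytes(data_b)), 256))
print(changed_blocks(old, new))  # only the block containing byte 700
```

The point for this thread is that every block is hashed, so a single large file still costs one hash per block rather than one hash per file.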

Maybe, but I’m still curious to understand how we can speed up syncthing

@AudriusButkevicius @mraneri @acolomb OK, I did the profiling according to this guide.

Idle:

syncthing-cpu-linux-amd64-v1.27.7-133211.pprof (3.1 KB)
syncthing-cpu-linux-amd64-v1.27.7-133403.pprof (935 Bytes)
syncthing-heap-linux-amd64-v1.27.7-133524.pprof (142.9 KB)
syncthing-heap-linux-amd64-v1.27.7-133529.pprof (143.6 KB)

During sync:

syncthing-cpu-linux-amd64-v1.27.7-133645.pprof (11.0 KB)
syncthing-cpu-linux-amd64-v1.27.7-133723.pprof (10.4 KB)
syncthing-heap-linux-amd64-v1.27.7-133758.pprof (142.9 KB)
syncthing-heap-linux-amd64-v1.27.7-133803.pprof (143.1 KB)

Looks like all those profiles are from the sending side. They just confirm that the expected things are happening on the CPU, but not much overall. I doubt profiles from the receiving side will add much.
Also, the profile involves encryptedModel. It's generally quite helpful to mention when you use non-standard settings. That doesn’t seem to be the issue here though, as it should mostly impact CPU.

I don’t think you mentioned what the disk setup is. Especially where the database is located.


Both sides have Mellanox MT42822 BlueField-2 (NVMe-over-Fabric) controllers with ZFS and encrypted root. I have 10 identical machines and I’m syncing one of them with the others.