Why does periodically restarting Syncthing dramatically improve overall transfer speed?

We don’t use them, no, so they’re set at the default 0 setting. Compiling isn’t something I’m set up to do quickly, unfortunately, so I’ll need some time for that. First, though, I can try setting them temporarily and seeing whether Syncthing honors them properly over time as well. I’ll do that now, set them to 25MB/s, and fire up a few more big files.
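
For anyone following along, these are the limits I mean: in `config.xml` they live under `<options>` and are in KiB/s, with 0 meaning unlimited (if memory serves, the GUI exposes the same values in its connection settings). The numbers below are just for illustration:

```xml
<options>
  <!-- Rate limits in KiB/s; 0 = unlimited. 25000 KiB/s is roughly the 25MB/s I'm testing with. -->
  <maxSendKbps>25000</maxSendKbps>
  <maxRecvKbps>25000</maxRecvKbps>
</options>
```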

I’m early in my test, but the rate limiters might be a good place to start looking.

I’m recording another long test now, and the initial behavior on the network is entirely different with a rate limit in place. I’ve got the receive limit locked down to 12500KiB/s, and the network activity has been rock solid at that line for nearly 40 minutes. Frankly, I’ve never seen a cleaner, steadier connection from Syncthing in almost 2 years of using it extensively. I’ll have more data to share after this data collector is done in the morning, and I plan on increasing the limit a few times during the test to see if it can maintain its consistency as the cap increases.

Here’s a quick look at how clean this thing is right now: [screenshot]

If this turns out to be a thing, I wonder whether it might also explain the subpar performance of QUIC: Improve QUIC performance · Issue #7636 · syncthing/syncthing · GitHub

There was a bug that made QUIC slow, which should now be fixed, so I’m not sure how related that would be.

The fact that it’s over time and on TCP suggests it’s something else.

The bug which sparked the ticket is fixed, but QUIC is still a lot slower than TCP. The protocol itself should be able to saturate a gigabit link. On my local Wi-Fi the difference is 600 vs 250 Mbit/s.

Edit: quic-go uses a quite simple congestion control algorithm, which might explain the difference. My point is that the OP might be on to something, and the QUIC performance could also be a symptom of that.

QUIC will likely always be slower than TCP, because the syscall interface towards the kernel is vastly less efficient for UDP than for TCP when calling from Go.

New test with limits in place at 12.5MB/s: I started with a 12500KiB/s limit on receive and it was rock solid through a huge chunk of a 47GB file; unfortunately, it looks like it falls back into the 1-4MB/s pattern after a little over an hour. Interestingly, it kicked back up to the 12.5MiB/s limit at around 5:15a, which was not a manual restart, so it looks as if that happened after a disconnection/reconnection.

It seems as if the forum (or something) has an issue uploading screenshots at the moment, so I’ll try to post the screen grab of this a bit later.

Edit: Imgur link to the test, showing network bytes received at a 5-second interval on the receiving client: https://imgur.com/a/SpSgm2m

Any chance a router/ISP QoS thing is interfering? (I don’t really have a good idea of how to detect that.)

Pretty unlikely; these are residential-type connections, and most of these tests I’m running are overnight at both locations, so general congestion should be at a minimum. Also, a restart is still a pretty reliable way of boosting connection speed for a while: even if it’s been running at 1-4MB/s for the last 2 hours, a restart will almost always push us well above 20MiB/s, often bursting up to our caps, which is the big puzzle.

Could you run a continuous iperf test, e.g. on port 22001, for a few hours? That should be enough to check whether this is purely network related.
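
Something along these lines is what I have in mind, assuming iperf3 is available on both machines (the port and duration are just examples):

```
# receiving side: listen on a spare port (22001 as an example)
iperf3 -s -p 22001

# sending side: run for ~4 hours, reporting every 60 seconds
iperf3 -c <receiver-address> -p 22001 -t 14400 -i 60
```

If that stays flat for hours while Syncthing doesn’t, the network itself is probably off the hook.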

I think it should be enough to use a tool like WinSCP (or any other SFTP client) to transfer a multi-terabyte file over the network and check its transfer rate.

That’s actually not quite true anymore, at least on Linux.

QUIC uses recvmmsg (recvmmsg(2) - Linux man page), which receives multiple UDP packets in a single syscall.
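
For illustration, here’s a minimal sketch of what batched UDP reads look like from Go via golang.org/x/net/ipv4 (not quic-go’s actual code, just the same mechanism; on Linux, ReadBatch maps to recvmmsg(2)):

```go
package main

import (
	"fmt"
	"net"

	"golang.org/x/net/ipv4"
)

func main() {
	// Plain UDP socket; ipv4.NewPacketConn wraps it for batch I/O.
	c, err := net.ListenPacket("udp4", ":22001")
	if err != nil {
		panic(err)
	}
	defer c.Close()
	pc := ipv4.NewPacketConn(c)

	// Room for up to 8 datagrams per batch read.
	msgs := make([]ipv4.Message, 8)
	for i := range msgs {
		msgs[i].Buffers = [][]byte{make([]byte, 1500)}
	}

	for {
		// On Linux this is a single recvmmsg(2) syscall that can return
		// several packets at once; elsewhere it may fall back to single reads.
		n, err := pc.ReadBatch(msgs, 0)
		if err != nil {
			panic(err)
		}
		for _, m := range msgs[:n] {
			fmt.Printf("read %d bytes from %v\n", m.N, m.Addr)
		}
	}
}
```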

Oh. Nice.

As for the issue: iperf sounds like a good start. If that test doesn’t show anything, you could try a custom build with the rate limiters ripped out.

That, or I’d still like to see graphs of network traffic, CPU usage, RAM usage, and I/O usage, simultaneously, for both sides. As it is, there’s some random graph of this or that, but nothing covering the whole thing.
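
For example, a rough sketch of a combined collector using the gopsutil library (an assumption on my part; any collector that captures all four at once on the same timeline would do, and the interval and columns are just placeholders):

```go
// Append one CSV row every 5 seconds with CPU, RAM, disk I/O and network
// counters, so both machines can be lined up on the same timeline.
package main

import (
	"fmt"
	"time"

	"github.com/shirou/gopsutil/v3/cpu"
	"github.com/shirou/gopsutil/v3/disk"
	"github.com/shirou/gopsutil/v3/mem"
	gnet "github.com/shirou/gopsutil/v3/net"
)

func main() {
	fmt.Println("time,cpu_pct,ram_pct,disk_read_bytes,disk_write_bytes,net_recv_bytes,net_sent_bytes")
	for {
		time.Sleep(5 * time.Second)

		cpuPct, _ := cpu.Percent(0, false) // CPU usage since the previous call
		vm, _ := mem.VirtualMemory()       // RAM usage
		dio, _ := disk.IOCounters()        // cumulative per-disk read/write bytes
		nio, _ := gnet.IOCounters(false)   // cumulative bytes summed over all NICs

		if len(cpuPct) == 0 || vm == nil || len(nio) == 0 {
			continue // skip a row rather than crash on a failed probe
		}

		var rd, wr uint64
		for _, d := range dio {
			rd += d.ReadBytes
			wr += d.WriteBytes
		}

		fmt.Printf("%s,%.1f,%.1f,%d,%d,%d,%d\n",
			time.Now().Format(time.RFC3339),
			cpuPct[0], vm.UsedPercent, rd, wr,
			nio[0].BytesRecv, nio[0].BytesSent)
	}
}
```

The disk and network columns are cumulative counters, so the interesting graphs would be of the deltas between rows.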

Maybe it’s slipping into some mode with random I/O of small blocks instead of sequential reads on the sending side, for example.

Also, since this is Windows: did you turn off antivirus?

The data collector has all that info for the receiving end; I’ve just been trying to keep the info as clean as possible to outline the problem. I’ll need to run another test with a similar collector setup on the source end to get both sides in one timeframe, though.

I can also try to sneak in a long-term iperf run tonight.

Regarding AV, there’s no dedicated AV installed on the system; however, Defender is running with the default config on both sides, with one exception: I added the destination drive on the receiving unit to the Defender exclusion list for testing, and it’s still there.

The restart/disconnect behaviour is why I was suspecting throttling of some sort: ISPs are known to do the most random things. Something like “hey, this connection has been running at high speed for a long time, it needs interfering with” would be bad, but not entirely out of the question :stuck_out_tongue:

Having worked at a company implementing precisely those kinds of policies for ISPs, I gotta say that if that’s what’s happening then they’re using a sucky throttling solution. I would have deployed something a lot smoother and less obvious. :stuck_out_tongue:
