Slow LAN performance transferring a single 300GB file

I have two Windows systems set up with Syncthing 0.11.3, both configured to sync one directory. On the first Windows system I placed a 300 GB file. The system is equipped with 24 GB RAM and an Intel Core i7-4790 CPU.

The (I guess) initial scan uses very little RAM and adds about 15-20% to the CPU load. The source file is read at about 190 MB/s. So far, all is good.

Now, once the initial scan is over, syncthing.exe reads and transfers the file at only 1 MB/s. A direct file copy over the LAN (Gbit) between the same two Windows systems maxes out either the SSD/HDD or the LAN speed. Also, as soon as the slow transfer starts, syncthing.exe maxes out the i7 CPU and takes around 9-10 GB RAM.

What is going wrong here? Obviously a transfer speed of 1 MB/s is far too slow, and the resource usage is much too high for such a slow transfer.

I know this is Windows, but can you do some command line fu?

C:\> SET GODEBUG=gctrace=1
C:\> syncthing -verbose 1> syncthing.log 2>&1

Post syncthing.log somewhere.

I’m guessing something is badly optimized and churning memory… We keep the full block list for the file being transferred in RAM, which in this case would be about 300 * 1024 * 1024 / 128 * 40 ≈ 94 MiB (one block structure per 128 KiB of file, about 40 bytes per struct), maybe times two for overhead. But still some ways away from 9 GiB.
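For reference, here is the same back-of-the-envelope calculation as a tiny Go program (the 128 KiB block size and ~40 bytes per block structure are the assumptions from above, not measured values):

// Rough estimate of the in-RAM block list size for a 300 GB file:
// one block structure per 128 KiB of file, ~40 bytes per structure.
package main

import "fmt"

func main() {
    const (
        fileSize      = 300 << 30 // 300 GiB
        blockSize     = 128 << 10 // 128 KiB per block
        bytesPerBlock = 40        // rough size of one block structure
    )
    blocks := fileSize / blockSize
    fmt.Printf("%d blocks, ~%d MiB block list\n", blocks, (blocks*bytesPerBlock)>>20)
    // Prints: 2457600 blocks, ~93 MiB block list
}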

If that goes fine, the following grabs even more information (but will cause the sync to go very slowly indeed, because the profiling we enable here takes time):

C:\> SET STHEAPPROFILE=1
C:\> syncthing -verbose > syncthing.log

Let it run until memory usage reaches annoying levels. Post the heap-<some-number>.pprof file that will be created, along with syncthing.log.
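(If you have the Go toolchain installed and want to poke at the profile yourself, something along these lines should work, though just posting the files is enough:)

C:\> go tool pprof syncthing.exe heap-<some-number>.pprof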

Thanks for the instructions. This fails on my Windows system. I’ll redo the setup from scratch on two lab systems and play with the debug options.

Set up two new Windows 7 64-bit systems with the latest 0.11.3 and a single share.

Exactly the same result. During the initial scan of the 300 GB test file all is fine: a small CPU increase and a small RAM footprint. Once the scan is complete and the transfer starts, CPU goes to 100% and RAM usage is very high.

What is the best way to create logging?

For logging, see above. You say that this “fails”, but not how, so I can’t provide any other suggestions…

Figured out the failure. Using SyncTrayzor, this comes up with a blank Syncthing missing all config. Redid it with plain Syncthing and produced the log. Link to the log sent via PM.

New test with a real system setup and a lab setup, using the latest 0.11.6 version of Syncthing. Same result as above: very high CPU and RAM usage and slow LAN performance (about 1% of LAN speed).

Is there anyone else syncing files of that size?

Probably someone, but not many. According to https://data.syncthing.net, 95% of the population have less than 264 GB synced per device, spread over 135k files. So this is definitely an outlier. I got the log from you, but it didn’t contain any GC trace information, and there was no heap profile…

You don’t say whether the sync that’s ongoing now is the initial one or a subsequent one. I’m guessing not the initial one, since it’s been a while. In that case we don’t really expect to max out the network: when the file updates, the destination copies the blocks it already has from its existing copy, and only the differing blocks are transferred. But even just copying and hashing 300 GB on the destination is quite a lot of work. While this happens, not much is sent over the network, since there is nothing to send.

You may want something closer to rsync --inplace rather than syncthing, as syncthing always operates on a copy of the file.
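To illustrate the idea, here is a rough sketch of that approach in Go. This is not Syncthing’s actual code; the localHash and fetchFromPeer helpers are hypothetical stand-ins for the real hashing and network layers:

// Sketch: the destination rebuilds the new file version in a temporary
// copy. For each block it first tries to reuse data it already has in the
// old file; only blocks that don't match are requested from the network.
// Hashing and copying 300 GB locally is what costs CPU and disk time here,
// even though almost nothing crosses the wire.
package blocksketch

import (
    "bytes"
    "os"
)

const blockSize = 128 << 10 // 128 KiB

// block describes one piece of the new file version.
type block struct {
    offset int64
    size   int
    hash   []byte
}

func pullFile(oldPath, tmpPath string, blocks []block,
    localHash func([]byte) []byte, // hypothetical hashing helper
    fetchFromPeer func(block) ([]byte, error), // hypothetical network helper
) error {
    old, err := os.Open(oldPath)
    if err != nil {
        return err
    }
    defer old.Close()

    tmp, err := os.Create(tmpPath)
    if err != nil {
        return err
    }
    defer tmp.Close()

    buf := make([]byte, blockSize)
    for _, b := range blocks {
        // Try the data at the same offset in the old file first.
        if n, _ := old.ReadAt(buf[:b.size], b.offset); n == b.size &&
            bytes.Equal(localHash(buf[:n]), b.hash) {
            if _, err := tmp.WriteAt(buf[:n], b.offset); err != nil {
                return err
            }
            continue // block reused locally; nothing sent over the network
        }

        // Otherwise request just this block from the other device.
        data, err := fetchFromPeer(b)
        if err != nil {
            return err
        }
        if _, err := tmp.WriteAt(data, b.offset); err != nil {
            return err
        }
    }
    // The temporary copy then replaces the old file once all blocks are in place,
    // which is why this differs from rsync --inplace for huge files.
    return nil
}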

I did some more testing. Creating new files and testing with one file at a time, at 100 MB, 1 GB, 10 GB and 50 GB, I see a noticeable increase in CPU and RAM usage. Already at 50 GB file size the CPU is maxed out by Syncthing. Also, the bigger the file, the slower the LAN transfer speed. There is definitely something wrong here.

The testing was done on two idle systems, one single share, one file at a time. This should be easy to reproduce. How can I help?

Are you talking about initial syncs here? Random data in the files?

The initial scan works perfectly: CPU around 25% and a low RAM footprint, with values exactly as you calculated above.

Once the scan is done and the transfer starts, that is where the problem begins.

To give everyone an idea: copying a file over the LAN to the target maxes out my Gbit LAN at about 110 MB/s, so no issues there.

Syncthing values, putting a randomly generated file into the share, conditions as in my last post:

100 MB file: transfer is instant.
1 GB file: also nearly instant.
5 GB file: CPU 30%, RAM 100 MB, transfer speed 30 MB/s.
10 GB file: CPU 30%, RAM 164 MB, transfer speed 15 MB/s.
20 GB file: CPU 92%, RAM 590 MB, transfer speed 10 MB/s.
30 GB file: CPU 92%, RAM 950 MB, transfer speed 10 MB/s.

CPU and RAM values are from Task Manager, looking at syncthing.exe.

This can be reproduced.

I’ll set up a VM to do some testing; I don’t have enough local disk space for it to be relevant. :slight_smile:

Yeah, confirmed. The benchmarking so far mostly measured the receiving side (since that’s where we can tell when we’re done etc) while the resource problem here is on the sending side. Wonder why, will look into it.

OK, I know what the problem is: there’s a pretty serious performance issue on the sending side for large files. I’ve got a preliminary fix; you can grab the build at http://build.syncthing.net/job/syncthing-pr/594/ to try it out. There’s still a memory hit for large files that we should work on, but as long as that “fits” on the system, the actual sync performance should be OK for large files now, I think…

You are a star!

Testing with a 30 GB file, not only is syncthing.exe using just 10% CPU, the transfer speed is also much faster, at 65-70 MB/s.

This fix solves two problems and makes the transfer a lot faster.

Thanks for looking into this, finding the problem, and providing a fix in such a short amount of time.

Hi

I came across this post while looking into slow binary transfers. Has this issue been resolved and fixed in the code?

I am syncing binary files of at least 100 MB, and my transfer speeds are very slow; the transfers happen in bursts but are still very, very slow. It should not take more than half an hour to send a 100 MB file on my network ;(

I am using v0.14.19, Linux (64 bit) + Win7 x64

Check CPU/memory usage, and network throughput.

The sending Syncthing (still sending) does not use much CPU (less than 1 percent), and memory usage is a couple of hundred MB at most, as far as I can tell. I have 32 GB on the sending side and 12 GB on the receiving side.

Btw, my system drive is an SSD and the binary is on an SSD.

I have this issue with other devices as well.

Well CPU/memory usage on both sides matters, and generic network throughput matters too.

Understandable, however my network speed is not less than 10mb/s, and these devices are connected via multiple interfaces (one LAN, one WiFi).

One device is a quad core with 12 GB RAM, the other has 6 cores and 32 GB.

Here is one iperf run on one interface:

[ ID] Interval       Transfer     Bandwidth
[ 12]  0.0-10.1 sec  1.75 MBytes  1.46 Mbits/sec
[  4]  0.0-10.1 sec  2.38 MBytes  1.98 Mbits/sec
[  8]  0.0-10.3 sec  2.00 MBytes  1.63 Mbits/sec
[  7]  0.0-10.3 sec  2.38 MBytes  1.93 Mbits/sec
[  5]  0.0-10.4 sec  2.50 MBytes  2.03 Mbits/sec
[ 10]  0.0-10.5 sec  1.75 MBytes  1.39 Mbits/sec
[ 11]  0.0-10.7 sec  2.12 MBytes  1.67 Mbits/sec
[  6]  0.0-10.7 sec  2.50 MBytes  1.96 Mbits/sec
[  9]  0.0-10.7 sec  2.12 MBytes  1.66 Mbits/sec
[  3]  0.0-11.4 sec  2.62 MBytes  1.94 Mbits/sec
[SUM]  0.0-11.4 sec  22.1 MBytes  16.3 Mbits/sec

WiFi:
[  6]  0.0-10.1 sec  1.38 MBytes  1.14 Mbits/sec
[  3]  0.0-10.1 sec  1.38 MBytes  1.14 Mbits/sec
[  7]  0.0-10.1 sec  1.38 MBytes  1.14 Mbits/sec
[  4]  0.0-10.2 sec  1.38 MBytes  1.13 Mbits/sec
[  9]  0.0-10.2 sec  1.38 MBytes  1.13 Mbits/sec
[  8]  0.0-10.2 sec  1.38 MBytes  1.13 Mbits/sec
[ 11]  0.0-10.2 sec  1.38 MBytes  1.13 Mbits/sec
[ 10]  0.0-10.2 sec  1.38 MBytes  1.13 Mbits/sec
[  5]  0.0-10.2 sec  1.38 MBytes  1.13 Mbits/sec
[ 12]  0.0-10.3 sec  1.38 MBytes  1.12 Mbits/sec
[SUM]  0.0-10.3 sec  13.8 MBytes  11.2 Mbits/sec