Incomprehensible behavior in a large installation

Hello! We have a Syncthing setup with 73 PCs and 1 server: 1063 folders in total, 68 TB of data. The PCs' network links vary from 60 to 200 Mbps; the server has 1 Gbps. Each PC sees only the server and sends data only to the server. The server sees every PC. About 18 TB of data changes each week. All of this works fine.

We decided to add a second server to this network, and added it to all PCs. The link between server #1 and server #2 runs at 400 Mbps. If I pause the connection between the two servers, each server syncs at 40-60 MB/s. When I unpause the connection between the servers, the sync speed on server #2 drops to 10-20 MB/s after a few minutes.
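The figures above mix megabits per second (link speeds) and megabytes per second (sync speeds). A minimal sketch of the conversion, assuming decimal units and ignoring protocol overhead; the labels in the dictionary are only descriptive names for the numbers quoted in this thread:

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8


# Link speeds quoted in the thread, in Mbps.
links = {
    "slowest PC link": 60,
    "fastest PC link": 200,
    "server-to-server link": 400,
    "server uplink": 1000,
}

for name, mbps in links.items():
    print(f"{name}: {mbps} Mbps is about {mbps_to_mb_per_s(mbps):.1f} MB/s")
```

By this conversion the 400 Mbps server-to-server link corresponds to roughly 50 MB/s at most.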

This seems strange to me. Could you please explain it?

Syncthing version 1.16.1

To get any useful attempt at clarification, you’ll need to provide a lot more context. Between which devices are those transfer speeds measured? What’s the resource usage like?

And purely out of curiosity: Why don’t you link the PCs to each other at all?

Hello! Servers: Windows Server 2012 R2, CPU: 16-core Xeon E5-2697v4, RAM: 16 GB, storage: RAID 60 on spindle disks.

In day-to-day operation, I regulate the load using the maxFolderConcurrency parameter. I monitor the disk queue, CPU load, and RAM usage, and keep the load level at no more than 90%.
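For readers who want to adjust this setting without editing the configuration by hand, here is a minimal sketch that builds a request against Syncthing's REST config endpoint (`/rest/config/options`, available in recent versions). The address, the API key, and the value 4 are placeholders, not values from this setup, and the helper name `build_concurrency_request` is made up for illustration. As far as I know, 0 means "use the number of CPU cores" and a negative value means "no limit", but check the documentation for your version.

```python
import json
import urllib.request


def build_concurrency_request(base_url: str, api_key: str, limit: int) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets maxFolderConcurrency."""
    body = json.dumps({"maxFolderConcurrency": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/rest/config/options",
        data=body,
        method="PATCH",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


# Placeholder address and key; replace with the values of your own instance.
req = build_concurrency_request("http://127.0.0.1:8384", "YOUR-API-KEY", 4)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))

# To actually apply the change, send the request:
# urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect what would be changed before touching a production server.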

The PCs run various Windows versions. There is almost no load on them.

Each PC should communicate with servers, but should not communicate with other PCs.

The second server was added as a backup. It is located far from the first server, so the network between them has a latency of 30-33 ms.

The PCs are also located far from each other and from the servers; their latency varies from 1 to 70 ms.

What other information is needed?

I am watching the log on server #2 and see this entry:

[RH4FV] 2021/05/28 14:17:54 INFO: Puller (folder "XXXXXX" (XXXXXX), item "XXXXXXX"): syncing: no connected device has the required version of this file

The file and folder names have been replaced.

This file is present both on the PC and on server #1, and scanning has finished on both.

If the new server isn’t yet fully synced (i.e. still on its first scan or sync operation), that will put a large load on server 1 too. In that case db access might be the limiting factor, which would explain the lower bandwidth. That log line could indicate that this is the case (still on the first sync, and the file to be synced has already changed on server 1 since the start; this will be resolved in a second sync iteration).

Files are added on each PC regularly: ~6 files every 15 minutes, up to 20 MB each. Larger files appear less often: ~6 files of 20-200 GB each, about once a week.

In total, 6 * 73 = 438 small files are added every 15 minutes, and the same number of large files are added once a week.
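The arithmetic can be extended into weekly volumes. This is only a rough sketch: it assumes decimal units, that the "up to" sizes are upper bounds rather than typical sizes, and that every PC adds files around the clock.

```python
PCS = 73
FILES_PER_PC = 6

SMALL_FILE_MAX_MB = 20       # "up to 20 MB"
LARGE_FILE_MIN_GB = 20       # "20-200 GB each"
LARGE_FILE_MAX_GB = 200

INTERVALS_PER_WEEK = 7 * 24 * 4  # number of 15-minute intervals in a week

# Small files: added every 15 minutes on every PC.
small_per_interval = PCS * FILES_PER_PC
small_per_week = small_per_interval * INTERVALS_PER_WEEK
small_tb_per_week_max = small_per_week * SMALL_FILE_MAX_MB / 1e6

# Large files: added about once a week on every PC.
large_per_week = PCS * FILES_PER_PC
large_tb_per_week_min = large_per_week * LARGE_FILE_MIN_GB / 1e3
large_tb_per_week_max = large_per_week * LARGE_FILE_MAX_GB / 1e3

print(f"small files per 15 minutes: {small_per_interval}")
print(f"small files per week: {small_per_week} (at most {small_tb_per_week_max:.2f} TB)")
print(f"large files per week: {large_per_week} "
      f"({large_tb_per_week_min:.2f} to {large_tb_per_week_max:.2f} TB)")
```

Under these assumptions the large weekly files dominate the volume, and the reported weekly change of about 18 TB falls inside the computed range.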

My point is that the folders on server #1 regularly reach an in-sync state.

Judging by what you have written, I should never enable the connection between server #1 and server #2.

Do I understand you correctly?

No, that’s not what I am saying. And anyone who gave you such an absolute instruction couldn’t be correct: with the level of information available, one clearly cannot make any such absolute statement. Even setting that pedantry of mine aside, it’s still wrong: connecting to server 2 isn’t any different from adding one of the devices you call PCs. To Syncthing they are all equally important remote devices. What I suggested was that if server 2 shares all data with server 1 and is newly added (potentially with all the data already mirrored on disk), the initial process of getting in sync with server 1 can cause a lot of db access and writes on server 1, and thus slow it down. You can easily check on server 1 whether that’s the case (what’s the sync status?).
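Besides looking at the GUI, the sync status can be queried over the REST API. Below is a minimal sketch that asks server 1 how complete a remote device (here server 2) is, using the `/rest/db/completion` endpoint. The address, API key, and IDs are placeholders; the helper names are made up for illustration, and only the `completion` and `needBytes` fields of the response are used.

```python
import json
import urllib.parse
import urllib.request


def completion_request(base_url, api_key, device_id, folder_id=None):
    """Build a GET request for the completion of one remote device."""
    params = {"device": device_id}
    if folder_id is not None:
        params["folder"] = folder_id
    url = f"{base_url}/rest/db/completion?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def describe(completion: dict) -> str:
    """Turn a completion response into a one-line summary."""
    pct = completion["completion"]
    need_gb = completion["needBytes"] / 1e9
    state = "in sync" if pct >= 100 else "still syncing"
    return f"{pct:.1f}% complete, {need_gb:.1f} GB outstanding ({state})"


def fetch(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage against a running instance (placeholders throughout):
# req = completion_request("http://127.0.0.1:8384", "YOUR-API-KEY", "SERVER2-DEVICE-ID")
# print(describe(fetch(req)))
```

If server 2 stays well below 100% while the connection between the servers is active, that would be consistent with the initial sync still being in progress.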

Everything is correct. When I turn on the connection between server #1 and server #2, the queue on the disk of server #1 that holds the Syncthing database and configuration (a separate dedicated disk) gradually increases to very high values.

Since these are spindle disks and the data volumes are very high, there is a high probability that the synchronization between server #1 and server #2 will never finish. That is why I drew the previous conclusion.
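This worry can be put into numbers. The sketch below compares the sustained rate needed to keep up with the weekly change against the sync speeds reported earlier in the thread. It assumes decimal units, that the full 18 TB must be transferred each week, and that the reported speeds are sustained around the clock; none of this is measured.

```python
WEEK_SECONDS = 7 * 24 * 3600


def required_mb_per_s(tb_per_week: float) -> float:
    """Average rate in MB/s needed to transfer tb_per_week within one week."""
    return tb_per_week * 1e12 / WEEK_SECONDS / 1e6


def weekly_capacity_tb(mb_per_s: float) -> float:
    """Data volume in TB that a sustained rate in MB/s moves in one week."""
    return mb_per_s * 1e6 * WEEK_SECONDS / 1e12


weekly_change_tb = 18
needed = required_mb_per_s(weekly_change_tb)
print(f"needed to keep up with {weekly_change_tb} TB/week: {needed:.1f} MB/s")

for rate in (10, 20, 40, 60):  # sync speeds reported in the thread, in MB/s
    capacity = weekly_capacity_tb(rate)
    verdict = "keeps up" if capacity >= weekly_change_tb else "falls behind"
    print(f"{rate} MB/s moves {capacity:.1f} TB/week: {verdict}")
```

Under these assumptions, keeping up requires roughly 30 MB/s on average, so 10-20 MB/s would fall behind while 40-60 MB/s would leave headroom for the initial backlog. Whether the slowdown persists after the initial sync is a separate question that the arithmetic cannot answer.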

Have you had a similar successful experience?

Yes, such use cases have been deployed successfully in the past.

A simple piece of advice, since you mentioned spindle disks: having the database on fast storage (SSD), separate from the data, is generally recommended. For such large, centralized setups it’s basically required.

Otherwise, to find the specific bottlenecks in your setup or in the Syncthing code, and the optimal procedure to reach your aims, you can either dig deep yourself or get paid support for Syncthing.
