Hello, I have a SyncThing deployment that’s been in testing for the last few months. Until just the other week, it worked quite well between two specific sites, though there were some issues that remained unresolved with another site.
The use case is for a smaller company with 3 sites. We have worked with various file syncing tools for years, and SyncThing has shown great promise vs Microsoft’s nearly abandoned DFSr.
So, there are a few different things that are synced:
- certain shares that are fairly small - under 500 GB
- other shares that are used to collect data from remote sites
  - speed for these is critical
  - the way these work is that data from a share at a specific site (say sites B and C) is synced to a corresponding location on a central server (say at site A)
  - once fully synced, there is a script that moves data on the central server to another location, which then allows the sync software to delete the original source
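To make the "move once fully synced" step concrete, here is a minimal Python sketch of that kind of logic. It is not the actual script in use. It assumes the folder status has already been fetched from the central server's Syncthing REST endpoint `/rest/db/status` (which reports `state`, `needBytes` and `needFiles`); the function names and paths are hypothetical.

```python
import shutil
from pathlib import Path


def folder_is_synced(status: dict) -> bool:
    """True when a /rest/db/status-style payload shows nothing left to pull.

    Missing keys are treated as "not synced" so a bad payload never
    triggers a move.
    """
    return (
        status.get("state") == "idle"
        and status.get("needBytes", 1) == 0
        and status.get("needFiles", 1) == 0
    )


def move_completed_jobs(src: Path, dest: Path, status: dict) -> list:
    """Move every job out of the synced folder once the folder is idle.

    Syncthing's own entries (.stfolder, .stignore, .stversions) are left
    alone. Returns the list of new locations; empty if nothing was moved.
    """
    if not folder_is_synced(status):
        return []
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):
        if entry.name.startswith(".st"):
            continue
        target = dest / entry.name
        shutil.move(str(entry), str(target))
        moved.append(target)
    return moved
```

Once the jobs disappear from the central folder, the deletion syncs back and the source site removes its copy. Note that an idle folder with nothing needed only reflects what the central server currently knows about, so a real script may also want to wait for the sending side to finish scanning.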
All sites are connected via IPSEC VPN tunnels using SonicWall VPN boxes - NSA2650 at site A, TZ500 at sites B and C.
Site A:
- fibre 1000/250
- SonicWALL NSA2650

Site B:
- fibre 1000/250 (same carrier, same city as site A)
- SonicWALL TZ500

Site C:
- fibre 300/300 (different carrier, further away)
- SonicWALL TZ500
SyncThing, in all cases, runs on Windows Server 2019 VMs (all are server core installations). The machines it runs on are various file servers.
Relays are disabled, IPv6 is disabled everywhere, and each node is configured with the internal IP of its partner servers. The SyncThing instances all see each other fine, the appropriate port is open, and the nodes talk to each other without issue.
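One way to double-check that every peer really is on a direct TCP connection is to look at the `type` field that Syncthing's `/rest/system/connections` endpoint reports per device. The sketch below only inspects an already-fetched payload; the device IDs and the sample data are made up for illustration, and the helper name is hypothetical.

```python
def non_direct_connections(connections: dict) -> dict:
    """Map device ID -> connection type for connected peers not on plain TCP.

    Expects the JSON body of /rest/system/connections, where each peer
    has a "connected" flag and a "type" such as "tcp-client",
    "tcp-server" or "relay-client".
    """
    peers = connections.get("connections", {})
    return {
        device_id: info.get("type", "unknown")
        for device_id, info in peers.items()
        if info.get("connected") and not info.get("type", "").startswith("tcp")
    }


# Hypothetical sample payload, not real output from this deployment.
sample = {
    "connections": {
        "DEVICE-B": {"connected": True, "type": "tcp-client", "address": "10.0.2.10:22000"},
        "DEVICE-C": {"connected": True, "type": "relay-client", "address": "203.0.113.5:22067"},
        "DEVICE-D": {"connected": False, "type": "", "address": ""},
    }
}

print(non_direct_connections(sample))
```

An empty result would mean every connected peer is on direct TCP, which is what the configuration described above should produce.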
Everything appears to work quite well - except the speed in most cases.
Initially:
- Site A to site C and vice versa was very slow no matter what - 20-50 Mbps
- Site A to site B was running at about 1/2 of the speed I’d expect - 80-90 Mbps
- Site B to site A was running close to line speed - around 240 Mbps on a 250 Mbps pipe - this is by far the most important partnership (much as we want the others to work, at least for our trial, this one really matters - and it was working really well).
Site A/B pings are 2-3 ms.
iperf speeds for site A/B are 230-240 Mbps.
iperf speeds for site A/C are around 120 Mbps.
As of about 3 weeks ago, the site B/A transfer slowed right down - it now runs at around 30 Mbps no matter what.
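To put the figures above side by side, here is a small Python sketch that expresses each observed sync rate as a fraction of the iperf baseline for the same pair of sites. The inputs are midpoints of the ranges quoted above, and it assumes the iperf numbers apply equally in both directions, which was not measured separately.

```python
def efficiency(sync_mbps: float, iperf_mbps: float) -> float:
    """Fraction of the raw iperf throughput that the sync actually achieves."""
    return sync_mbps / iperf_mbps


# (label, observed sync Mbps, iperf baseline Mbps), midpoints of quoted ranges
observations = [
    ("A -> C (initial)", 35, 120),
    ("A -> B (initial)", 85, 235),
    ("B -> A (initial)", 240, 235),
    ("B -> A (now)", 30, 235),
]

for label, sync_mbps, iperf_mbps in observations:
    print(f"{label}: {efficiency(sync_mbps, iperf_mbps):.0%} of iperf baseline")
```

On these numbers the B to A link went from essentially matching iperf to roughly 13% of it, while the tunnel itself can still carry the full rate.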
The sort of data being transferred hasn’t changed - anywhere from 20-500 GB jobs that, once fully synced, are deleted within an hour - so the database shouldn’t be growing indefinitely. File counts aren’t especially high - could range from a few hundred to tens of thousands.
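For a sense of what the slowdown means for these jobs, a rough Python calculation of transfer time at the old and new rates follows. It assumes decimal gigabytes (1 GB = 8000 megabits) and a constant rate with no protocol overhead, so real transfers will take somewhat longer.

```python
def transfer_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mbps megabits per second."""
    megabits = size_gb * 8 * 1000  # decimal GB -> megabits
    return megabits / rate_mbps / 3600


for size_gb in (20, 500):
    for rate_mbps in (240, 30):
        hours = transfer_hours(size_gb, rate_mbps)
        print(f"{size_gb} GB at {rate_mbps} Mbps: {hours:.1f} h")
```

A 500 GB job that used to finish in under 5 hours at 240 Mbps now needs about 37 hours at 30 Mbps.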
VMs all have lots of bandwidth, not short on RAM or CPU resources, nothing is maxing out. Same on firewalls - everything looks good, speeds are just low.
Any thoughts on where to look here?