We use Syncthing to provide two-way replication between two GlusterFS installations over a WAN link.
This has been working really well for well over a year. One site runs Syncthing on a standalone CentOS 7 VM and the Gluster nodes on 3 standalone CentOS 7 VMs, with a similar setup at the second site (albeit with 3 Syncthing instances due to legacy directory permissions). We sync about 11 folders all up - some have almost no files, but others have 100,000+ files, with individual directories in those folders sometimes containing 20,000+ small files.
Last week I needed to add an additional Syncthing VM to sync to a third Gluster location.
Our original Syncthing versions were quite old - v1.3.4 to be exact. So I thought I’d take the long overdue opportunity to upgrade Syncthing while installing the new Syncthing copy - and this is where the fun started!
So I upgraded one existing Syncthing VM to 1.12 and started it - the initial scan looked like it would take quite some time (not unusual in this setup) so while that was happening I got the 3rd Syncthing server up and running and connected to both existing copies.
But the upgraded Syncthing server just kept slowly doing the startup scan - so slowly in fact that it took 6 days to complete the initial scan for all the folders!
But then it just wouldn’t sync files - getting constantly stuck at the “Scanning” stage.
After lots of debugging and swearing at Gluster (it has known issues with huge numbers of small files in one directory) I eventually found the issue - the Case Sensitive FS option was NOT enabled for the upgraded Syncthing copy’s folders.
This was the culprit - enabling this and rescanning everything got things back to minutes for the initial scan instead of days and a working sync between all the Syncthing instances.
So it looks like an upgrade from a Syncthing version that didn’t have the Case Sensitive FS folder option leaves the option disabled when you upgrade to a version with this support.
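For anyone wanting to check their own setup after an upgrade, here is a minimal Python sketch that lists the folders in a Syncthing `config.xml` that do not have the option turned on. It assumes the option is stored as a `<caseSensitiveFS>` child element of each top-level `<folder>` element; the sample XML, the paths in it and the function name `folders_without_case_sensitive_fs` are made up for illustration. Only turn the option on for a filesystem that really is case sensitive, since it switches off Syncthing's extra case-conflict checks.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down config.xml with three folders (illustration only).
SAMPLE_CONFIG = """
<configuration version="32">
    <folder id="docs" label="Docs" path="/mnt/gluster/docs" type="sendreceive">
        <caseSensitiveFS>true</caseSensitiveFS>
    </folder>
    <folder id="scans" label="Scans" path="/mnt/gluster/scans" type="sendreceive">
        <caseSensitiveFS>false</caseSensitiveFS>
    </folder>
    <folder id="old" label="Old" path="/mnt/gluster/old" type="sendreceive">
    </folder>
</configuration>
"""


def folders_without_case_sensitive_fs(config_xml):
    """Return the IDs of folders where caseSensitiveFS is missing or not 'true'."""
    root = ET.fromstring(config_xml)
    flagged = []
    # Assumption: folders are direct children of the root element.
    for folder in root.findall("folder"):
        # A missing element is treated the same as an explicit "false".
        value = (folder.findtext("caseSensitiveFS") or "false").strip().lower()
        if value != "true":
            flagged.append(folder.get("id", "?"))
    return flagged


if __name__ == "__main__":
    print(folders_without_case_sensitive_fs(SAMPLE_CONFIG))
```

To run it against a real installation, read the contents of your own `config.xml` and pass them to the function instead of `SAMPLE_CONFIG`.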
Add that to directories with 20,000+ small files and throw Gluster into the mix and the performance of the system just dies to the point of almost outright stalling.
I think what was happening is that for each file Syncthing was trying to do a case-insensitive search of the entire directory for a matching file - and this was just too much for our Gluster servers to handle in any sort of reasonable timeframe.
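To illustrate why that guess would explain the slowdown, here is a rough Python sketch comparing a case-sensitive existence check with a naive case-insensitive one. It is not Syncthing's actual code (which is written in Go and may cache directory listings), and the function names and the `entries_read` counter are hypothetical. The point is only that a case-insensitive check has to list the whole directory, so doing it once per file reads on the order of n * n directory entries for a directory of n files.

```python
import os
import tempfile


def exists_exact(directory, name):
    """Case-sensitive check: one lookup of one path."""
    return os.path.lexists(os.path.join(directory, name))


def find_case_insensitive(directory, name, counter):
    """Naive case-insensitive check: list the directory and compare every entry."""
    wanted = name.lower()
    matches = []
    for entry in os.listdir(directory):
        counter["entries_read"] += 1  # every entry is read for every lookup
        if entry.lower() == wanted:
            matches.append(entry)
    return matches


def count_entries_read(directory):
    """Run the naive check once per file and return the total entries read."""
    counter = {"entries_read": 0}
    for name in os.listdir(directory):
        find_case_insensitive(directory, name, counter)
    return counter["entries_read"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for i in range(200):
            open(os.path.join(d, f"file{i}.txt"), "w").close()
        # 200 files, each checked against all 200 entries.
        print(count_entries_read(d))
```

With 20,000 files in one directory the same pattern would read 400 million entries, and every one of those listings goes through Gluster.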
I did some searching but couldn’t find anyone else reporting this scenario so thought I’d mention it here in case someone else strikes this same issue and starts searching for answers.
So in the end a good upgrade and a working 3rd Gluster system in sync with some bonus learning about Go Profiling (it was the CPU Profile Top output that pointed me to the case sensitive FS option).