Help needed: Syncthing is failing in every configuration for my case

Hi Syncthing Support, greetings from a long-time (25+ years) OSS/Linux user of this amazing software, which is nevertheless still failing pretty badly for my case(s), as described below.

My HW config:

  • 8x Android smartphones (half are old/backups, so usually turned off).
  • 2x mini servers (thin clients) with Debian 11 LTS, one usually turned off; these were built specially for Syncthing
  • 1x Synology NAS with native (non Docker) app
  • 2x Linux workstations with latest Linux Mint
  • 1x Mac computer with Homebrew

My software config:

  • = latest Syncthing on Synology NAS
  • = latest Android version via F-Droid on Android phones
  • = latest Linux version (from apt.syncthing.net)
  • = latest Homebrew version of Syncthing on the Mac

My Old Setup #1 (created 2 years ago, now decommissioned):

  • = about 50x one-direction folders to deliver files from the Android phones to the NAS
  • = most of the NAS folders were Send-Receive, to exchange files with the extra servers and workstations
  • = Documents, Desktop, Pictures and Applications folders (about 10 in total) were shared between the Linux workstations, Mac and Linux servers in Send-Receive mode
  • = the total shared size was 2 TB, about 1M different files in about 70x shared folders: pictures, documents and movie files (each <500 MB)

PROBLEM with Setup #1: due to many bugs/conflicts, this setup was completely unmanageable (I tried for 1.5 years). Especially bad were the constant sync conflicts in the Documents folders, where conflict files would re-create themselves (see below).

My New Config #2 (created 5 months ago, current, compared with #1):

  • = all shares have been changed to either Send-Only or Receive-Only
  • = the number of shared folders decreased to about 50x
  • = the size of the shared data is minimised to about 200 GB, roughly 100K files in total
  • = the most often sent large files were moved from the NAS to one of the thin-client servers, to exclude possible issues with the Synology client.

Problems experienced on the new Setup #2:

  • = multiple Out-of-Sync issues, sometimes with an empty list displayed by the GUI
  • = if a device - a phone or a Linux server - was offline during a sync from a Send-Only device, a lot of (random?) actions are required to bring it back in sync, including manually deleting non-synced files and rebuilding or re-creating the database.

The WORST issue (still reproducible!) is the inability to delete some sync-conflict files in Send-Receive (#1) or Receive-Only (#2) folders, where these files just get constantly re-created from somewhere (an undocumented bug).

My primary question is: do I have any chance of getting Syncthing to work for my setup (preferably #1, but at least #2)?

I ask because, over the last 2 years, I have observed all kinds of buggy behavior from both the Android and Linux clients. And indeed, Syncthing has so far cost me quite a bit of time investigating every Out-of-Sync issue, much more than the Mac and Linux support of my “small” home environment combined :frowning:

So far I could only come up with two possible causes:

  • a) there are some (secret?) stable versions of Syncthing for each platform which might be able to perform rather stable syncs with minimal conflicts/issues, and all other (especially the latest) versions should be avoided.
  • b) Syncthing is just not able to work at the scale I’m trying to use it for; for instance, maybe it can only handle a maximum of 10K files.

Regards, P.

Just to be sure, here are the Linux and Syncthing config settings:

# per folder => all enabled
syncthing@pih6:~$ grep fsWatcherEnabled .config/syncthing/config.xml |grep -v 'fsWatcherEnabled="true'
syncthing@pih6:~$
# all ownership/xattr syncs are disabled
syncthing@pih6:~$ egrep 'syncOwnership|syncXattrs' .config/syncthing/config.xml |egrep -v '>false<'
syncthing@pih6:~$
# all defaults
syncthing@pih6:~$ egrep 'copiers|hashers' .config/syncthing/config.xml |egrep -v '>0<'
syncthing@pih6:~$
# default nbr of connections
syncthing@pih6:~$ egrep 'numConnections' .config/syncthing/config.xml |egrep -v '>0<'
# default is as required
root@pih6:# sysctl -a|grep inoti
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 204800
user.max_inotify_instances = 128
user.max_inotify_watches = 204800
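
For reference, if these limits ever need raising (e.g. for folders with very many directories), this is a minimal sketch of how I would persist a higher value, assuming a standard sysctl.d setup (the 204800 value is just the one already set on this box):

# hypothetical example: persist a raised inotify watch limit
echo 'fs.inotify.max_user_watches=204800' | sudo tee /etc/sysctl.d/90-inotify.conf
sudo sysctl -p /etc/sysctl.d/90-inotify.conf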

Does that “70x” include subdirectories or is that just a count of the top-level folders?

How are those 2TB of 1M files allocated across the various devices? – i.e. is the entire thing to be synced to all non-Android devices?

For anyone to be able to help you, you’ll need to provide the simplest possible example of a problem, including Syncthing version numbers, the specific configs, any changes you’ve made from the defaults, log entries, and screenshots from the relevant devices.

Hi Adam, thank you for your response.

Evidently I was not clear enough: this is a design issue, not a single technical issue. By my count I’ve experienced several different small issues, because the recovery procedures were different every time.

So, if you understand me better now, could you kindly have a quick look at my current Setup #2 and come up with idea(s) of what is wrong in such a use of Syncthing?

PS: Once it is confirmed that Syncthing is expected to do what I want, we can plan time to chase every small issue, because that requires creating a very good, small test case for each one.

Hi gadget,

Does that “70x” include subdirectories or is that just a count of the top-level folders?

This was likely my mistake with Setup #1: I originally trusted the solution so much that I also wanted to use it for data replication to offline storage (encrypted USB drives).

Meaning, I tried to sync (i.e. back up) the complete 20+ years of household archive from the Synology NAS using Syncthing. The 70x was indeed the count of shared top-level folders.

Since understanding my mistake, I have switched to the faster and more reliable rsync tool for that task.
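
For the record, the replacement job is just a plain one-way mirror, something like the sketch below (the paths here are made up for illustration):

# one-way mirror of the NAS archive to the mounted encrypted USB drive
rsync -aH --delete /volume1/archive/ /mnt/usb-backup/archive/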

How are those 2TB of 1M files allocated across the various devices

There were several huge Movies and disk-image folders, but the rest were folders with many files (like Documents and Pictures).

In Setup #2 only some active folders are left, which explains the 10-fold decrease in size.

You’ve given an outline of how you’re using syncthing and brief high-level mentions of some problems. There doesn’t seem to be anything unusual or exotic, although having 70 folder shares is perhaps a lot :slight_smile: 10,000 files is certainly not a lot.

None of what you’ve posted points to a design flaw in syncthing. It seems you’re having problems frequently, so the next time a problem arises, collect some more information about it and perhaps someone can help; ideally, make no changes until after someone’s had a chance to look at your details. There’s no need to suffer for years before asking for help.

I am not sure why the project would have secret stable versions that work, and give out the broken ones to the “public”. Latest stable releases tend to be better than previous stable releases.


I’m also struggling to see how anything you’ve said indicates a design problem. These sound like bugs, since things that should work as the documentation states do not.

I wonder if there’s some issue with the databases and whether resetting the databases and having all the machines rescan everything would clear up some of this.

I’m not sure of the best way to do that, or of the precautions required, though…
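
If I remember correctly, the documented way is the --reset-database startup option, which deletes the index database so that everything is rescanned and re-hashed on the next start. A sketch, assuming Syncthing runs as a systemd user service (adjust the unit name to your setup):

# stop Syncthing, reset the index database, then start it again (a full rescan follows)
systemctl --user stop syncthing.service
syncthing --reset-database
systemctl --user start syncthing.service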

Why do you insist this is a design flaw?

On top of everything else, be aware that Synology and Android both have nonstandard file system implementations (as compared to a regular Unix). There’s a whole lot of wonkiness on those platforms due to this.

Given what’s known so far, both #1 and #2 are doable.

Based on the shell prompts from your follow-up post where it says “syncthing@pih6”, plus the fact that you’re running Debian 11, I’m guessing that the mini servers are Raspberry Pi boxes or another similar type of SBC.

If so, then most of the storage might be on a USB drive. Depending on the exact hardware specifications, the filesystem(s), number of files, etc., additional tweaking might be required for Syncthing to run well and to minimize sync errors and/or conflicting files.

With one of the mini servers and half of the Android phones usually offline, there are more chances for file conflicts if you’re syncing files for apps that autostart when the device boots up (resulting in the same file being changed on two or more devices). Just a little extra care will help minimize or eliminate conflicts.

While the bird’s eye overview and output from grep, egrep and sysctl are useful, in order to really pin down the source of the sync errors and file conflicts, exact details about the various connections and Syncthing configurations are required.

I’m not currently using Syncthing for the volume of data you are, but my work setup should provide a general idea of what’s possible:

  • Over a dozen devices (some offline for extended periods of time) connected via Syncthing in a 1-to-1, 1-to-many, or mesh topology.
  • Devices are spread across multiple data centers and some network links involve point-to-point VPN tunnels.
  • Devices run various Linux and Windows versions of Syncthing.
  • Largest Syncthing folder presently contains over 1 million files and counting.
    • The folder is also shared via SMB by a number of Windows workstations for roaming user profiles.
  • Files vary in size from less than 100 bytes to tens of gigabytes.
  • To date, there has never been a file conflict and only a rare “local additions” warning that’s expected due to the way a few of the servers are set up.

Although I could’ve also used Syncthing to push updates from a NAS to long-term storage (> 50 million files), it doesn’t require bidirectional or continuous sync, so rsync was a better fit (similar to your backups to encrypted USB drives).

I also don’t use Syncthing for backups because there are better purpose-built tools.

My personal setup includes a NAS, multiple desktops/laptops, and multiple Android devices (some of which are powered off for months at a time). File conflicts seldom occur, and sync errors are even rarer.

Hi gadget,

Indeed, your setup is quite similar in size. I’m really surprised you have experienced so few issues.

I’m using thin clients - Fujitsu FUTRO S920 and S720 - which are old but still good in terms of price/performance and reasonable in power consumption. But you were almost right: some of them use USB3-to-NVMe enclosures and some just SATA SSD drives.

Indeed, I will search for additional USB tuning, although from Linux’s point of view it is just the same mounted (encrypted) filesystem.

In the Syncthing FAQ there was a recommendation to disable the fsWatcher, but the reason was to minimise CPU/RAM usage, of which I think I have enough (RAM >= 4 GB).

Would it make any difference, in terms of having fewer issues, if I disabled the fsWatcher completely and used a reasonable value for rescanIntervalS instead?

The AMD APU in the Fujitsu FUTRO S720 and S920 (I’ve used systems with the same G-series APU) might not have enough horsepower to encrypt the filesystem and handle all of the I/O that Syncthing demands at the same time. A USB connection would definitely add to the overhead.

The number of files and/or directories being watched does impact RAM usage, but there would have to be a lot of files/directories, or a very low amount of available RAM, for it to become an issue: each watch requires about 1 KB of RAM, so even the 204800-watch limit shown earlier amounts to roughly 200 MB at worst.

In and of itself, filesystem notification has negligible impact on CPU load because the OS already has to track filesystem events.

It helps to disable the fsWatcher if the Syncthing folder experiences a lot of frequent updates combined with a lot of files, and/or the storage volume is slow due to the hardware specs, filesystem, and/or encryption. Each time Syncthing is notified of a filesystem event, it counts down (default 10 seconds) before triggering a scan; if a scan takes a long time, it might not be done before the next change event arrives. So relying only on rescanIntervalS helps spread out the load, in exchange for less efficiency.

So the operating system, the filesystem choice, the number of files, file sizes, hardware selection, etc. all play a part.
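
If you want to review how every folder is currently set before changing anything, something like this (reusing the grep style from your earlier post) summarises both settings across all folders:

syncthing@pih6:~$ grep -o 'fsWatcherEnabled="[^"]*"\|rescanIntervalS="[^"]*"' .config/syncthing/config.xml | sort | uniq -c

Both settings are also exposed per folder in the GUI under Edit → Advanced (“Watch for Changes” and “Full Rescan Interval”), so you can experiment on a single troublesome folder first.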

I also have an S930, which is more powerful. Once there is a good and reproducible test case, I could try changing the appliance, simply by moving the mSATA and storage drives over.

Indeed, a full re-scan of all folders is not fast (i.e. the CPU is at 100%), but it usually takes just several minutes even on the slowest CPU (the S720), which looks OK to me.

Unless, of course, it is breaking the Syncthing application (not that likely for software on Linux, in my experience). And Syncthing is the primary application on these thin clients; not much else is running (minimal Debian server install).

As for encryption: I only see a small delay on the initial mount, but actual usage of the encrypted FS is not that slow, considering the obsolete SATA2 bus. Maybe VeraCrypt is well coded, or these AMD CPUs are not that bad in terms of extended CPU instructions.

PS: the initial encryption of the storage did indeed take a long time, but of course I did that on a more powerful workstation :wink:

Depending on the number of files and directories, several minutes sounds reasonable.

Interesting… is it a container, a partition or the entire disk?

When Syncthing is updating files on the mini server, it’s inside the encrypted volume and not syncing the volume itself, right?

Of course

whole SSD drive, easier :wink:

What is nice is that:

  • no password is stored on the device
  • both the data and application drives can be re-inserted into any of these thin clients, in case one dies
  • they have PCIe Intel NICs, so network traffic is offloaded from the CPU to the NIC
  • an S720 can cost just a few USD/EUR apiece if bought in quantity (on German eBay)
  • Syncthing is so light and quick to install.

PS: If I get it working the way I like, I could recommend it to other privacy-sensitive people, especially since Discovery/Relay servers can also be configured within the local network.

I think I’m getting better at resolving Out-of-Sync issues :wink:

The key was to make all syncs unidirectional (even when one-to-many).

In such a situation there is always a clear single “master” for every shared folder, so it is very easy to remove-restart-add the same folder on the “slaves”, even if the whole folder has to be not just re-created but fully re-downloaded.

My previous troubles, it seems, were related not only to long-offline nodes, but also to the fact that those nodes were Send-Receive ones (I just re-synced/reconfigured one such node within 2 hrs). This made resolving Out-of-Sync issues much more complicated and time-consuming.
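
By the way, with a clear master/slave split this recovery can also be scripted: if I read the docs right, the REST API exposes the same actions as the “Override Changes” and “Revert Local Changes” buttons in the GUI. A sketch, where the API key and folder ID are placeholders:

# on the Send-Only "master": push its state, overriding remote changes
curl -X POST -H "X-API-Key: <apikey>" "http://localhost:8384/rest/db/override?folder=<folder-id>"

# on a Receive-Only "slave": discard local additions and match the cluster again
curl -X POST -H "X-API-Key: <apikey>" "http://localhost:8384/rest/db/revert?folder=<folder-id>"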

One issue I still need to document is the following:

  • three nodes: a single “master” (Send-Only), one “slave” (Receive-Only) and one “normal” (Send-Receive).
  • despite being a “slave”, that node re-synced all files from the “normal” node, even though they were missing on the “master”
  • so the “slave” ended up Out-of-Sync with the “master”, even after the “normal” node was repaired and migrated to “slave”, including full re-creation of that folder (i.e. the DB repair procedure)

It could be a usual configuration for some people (when everything is Send-Receive), but recovery, when something goes unexpectedly wrong, is really complicated because it requires many actions.

Update on the subject: so far (15 days) I have not been able to reproduce any of the severe corruption issues. It looks like a post to this forum magically helped :wink:

PS: Seriously speaking, it looks like I’ve found the correct configuration for my case; after resolving all the previous sync issues, the configuration is correct enough that no new issues are appearing.
