Never-ending sync across multiple devices

I am experiencing a strange issue that I can’t track down, so I hope someone here can give me some hints. Please have a look at the attached picture, which shows my actual infrastructure.

Each of the STn machines is an AlmaLinux 9 box running kernel 5.14, with Syncthing 1.27.8 running as root over QUIC4. All shared folders reside on XFS filesystems.

Each of the NGn machines is set up the same way but also runs nginx. All machines share the following folders in Send & Receive (bidirectional) mode:

  • /usr/share/nginx, which contains the document roots of the websites;
  • /etc/nginx/conf.d, which contains the virtual host configuration files. (On the machines that don’t run nginx these folders live under a different path, but I assume that makes no difference.)

All is well with /etc/nginx/conf.d, which contains only three text files. But when I cd into the (empty) subfolder of any of the websites I’m trying to serve, e.g. /usr/share/nginx/website1.com, and git clone my project there, the machines start exchanging data back and forth endlessly as soon as the files have been downloaded. At some point the process seems to finish correctly, since every folder on every machine is reported as synced, but within seconds the data exchange starts over, lasts a few seconds and completes, and then the cycle repeats again and again.

When I compare any file that gets synced over and over, its contents haven’t changed, but its atime and ctime vary, as you can see from the following two stat runs a few seconds apart (note also that ctime keeps increasing, while atime alternates between a fixed 2024-06-10 11:41:51.078680635 and the latest ctime):

```
[root@NG1 website1.com]# stat path/to/file.css
  File: path/to/file.css
  Size: 3827            Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 934036      Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1001/ username)   Gid: ( 1001/ groupname)
Access: 2024-06-10 11:41:51.078680635 +0200
Modify: 2024-06-10 11:41:51.078680635 +0200
Change: 2024-06-10 12:46:02.112155733 +0200
 Birth: 2024-06-10 11:41:51.078680635 +0200
[root@NG1 website1.com]# stat path/to/file.css
  File: path/to/file.css
  Size: 3827            Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 934036      Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1001/ username)   Gid: ( 1001/ groupname)
Access: 2024-06-10 12:46:04.481162116 +0200
Modify: 2024-06-10 11:41:51.078680635 +0200
Change: 2024-06-10 12:46:02.112155733 +0200
 Birth: 2024-06-10 11:41:51.078680635 +0200
[root@NG1 website1.com]#
```
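To watch which timestamps move without comparing full stat output by hand, a small helper can sample a file repeatedly. This is a minimal sketch assuming GNU coreutils stat (as shipped with AlmaLinux 9); the function names show_times and watch_times are hypothetical, not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Syncthing) for sampling a file's
# timestamps. Assumes GNU coreutils stat(1): %x = atime, %y = mtime,
# %z = ctime.

# Print all three timestamps of a file on a single line.
show_times() {
    stat -c 'atime=%x mtime=%y ctime=%z' -- "$1"
}

# watch_times FILE [COUNT] [INTERVAL]
# Take COUNT samples (default 5), INTERVAL seconds apart (default 2).
watch_times() {
    file=$1
    count=${2:-5}
    interval=${3:-2}
    i=0
    while [ "$i" -lt "$count" ]; do
        show_times "$file" || return 1
        i=$((i + 1))
        # Only sleep between samples, not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, watch_times path/to/file.css 5 2 prints five lines two seconds apart. If mtime stays fixed while ctime keeps advancing, something is rewriting metadata rather than content: chown, chmod and utimensat all update ctime without touching the file’s data.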

While hunting for the process that was making these changes, I ran the following:

```
[root@NG1 website1.com]# systemctl stop nginx
[root@NG1 website1.com]# auditctl -a exit,always -F path=/etc/nginx/conf.d/website1.com/path/to/file.css
[root@NG1 website1.com]# ausearch -f /etc/nginx/conf.d/website1.com/path/to/file.css
```

and got a number of references all pointing to syncthing (no other executable mentioned). So I went:

```
[root@NG1 website1.com]# ps aux | grep syncthing
root         955  1.3  0.3 1251040 24068 ?       Ssl  11:32   1:38 /usr/bin/syncthing -no-browser -gui-address=0.0.0.0:8384 -no-restart -logflags=11 -logfile=/var/log/syncthing.log -verbose
root         985 18.9  1.9 1389448 154576 ?      SNl  11:32  22:56 /usr/bin/syncthing -no-browser -gui-address=0.0.0.0:8384 -no-restart -logflags=11 -logfile=/var/log/syncthing.log -verbose
root       38323  0.0  0.0   3876  2148 pts/0    S+   13:34   0:00 grep --color=auto syncthing
[root@NG1 website1.com]# strace -f -p 985 -P /etc/nginx/conf.d/website1.com/path/to/file.css
strace: Process 985 attached with 12 threads
[pid   987] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[pid   988] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[...]
[pid   989] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[pid   990] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[pid   989] newfstatat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", {st_mode=S_IFREG|0644, st_size=3827, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid   989] fchmodat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", 0644) = 0
[pid   989] llistxattr("/etc/nginx/conf.d/website1.com/path/to/file.css", "", 1024) = 0
[pid   989] newfstatat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", {st_mode=S_IFREG|0644, st_size=3827, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid   989] fchownat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", 1001, 1001, AT_SYMLINK_NOFOLLOW) = 0
[pid   989] newfstatat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", {st_mode=S_IFREG|0644, st_size=3827, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid   989] utimensat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", [{tv_sec=1718012511, tv_nsec=78680635} /* 2024-06-10T11:41:51.078680635+0200 */, {tv_sec=1718012511, tv_nsec=78680635} /* 2024-06-10T11:41:51.078680635+0200 */], 0) = 0
[pid   989] newfstatat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", {st_mode=S_IFREG|0644, st_size=3827, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid   990] newfstatat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", {st_mode=S_IFREG|0644, st_size=3827, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid   989] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[pid   988] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[...]
[pid   989] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[pid   990] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[root@NG1 website1.com]#
```
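The audit results can be summarised per executable instead of read line by line. This is a sketch assuming the standard audit tools on AlmaLinux 9; the watch key stsync, the $FILE placeholder and the function name count_exes are made up for illustration. A file watch with -p wa (write and attribute change) covers chown, chmod and utimensat, and the key makes the events easy to select with ausearch -k.

```shell
#!/bin/sh
# Hypothetical helper: count how often each executable appears in
# ausearch output read from stdin. SYSCALL records carry an exe= field,
# quoted in raw output and unquoted with ausearch -i, so quotes are stripped.
count_exes() {
    grep -o 'exe=[^ ]*' | tr -d '"' | sort | uniq -c | sort -rn
}

# Example (run as root; $FILE and the key "stsync" are placeholders):
#   auditctl -w "$FILE" -p wa -k stsync
#   ausearch -k stsync -i | count_exes
#   auditctl -W "$FILE" -p wa -k stsync    # remove the watch afterwards
```

On the strace side, adding -e signal='!SIGURG' hides the SIGURG lines. Those are expected noise for any Go program such as Syncthing: since Go 1.14 the runtime sends SIGURG to its own threads for goroutine preemption.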

So it seems that no other process is accessing that file, and Syncthing is endlessly touching permissions, ownership, xattrs and timestamps for no apparent reason. The same happens if I turn on “Ignore permissions” and disable ownership and xattr synchronisation. I also tried switching to different paths, just in case, to no avail. There are a few more odd things to keep in mind:

  • if I stop the Syncthing process on the NGn machines, only leaving it to run on the STn nodes, the whole thing works flawlessly;
  • if I turn Syncthing off on all nodes except one NGn and one STn (two machines only), the problem persists;
  • sharing a new folder between two NGn nodes in the same region (not part of the regular setup, just for troubleshooting) works fine.

Any thoughts?

Regardless of everything else, this seems extra odd. The code paths to set ownership, permissions and xattrs are literally disabled by turning off those things. I’d try that again.

I don’t know what the root cause is here. I’d suspect something with the storage, which you don’t specify, but assuming it’s a normal Linux filesystem this should all just work as far as I’m concerned.

Thanks for the edit. I have just rebuilt the config from scratch on all machines and turned “Ignore permissions” on and “Sync Ownership” and “Sync Extended Attributes” off, and this time the loop stops and the folders stay in sync. However, this doesn’t solve my problem, because I really need permissions and ownership to be synced.

Also, the filesystem is XFS on all machines (I’ve added this to the first post).

Turn those on but keep xattrs off. That’s the most intrusive one, and most prone to oddness.

Just tried. Turning Ignore Permissions off works as long as I keep Send/Sync Ownership off. As soon as I turn on either Send Ownership or Sync Ownership, the whole thing starts again.

Sorry everybody, the “attached picture” was in fact missing.

@calmh, could you please edit the first message and put this after the first paragraph?

Or @imsodin, of course.

I don’t think it’s critical to the first message, even though it adds context. I don’t know what the problem is, sorry. You’ll need to figure out which changes it thinks it needs to sync (by enabling trace logging for the model, looking at the file info with syncthing cli debug file, and so on) and where they come from. The fact that it apparently works with a subset of your machines suggests to me that this is not something fundamentally broken in Syncthing but something inherent to your setup.
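One rough way to follow this advice is to save the output of syncthing cli debug file for the same file on two nodes and diff them. This sketch assumes root SSH access and that syncthing cli can reach each local instance with its default configuration; the folder ID website1, the host names and the helper diff_fileinfo are placeholders, not taken from the actual setup.

```shell
#!/bin/sh
# Hypothetical helper: unified diff of two saved outputs of
# "syncthing cli debug file". Exit status 0 means identical,
# 1 means they differ, 2 means a file could not be read.
diff_fileinfo() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "diff_fileinfo: cannot read $1 or $2" >&2
        return 2
    fi
    diff -u -- "$1" "$2"
}

# Example (folder ID, file path and host names are placeholders):
#   for h in NG1 ST1; do
#       ssh "root@$h" syncthing cli debug file website1 path/to/file.css \
#           > "/tmp/fileinfo-$h.txt"
#   done
#   diff_fileinfo /tmp/fileinfo-NG1.txt /tmp/fileinfo-ST1.txt
#
# For more detail, start Syncthing with STTRACE=model set in its
# environment to enable debug logging for the model facility.
```

Some differences between nodes may be legitimately per-device; the interesting ones are fields that keep flipping between runs, such as anything related to permissions or ownership.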

OK, I’ve nailed it. With ownership syncing enabled, Syncthing compares files not only by UID and GID but also by user and group names. UID/GID 1001/1001, which owns the files on all systems, happened to have different names on the NGn nodes (username/groupname) and on the STn nodes (rsync/rsync). After renaming them on the STn nodes, everything works as expected. Hope this helps anyone who finds themselves in a similar situation.
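To catch this kind of mismatch up front, it can help to print the names a given UID and GID resolve to on each node and compare them. A minimal sketch using getent; the function name owner_names and the host names are hypothetical, and 1001 is the UID/GID from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print "user:group" as resolved on this machine
# for a numeric UID and GID, with "?" for an ID that has no name.
owner_names() {
    user=$(getent passwd "$1" | cut -d: -f1)
    group=$(getent group "$2" | cut -d: -f1)
    printf '%s:%s\n' "${user:-?}" "${group:-?}"
}

# Locally:
#   owner_names 1001 1001
#
# Across nodes (host names are placeholders); every line should match:
#   for h in NG1 NG2 ST1 ST2; do
#       printf '%s: ' "$h"
#       ssh "root@$h" 'echo "$(getent passwd 1001 | cut -d: -f1):$(getent group 1001 | cut -d: -f1)"'
#   done
```

If the names differ, one way to align them is usermod -l username rsync and groupmod -n groupname rsync on the nodes that are out of line (both take the new name first). Note that usermod -l renames only the login, not the home directory.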
