I am experiencing a strange issue and I’m failing to track it down, so I hope I can get some hints here. Please have a look at the attached picture, as it describes my actual infrastructure.
Each of the STn machines is an AlmaLinux 9 box with kernel 5.14, running Syncthing 1.27.8 as root and using QUIC4. All shared folders reside on XFS filesystems.
Each of the NGn machines is set up like the above but also runs nginx. All machines (STn and NGn) share the following folders in Send & Receive mode:
- /usr/share/nginx, which contains the root for the websites;
- /etc/nginx/conf.d, which contains the configuration files for the virtual hosts.
(On the machines that don’t run nginx these folders actually live under a different path, but I guess that makes no difference.)
All is well with /etc/nginx/conf.d, which contains only 3 text files.
The trouble starts when I cd into the (empty) subfolder of any website I’m trying to serve, e.g. /usr/share/nginx/website1.com, and run a git clone of my project. As soon as the files have been downloaded, the machines start exchanging data back and forth endlessly. At some point the process seems to finish correctly, since every folder on every machine is reported as synced, but within seconds the data exchange starts over, runs for a few seconds and completes, then starts again, and so on.
When I compare any file that gets synced over and over, its contents never change, but atime and ctime do, as the following two stat calls show. I ran them a few seconds apart. Please also note that ctime keeps increasing, while atime goes back and forth between a fixed 2024-06-10 11:41:51.078680635 and the latest ctime:
[root@NG1 website1.com]# stat path/to/file.css
File: path/to/file.css
Size: 3827 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 934036 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1001/ username) Gid: ( 1001/ groupname)
Access: 2024-06-10 11:41:51.078680635 +0200
Modify: 2024-06-10 11:41:51.078680635 +0200
Change: 2024-06-10 12:46:02.112155733 +0200
Birth: 2024-06-10 11:41:51.078680635 +0200
[root@NG1 website1.com]# stat path/to/file.css
File: path/to/file.css
Size: 3827 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 934036 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1001/ username) Gid: ( 1001/ groupname)
Access: 2024-06-10 12:46:04.481162116 +0200
Modify: 2024-06-10 11:41:51.078680635 +0200
Change: 2024-06-10 12:46:02.112155733 +0200
Birth: 2024-06-10 11:41:51.078680635 +0200
[root@NG1 website1.com]#
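To follow the timestamp pattern over a longer period than two manual stat calls allow, the three timestamps could be logged at a fixed interval. The following is only a minimal sketch: snapshot_times and watch_times are hypothetical helper names (not part of Syncthing or any package), and it assumes GNU coreutils stat, whose %x, %y and %z specifiers print atime, mtime and ctime.

```shell
# Hypothetical helpers; assume GNU coreutils `stat`.

# Print "atime|mtime|ctime" for one file.
snapshot_times() {
    stat -c '%x|%y|%z' -- "$1"
}

# Poll a file and print a line only when any timestamp has changed.
# $1: file, $2: polling interval in seconds (default 1),
# $3: number of polls (default 60)
watch_times() {
    wt_file=$1
    wt_interval=${2:-1}
    wt_count=${3:-60}
    wt_prev=""
    wt_i=0
    while [ "$wt_i" -lt "$wt_count" ]; do
        wt_cur=$(snapshot_times "$wt_file") || return 1
        if [ "$wt_cur" != "$wt_prev" ]; then
            # Prefix each change with the wall-clock time it was observed.
            printf '%s  %s\n' "$(date '+%H:%M:%S')" "$wt_cur"
            wt_prev=$wt_cur
        fi
        sleep "$wt_interval"
        wt_i=$((wt_i + 1))
    done
}
```

Running, for instance, `watch_times path/to/file.css 1 120` on two nodes at the same time should make it easier to line up each ctime bump with the sync cycles shown in the Syncthing log. stat itself does not read the file contents, so the polling should not disturb atime.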
While trying to hunt for which process was making changes I ran the following:
[root@NG1 website1.com]# systemctl stop nginx
[root@NG1 website1.com]# auditctl -a exit,always -F path=/etc/nginx/conf.d/website1.com/path/to/file.css
[root@NG1 website1.com]# ausearch -f /etc/nginx/conf.d/website1.com/path/to/file.css
and got a number of references all pointing to syncthing (no other executable mentioned). So I went:
[root@NG1 website1.com]# ps aux | grep syncthing
root 955 1.3 0.3 1251040 24068 ? Ssl 11:32 1:38 /usr/bin/syncthing -no-browser -gui-address=0.0.0.0:8384 -no-restart -logflags=11 -logfile=/var/log/syncthing.log -verbose
root 985 18.9 1.9 1389448 154576 ? SNl 11:32 22:56 /usr/bin/syncthing -no-browser -gui-address=0.0.0.0:8384 -no-restart -logflags=11 -logfile=/var/log/syncthing.log -verbose
root 38323 0.0 0.0 3876 2148 pts/0 S+ 13:34 0:00 grep --color=auto syncthing
[root@NG1 website1.com]#
[root@NG1 website1.com]# strace -f -p 985 -P /etc/nginx/conf.d/website1.com/path/to/file.css
strace: Process 985 attached with 12 threads
[pid 987] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[pid 988] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[...]
[pid 989] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[pid 990] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[pid 989] newfstatat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", {st_mode=S_IFREG|0644, st_size=3827, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 989] fchmodat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", 0644) = 0
[pid 989] llistxattr("/etc/nginx/conf.d/website1.com/path/to/file.css", "", 1024) = 0
[pid 989] newfstatat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", {st_mode=S_IFREG|0644, st_size=3827, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 989] fchownat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", 1001, 1001, AT_SYMLINK_NOFOLLOW) = 0
[pid 989] newfstatat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", {st_mode=S_IFREG|0644, st_size=3827, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 989] utimensat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", [{tv_sec=1718012511, tv_nsec=78680635} /* 2024-06-10T11:41:51.078680635+0200 */, {tv_sec=1718012511, tv_nsec=78680635} /* 2024-06-10T11:41:51.078680635+0200 */], 0) = 0
[pid 989] newfstatat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", {st_mode=S_IFREG|0644, st_size=3827, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 990] newfstatat(AT_FDCWD, "/etc/nginx/conf.d/website1.com/path/to/file.css", {st_mode=S_IFREG|0644, st_size=3827, ...}, AT_SYMLINK_NOFOLLOW) = 0
[pid 989] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[pid 988] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[...]
[pid 989] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[pid 990] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=985, si_uid=0} ---
[root@NG1 website1.com]#
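The trace above interleaves read-only newfstatat calls with the calls that actually modify metadata. Assuming the trace is saved to a file first (strace's -o option), a small filter could tally only the modifying calls, which would show how often each one fires per sync cycle. This is a sketch, and count_metadata_calls is a hypothetical helper name.

```shell
# Hypothetical helper: count metadata-modifying syscalls in a saved strace log.
# $1: path to the log file (for example one written with `strace -o`).
# Output: one "syscall count" line per syscall found, sorted by name.
count_metadata_calls() {
    grep -oE '(fchmodat|fchownat|utimensat)\(' "$1" \
        | tr -d '(' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Comparing the counts from a run with "Ignore permissions" enabled against a run with it disabled would indicate whether that setting changes what Syncthing actually does to the file.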
So it seems that no other process is accessing that file, and Syncthing is endlessly re-applying permissions, ownership and timestamps (and re-reading xattrs) for no apparent reason. The same happens if I turn on “Ignore permissions” and disable ownership and xattrs synchronisation. I also tried switching to different paths, just in case, to no avail. There are a few more strange things to keep in mind:
- if I stop the Syncthing process on the NGn machines, only leaving it to run on the STn nodes, the whole thing works flawlessly;
- if I turn Syncthing off on all nodes except one NGn and one STn (two machines only), the problem persists;
- activating a new shared folder between two NGn nodes in the same region (not part of the regular setup, just for troubleshooting) works fine.
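Since the problem shows up between an NGn and an STn node but not between two NGn nodes, one possible next step is to compare the on-disk metadata of the same folder on both kinds of node. The following is only a sketch under assumptions: metadata_manifest is a hypothetical helper name, and it relies on GNU find's -printf (%P relative path, %m octal mode, %U and %G numeric uid and gid, %T@ mtime as epoch seconds).

```shell
# Hypothetical helper: print one line per regular file with
# relative path, octal mode, numeric uid, numeric gid and mtime (epoch).
# $1: root of the shared folder on this node.
metadata_manifest() {
    find "$1" -type f -printf '%P\t%m\t%U\t%G\t%T@\n' | LC_ALL=C sort
}
```

The manifest could be written to a file on each node, e.g. `metadata_manifest /usr/share/nginx/website1.com > ng1.manifest`, copied to one machine and compared with diff. Numeric ids are used on purpose, because uid 1001 and gid 1001 may map to different names (or to none) on different machines.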
Any thoughts?