Syncthing up-to-date, logs changes, but not replicating changes

csbv · July 26, 2021, 7:48pm

Setup: CentOS 7 (2 on-prem machines), amazon-linux-2 AMI (cloud instance - basically CentOS 8 but the Amazon variant, I believe), firewalld rules in place for 22000 tcp/udp and 8384 tcp. I also have 21027 tcp added… though I’m pretty sure it’s not required for this type of setup.
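
For anyone reproducing this setup, here is a minimal sketch of a reachability check for the TCP ports mentioned above. It only tests TCP (22000 and 8384); the UDP side of 22000 would need a different kind of test. The function name `tcp_port_open` is made up for illustration, and `devicednsname` in the usage comment is the placeholder hostname used in this thread.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (hypothetical hostname):
#   for port in (22000, 8384):
#       print(port, tcp_port_open("devicednsname", port))
```

Run from each on-prem machine against the cloud instance and the other way around, this at least separates firewall problems from Syncthing configuration problems.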

I was able to successfully get a standard syncthing installation running after an Ansible automated install. It worked pretty well. Now, I’m trying to harden the setup.

I removed the global and local discovery servers. I left relay enabled for now… but am looking to remove it.

I worked with network services to get the connections working between on-prem (2 servers) and the cloud (1 server). The 2 on-prem servers do not replicate between each other… however, they both replicate to/with the cloud server (OR SHOULD REPLICATE, we will get to that shortly). This setup was 100% tested and worked great before I started hardening a bit more and keeping traffic local.

I ended up setting advanced → devices → localdevice as tcp://devicednsname:22000. Most everything else I left as dynamic.

What’s now happening is, if we add a file to the cloud server, the logs on all 3 servers show the action. However, the file does not make it to the 2 on-prem servers. We added a folder and supporting recursive files to 1 of the on-prem servers… it’s not making it to the cloud server or the other on-prem server. AND unlike with the cloud server, the log entries are not showing up on the on-prem server where the action occurred, the other on-prem server, OR the cloud server.

I have no errors. All folders show up-to-date. All devices show up-to-date across all servers (though we know that’s not the case from looking at the file system level). All devices in the GUI show the ipaddress:port of the respective servers. journalctl -xe | tail -n 100 shows no errors.

The last piece I was able to complete today (after network services changed some settings on their side) was adding the folders from the cloud instance to the last on-prem instance that needed them. The files / folders transmitted at that time - 100% synchronized. The only difference now is that I removed local discovery. Global discovery was previously removed.

There are a lot of pieces in play. I’ll try to unconfuse any of the above that is confusing. Thank you in advance for your help.

AudriusButkevicius · July 26, 2021, 8:07pm

The decision whether files are synced or not is not related to discovery.

I’d verify that the devices are actually connected to each other and that the folders shares have matching ids, etc.
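
One way to check the connection state without relying on the GUI is Syncthing’s REST API. The sketch below assumes the GUI/API listens on port 8384 and that you have the API key from the GUI settings; the helper names `fetch_connections` and `summarize_connections` are made up for illustration.

```python
import json
import urllib.request


def fetch_connections(base_url: str, api_key: str) -> dict:
    """GET /rest/system/connections from a Syncthing instance."""
    req = urllib.request.Request(
        f"{base_url}/rest/system/connections",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_connections(payload: dict) -> dict:
    """Map each device ID to a (connected, address) tuple."""
    return {
        device_id: (info.get("connected", False), info.get("address", ""))
        for device_id, info in payload.get("connections", {}).items()
    }


# Example usage (placeholder URL and key):
#   payload = fetch_connections("http://127.0.0.1:8384", "YOUR-API-KEY")
#   print(summarize_connections(payload))
```

A device that shows `connected` as false here is not exchanging index data, whatever the folder status says.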

csbv · July 26, 2021, 8:22pm

According to the established connections in the logs, it appears they are connected. I tell you what… I’ll rip the 2 on-prem systems out… and re-add them to the cloud instance.

Also, confirmed… folder id’s do match for all the folders across all 3 instances.

For the 1st file change I mentioned that appears in the logs, the folder on all systems shows: Latest Change: Deleted testfile3

However, the other change I mentioned, where the folders / files were added, still does not show up in the logs, NOR does it look like Syncthing even recognized them.

systemctl status syncthing@user.service shows the service running well on all 3 systems. No errors.

Let me remove the systems and re-add them to see if something might have barfed somewhere today.

csbv · July 26, 2021, 9:16pm

OK… removed all shared folders and devices. Added the 3 folders back to the cloud server with nothing shared yet. I then added the 2 on-prem devices… used tcp://devicednsname:22000. Both devices picked up the proper tcp://ipaddress:22000 under the device descriptions.

I then shared the 3 folders to the 2 on-prem systems. All folders and devices show up-to-date. There is still 1 folder (and its recursive files) that exists only on 1 of the on-prem servers. These changes were not replicated. I have no errors.

Folder ID’s match. device IDs match. Devices were all recognized. Still no errors in logs, no exclamation points, no errors in GUI… or warnings.

I created 1 new file each in 2 different folders. In 1 folder (let’s call it ell), I created a file on the cloud instance. This file did not make it to the other instances, but the log did… AND the folder last change showed on both on prem instances even though the file didn’t make it: Latest Change Updated testfile-ell-virg

I then deleted that file from the cloud instance and noticed across the 2 on-prem instances (where the file never existed): Latest Change: Deleted testfile-ell-virg

I happened to notice that I did not receive any errors… but the .stfolder was not created on the 2 on prem servers. In previous installations, I did receive errors that they could not be created. Not this installation, but they didn’t create. These folders are now created… so they are in all 3 folders across all systems I’m trying to sync.

Restarted everything. resync’ed, waited.

I created a counterpart file in the ell folder on one of the on-prem servers. Neither the log entry nor the file shows up on the local on-prem instance, the other on-prem instance, or the cloud instance.

I then added another file to the cloud instance… same thing… the log entries made it to the other systems, and the GUI makes it look like the file was replicated. The file system on the 2 on-prem instances does not show the file.

The folder (and recursive files I mentioned before) are sitting on 1 of the on prem instances. That did not replicate either.

Everything still shows up-to-date and connected. Network Services DOES have things open for both sides (at first, we had 8384 tcp and 22000 tcp/udp added). They expanded it to ensure no other blocks were happening. The two on-prem servers cannot ping each other, but they are NOT set up with each other. Both are set up with the cloud instance, and pings work in both directions between each of them and the cloud instance.

Any ideas?

Nummer378 · July 27, 2021, 1:05am

Have you double-checked that you’re looking at the correct folder paths, e.g., does the folder path as shown in Syncthing match the path you’re expecting the files at?
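
A quick way to do that comparison outside the GUI is to read the folder paths straight from Syncthing’s config.xml. The sketch below is only an illustration: the helper names and the sample paths are made up, and the config file location depends on your installation (on Linux it is commonly under `~/.config/syncthing/`, which is an assumption here).

```python
import xml.etree.ElementTree as ET


def folder_paths(config_xml: str) -> dict:
    """Return {folder id: path} for the top-level <folder> elements."""
    root = ET.fromstring(config_xml)
    # findall("folder") only matches direct children of <configuration>.
    return {f.get("id"): f.get("path") for f in root.findall("folder")}


def path_mismatches(actual: dict, expected: dict) -> dict:
    """Return {folder id: (configured path, expected path)} where they differ."""
    return {
        fid: (actual.get(fid), want)
        for fid, want in expected.items()
        if actual.get(fid) != want
    }


# Example usage (hypothetical file location and paths):
#   with open("/home/user/.config/syncthing/config.xml") as fh:
#       actual = folder_paths(fh.read())
#   print(path_mismatches(actual, {"ell": "/data/ell"}))
```

Even a single wrong character in a path shows up as a mismatch here, which is easy to miss when eyeballing the GUI.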

AudriusButkevicius · July 27, 2021, 4:11am

As a next step I suggest you post screenshots to explain the issue.

csbv · July 27, 2021, 2:00pm

Nummer… you were spot on. I triple checked the folders/paths… but missed an incorrect character in there in all 3 folders. I thought I selected the correct path entering via GUI. I made an error.

Now my “issue” seems to be the 2 on prem nodes are stuck at 67% syncing since last night. The cloud instance shows all up-to-date. All folders on all 3 instances show up-to-date, no errors on any instances… I’m not exactly sure why the sync is stuck.

Out of Sync Items [1,595 items, ~27.4 MiB]

Local state of on-prem instances:

Cloud instance local state:

The 27.4MiB that is shown OOS is the exact difference of the storage size between cloud and 2 on prem instances.
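
For reference, the same comparison can be made from the REST API rather than from the GUI numbers. This is a rough sketch, assuming the API is reachable on port 8384 with a valid API key; it reads `globalBytes`, `localBytes`, and `needBytes` from `/rest/db/status`, and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

MIB = 2 ** 20


def fetch_folder_status(base_url: str, api_key: str, folder_id: str) -> dict:
    """GET /rest/db/status for one folder."""
    query = urllib.parse.urlencode({"folder": folder_id})
    req = urllib.request.Request(
        f"{base_url}/rest/db/status?{query}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def sync_gap(status: dict) -> dict:
    """Compare the needed bytes with the global-minus-local size difference."""
    need = status.get("needBytes", 0)
    gap = status.get("globalBytes", 0) - status.get("localBytes", 0)
    return {
        "need_mib": need / MIB,
        "global_minus_local_mib": gap / MIB,
        "matches": need == gap,
    }


# Example usage (placeholder URL, key, and folder ID):
#   status = fetch_folder_status("http://127.0.0.1:8384", "YOUR-API-KEY", "ell")
#   print(sync_gap(status))
```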

When I click to see OOS items, the box shows:

New changes are replicating though. So far, it appears to be all changes are syncing.

AudriusButkevicius · July 27, 2021, 3:06pm

I suggest you post full screenshots, as people usually don’t capture the parts we are interested in.

csbv · July 27, 2021, 3:37pm

on prem:

Remote device is the cloud instance.

Logs - finally have a couple messages in regard to syncing:

Both on-prem instances have a couple of these types of messages. On the other instance they were logged around 9:30 a.m., and there were no more messages after that. So maybe they resolved?

Cloud:

remote devices are the 2 on prem instances.

What other screenshots would you like?

I will be removing the relay shortly unless there’s a reason not to… so it should only be ipv4 and ipv6 (or 2/2 listeners)

AudriusButkevicius · July 27, 2021, 5:22pm

You might need to shut down all Syncthing instances and restart them with the --reset-deltas command-line flag. See if that fixes it.

csbv · July 27, 2021, 6:24pm

Forgive the ignorance on this one… I shut down all the services on all 3 servers and ran the following:

syncthing --reset-deltas

This restarts the program “interactively” in my sessions because I’m running it directly. Is that what I need to do and then wait? I did that and so far, syncs are stuck at the same point. The interactive sessions are still running right now, but there is still no change after 5 min.

csbv · July 27, 2021, 6:34pm

I tried a --reset-database… that seemed to work so far. I’ll keep you posted on our tests to ensure it is all working properly.

Thank you for the ideas.

csbv · July 28, 2021, 12:21pm

Confirmed… still working. I’m getting ready to deploy to 2 additional production servers now.

Thank you for all of your help.
