Folders stuck trying to sync existing files

By that I meant a setup where devices 1 and 2 are connected and device 1 has a directory A with a subdirectory AB, and the connections look like this:

Device 1 Directory A <-> Device 2 Directory B
Device 1 Directory AB <-> Device 2 Directory C

In that case, Directory A and Directory AB would be nested.

Now I see what you meant. What I have been doing though is rather something like this:

Device 1
   Directory A
      Directory B
      Directory C
Device 2
   Directory A
      Directory B
      Directory C
Device 3
   Directory B
   Directory C

I think this is an issue that’s been popping up for the last two to three months in various forms. I have mentioned it, and so have others. So far the only cure is to blow away all the index folders and start again, but after a few weeks it comes back.

There’s no obvious cause. Take this, for example…

image

It’s physically the same computer, just with two user profiles that have created two IDs. This is a send only computer (the image is from the receive only end), and both are looking at exactly the same data, yet there’s a difference. Nothing nested, full access, Windows based installation.

My feeling is that something is upsetting the db. Maybe under heavy load the data isn’t being written correctly or is being missed, and it slowly falls out of sync between the nodes, e.g. St says the data was written, the db says not. This view is based on what happens when I make changes to the advanced configuration while St is busy: the changes are not reflected in the config.xml file.

I do not think there is a load problem. In my installations I run around 60 folders, in which log files of a few kB and databases of several GB and more are moved every day. That works fine. In one case, I transferred a file over the internet over several days, with connection interruptions because the servers were down for a few hours every day. Something like that also works perfectly.

In my experience, it is the changes that the administrator makes that can throw a steady system out of balance.

A few days ago I reported that I switched on a computer, in a place that I do not visit very often, that had been switched off for around 3-4 months. Switching on this already integrated computer was enough to cause an imbalance without changing anything else. Its synchronized files were accordingly out of date. I had also renamed some files in the meantime, and there had been version jumps as well.

However, there may be other circumstances as well. As a rule, I have directories that are exchanged only between my servers, or that the clients (Windows and Linux Mint) additionally exchange with each other. And as a rule, I connect every client with every other one. It has happened that I forgot a client in a connection, and adding it later may, in rare cases, also lead to imbalances.

I have now experienced the same issue on two other devices. The same thing with some folders getting stuck occurred there, although in one case, the global state of a new folder that was shared to that device also reported fewer files than it should. Everything did get fixed after -reset-deltas though.

Apart from the nested folders, my other suspicion is that the issue may have something to do with Syncthing upgrades, and as I compile the binaries myself, I also tend to upgrade them quite often (or at least more often than the official releases).

I have now enabled sendFullIndexOnUpgrade on all my devices. I am going to keep an eye on things to see whether the problem comes back or not. I also want to do more extensive testing, so that I can actually reproduce the issue in a clean environment, but time is limited, so it will probably take a while :sweat:.
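For anyone who wants to double-check that the option is really set on every device, here is a minimal sketch that reads it from a config.xml. It assumes the option is stored as a `sendFullIndexOnUpgrade` element under `<options>`; the sample XML is a cut-down illustrative fragment, not a real config file, and the helper name is made up.

```python
import xml.etree.ElementTree as ET


def send_full_index_on_upgrade(config_xml):
    """Return True/False for the option, or None if it is not present.

    Assumes the option lives at <configuration>/<options>/<sendFullIndexOnUpgrade>.
    """
    root = ET.fromstring(config_xml)
    node = root.find("./options/sendFullIndexOnUpgrade")
    if node is None or node.text is None:
        return None
    return node.text.strip().lower() == "true"


# Illustrative fragment only; a real config.xml contains much more.
SAMPLE_CONFIG = """<configuration>
    <options>
        <sendFullIndexOnUpgrade>true</sendFullIndexOnUpgrade>
    </options>
</configuration>"""

print(send_full_index_on_upgrade(SAMPLE_CONFIG))
```

Running this against the config.xml of each device would at least rule out the case where the setting silently did not get saved.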

Tomasz, it seems we are both doing and saying the same things; see my reply to you back in October.

As I mentioned earlier in the thread, clearing the db in any form will fix it, but it’s not a permanent fix as the error comes back. I’m hoping that we can start testing the badger database to see if that resolves it.

As a non-developer I can only make guesses and hope that something clicks with a dev. At the moment my opinion is that either data isn’t being written into the database (e.g. under heavy load), or something gets lost or corrupted during the compact stage. This is based on a fresh index consisting of lots of small files which, some time later, have become larger. I wonder if this is the point at which the issue creeps in.


Yeah, although that one was supposed to have been fixed in “Shutting Syncthing down while pushing files to multiple devices corrupts the database?” (Issue #7036, syncthing/syncthing on GitHub). I have a feeling that these errors are somewhat connected to that, especially since the last two devices had their Syncthing version upgraded in between the synchronisation, and right after that some of their files got stuck in sync.

The badger database is dead, as far as I know :wink: (see “lib: Remove USE_BADGER experiment (#7089)”, syncthing/syncthing@7892547 on GitHub).


I have just encountered the issue again, right after upgrading everything to v1.13.0.

Fortunately, only 1 device seems to have got stuck this time. Everything had sendFullIndexOnUpgrade enabled, but it did not prevent the problem from happening. I am now quite sure that the problem is related to the upgrade process, as I had no issues whatsoever during the last 3 weeks when everything was running v1.12.1.

Can you check which (if any) of these steps brings the device (closer to) up-to-date? Device A is the device you took the screenshot on, device B is the device that has status syncing. Allow a bit of time to pass after each step to see if there’s a change.

  1. On A, pause a folder that is shown as out-of-sync.

  2. Same on B.

  3. On A or B, pause the other device and unpause it.

  4. On A, restart syncthing (without any special options).

  5. Same on B.

  6. On A, restart syncthing with -reset-deltas.

  7. Same on B.
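If clicking through the GUI on slow or remote devices is a pain, steps 1 to 5 could in principle be scripted against the REST API. The following is only a sketch: the endpoint paths (`PATCH /rest/config/folders/<id>`, `POST /rest/system/pause`, `/rest/system/resume`, `/rest/system/restart`) and the `X-API-Key` header reflect my understanding of the API and should be checked against the documentation for your version. The base URL, API key, folder ID and device ID are placeholders. Steps 6 and 7 are not covered, since -reset-deltas is a command-line option.

```python
import json
import urllib.parse
import urllib.request


def build_request(base_url, api_key, method, path, query=None, body=None):
    """Build (but do not send) an authenticated REST request."""
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("X-API-Key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def set_folder_paused(base_url, api_key, folder_id, paused):
    """Steps 1-2: pause (or unpause) a single folder via its config entry."""
    path = "/rest/config/folders/" + urllib.parse.quote(folder_id, safe="")
    return build_request(base_url, api_key, "PATCH", path, body={"paused": paused})


def pause_device(base_url, api_key, device_id):
    """Step 3, first half: pause the connection to the other device."""
    return build_request(base_url, api_key, "POST", "/rest/system/pause",
                         query={"device": device_id})


def resume_device(base_url, api_key, device_id):
    """Step 3, second half: unpause the other device again."""
    return build_request(base_url, api_key, "POST", "/rest/system/resume",
                         query={"device": device_id})


def restart(base_url, api_key):
    """Steps 4-5: plain restart without any special options."""
    return build_request(base_url, api_key, "POST", "/rest/system/restart")


# Placeholders; to actually send a request, pass it to urllib.request.urlopen().
example = set_folder_paused("http://127.0.0.1:8384", "PLACEHOLDER-KEY", "folder-id", True)
print(example.get_method(), example.full_url)
```

Nothing is sent until a request is passed to `urllib.request.urlopen()`, so it is easy to leave some time between the steps and watch for changes.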


Hi Simon,

Further to our conversation about Android and SD cards…

Well FWIW, when I gave up on the SD card, and instead synced to the internal storage, my syncs too got stuck in the high 90%s.

I started the OpenBSD-side with -reset-deltas and the devices immediately re-synced to 100%.

The files that were getting stuck were a mix of directories (size 128 bytes), image files that something (Android music player?) had put in .thumbnails, and the occasional FLAC file. There were no errors displayed on either side, just that the folders were out of sync.

Hrm…

Firstly, just to give more context, this is the affected Device A.

This is the other side, i.e. Device B.

Only 1 folder seemed to be affected by the issue. The folder itself is shared between multiple devices, but only the share between these two devices had the problem.

What is interesting is that the folder state was exactly the same on all devices.

However, the number of the stuck files was larger than the actual folder state.

This is interesting. The steps 1-5 did not help. However, after restarting A with -reset-deltas, the issue seemed to have resolved itself for a moment, but then this came up.

New changes to the folder got stuck again, but this time it was not only A, but also other devices that had these files stuck trying to push them to B. Device B still marked everything as “Up to Date”.

Now, I am not a fan of resetting deltas/database on B, as the hardware is slow and there are tons of folders, but there was no choice, so I did it, and then everything got quite messy. While the problematic folder shared between B and A seemed to get fixed, B itself got stuck trying to send stuff to other devices with no progress. I am not sure what exactly the problem was, but I restarted Syncthing on B once again, and now it seems to be pushing indexes to the rest of the cluster slowly. It may take a few hours until everything stabilises.

I will report back again later once I can say for sure what the situation actually looks like.

Edit: The situation seems to have normalised, at least for now. I will write back if the problem re-occurs.


The problem has manifested itself yet again after “upgrading” Syncthing on 1 device. Not really upgrading, as I just switched from x86-32 to x86-64, but the binary has been replaced nevertheless.

It seems that after doing the upgrade, at least one of the other connected devices gets stuck in this state. The files themselves are old and have not changed at all. It is only the state that is broken, as the folders are in fact 100% in sync.

I really need to figure out a way to reproduce this in a clean environment…


As you have sendFullIndexOnUpgrade enabled, there’s some chance that you are affected by something fixed in “lib/db: Fix and improve removing entries from global (ref #6501)” by imsodin (Pull Request #7336, syncthing/syncthing on GitHub). That’s not certain at all though - no promises but a small sliver of hope :slight_smile:

That would be awesome indeed. I suspect that there are races involved, which makes it very hard to reproduce.

Do you mean that using this option could cause the issue? I am asking because I did not have it set before when the problem appeared for the first time. I have actually enabled it thinking that it may prevent this specific behaviour, but obviously it is not really working, so there must be more to it.

I am not saying that it does. It’s really important to stress that I still don’t know what causes these issues. It’s not that I am uncertain about one or several possible causes; I don’t know any causes. I just found a bug that is related to resetting indexes, which happens on upgrade with sendFullIndexOnUpgrade. That makes it possible that it causes the issue you see, but I don’t know, e.g., a sequence of events that would trigger that bug and result in the issue you see. Without a reproducer all I can say is: Let’s fix the bug and hope this issue won’t come up again afterwards. If it doesn’t, it likely was related; otherwise it wasn’t.

No, I understand that the bug may be unrelated :wink:. I just wanted to confirm and add the information that I had this problem both before and after enabling sendFullIndexOnUpgrade.

One thing that I am thinking about is that however hard I try, I cannot reproduce this in my test configuration. The problem is that the test config uses just 3 devices and 1 folder, while in real life I get these issues in a network of ~10 devices and tons of folders. I may need to add more folders and more devices to my test config in order to actually get to something meaningful. The connection quality also differs a lot, as the test config runs on my local computer, but the real devices are located in different countries, use patchy networks, etc.


I would just like to give a quick update. I have now set up a network of 9 instances of Syncthing, all connected with each other and sharing the same folder. However, I am still unable to reproduce the issue. I have tried upgrading one instance at a time, and then all of them at the same time, but everything has always eventually stabilised with no errors.

I guess that I will have to wait for v1.14.0 and see what happens during the next upgrade in my real network :fearful:.

I am sorry for another bump, but yet one more device/folder has got stuck in the same way. The difference is that this time I did not upgrade/change the Syncthing binary on any of the involved devices. This would mean that the problem may have nothing to do with the upgrade process at all, but rather something else causes it.

One possibly important note in this case is that the device in question is used only sporadically, i.e. usually turned on every few days for just a few minutes to sync the files, and then turned off completely. Also, all the folders are set to “Receive Only”. There are no nested folders or other non-standard configurations involved.

This is how the situation looks in detail.

  1. Device A (mentioned above) - all “Up to Date”.

  2. Device B and Device C - both trying to push the same already synced files to Device A.

I have also queried the REST API, and here are the results.

rest-A.txt (1.0 KB) rest-B.txt (1.0 KB) rest-C.txt (1.0 KB)

The actual differences between the three are as follows, in the order of Device A, B, and C.

Is there anything in this information that could help in the way to find the actual culprit leading to this behaviour?
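For anyone wanting to repeat this kind of comparison, here is a minimal sketch that lists only the fields that differ between several REST responses. It assumes each response is a flat JSON object (for example from something like `GET /rest/db/status?folder=<id>`); the field names and values in the sample are made up for illustration and are not taken from the attached files.

```python
import json


def differing_fields(responses):
    """Return {field: {device: value}} for every field whose value is not
    identical across all devices. `responses` maps a device label to the
    parsed JSON object returned by that device."""
    keys = set()
    for response in responses.values():
        keys.update(response)

    diffs = {}
    for key in sorted(keys):
        values = {name: response.get(key) for name, response in responses.items()}
        # Compare via a canonical JSON string so nested values work too.
        distinct = {json.dumps(value, sort_keys=True) for value in values.values()}
        if len(distinct) > 1:
            diffs[key] = values
    return diffs


# Made-up sample data, standing in for the parsed contents of three responses.
sample = {
    "A": {"globalFiles": 10, "needFiles": 0, "sequence": 120},
    "B": {"globalFiles": 10, "needFiles": 0, "sequence": 98},
    "C": {"globalFiles": 10, "needFiles": 0, "sequence": 77},
}

for field, values in differing_fields(sample).items():
    print(field, values)
```

With real data, the three saved files can be loaded with `json.load()` and passed in under the labels A, B and C.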

I am new to Syncthing so I have only been running v1.13.x, but I wonder if you have encountered the same issue as I did. Please try going into Remove Device on EACH computer, un-share the problem folder and Save. Then go back into Remove Device, re-share the problem folder and Save. This fixed the issue 100% for me and I’ve had no further problems with files syncing since. I have had to do this for every new device I’ve set up, whether that is a Linux PC, Windows PC or even an Android device.

As all three devices show the same output (apart from the differences pointed out, which are not relevant for the “syncing algorithm”), this is a case of device A not sending indexes, or B and C consistently “losing” those indexes. The problem, as usual, is that the key information is how we ended up in this state, for which there are few if any pointers after the fact. In any case, don’t feel sorry for repeated reports: they’re definitely valuable, especially with the description of the circumstances. Maybe with time a pattern emerges or some hint triggers an idea that leads to the solution. I assume a delta index reset on device A will get rid of the issue for now.
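To illustrate the “lost index” idea in the post above, here is a deliberately simplified toy model. It is not Syncthing’s actual implementation: versions are plain integers rather than version vectors, and all names and file entries are made up. It only shows how a device can look out of sync to its peers while the files on disk are identical everywhere.

```python
def global_index(*indexes):
    """Toy 'global state': the newest known version of every file."""
    merged = {}
    for index in indexes:
        for name, version in index.items():
            merged[name] = max(version, merged.get(name, 0))
    return merged


def needed_by(remote_index, global_idx):
    """Files the remote appears to need: missing or older than global."""
    return sorted(
        name for name, version in global_idx.items()
        if remote_index.get(name, 0) < version
    )


# What A and B really have on disk (identical, i.e. 100% in sync).
a_index = {"a.txt": 3, "b.txt": 1, "c.txt": 2}
b_index = {"a.txt": 3, "b.txt": 1, "c.txt": 2}

# B's stored record of A's index, with the update for c.txt lost.
b_record_of_a = {"a.txt": 3, "b.txt": 1}

view_from_b = global_index(b_index, b_record_of_a)
print(needed_by(b_record_of_a, view_from_b))  # B thinks A still needs c.txt

# After A re-sends its full index, B's record matches reality again.
b_record_of_a = dict(a_index)
print(needed_by(b_record_of_a, global_index(b_index, b_record_of_a)))
```

In this model, B keeps trying to push c.txt to A although A already has it, and A itself sees nothing wrong. Having A send its full index again clears the stale record, which matches the observation that a delta index reset makes the problem go away for a while.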
