Issue with re-attempt delay

Edmon · April 11, 2022, 4:04pm

Hi everyone :).

I am using Syncthing to move files from a remote computer that uses a weak radio bridge. For the most part, it works fine. However, due to the intermittency of the link, the connection is totally lost at times, sometimes for as long as 10 or so minutes.

Most of the time though, the link goes down only for about 20-40 seconds while the link re-establishes.

My issue with Syncthing is that, after a few failures, the re-attempt timeout escalates to insane levels, i.e. you’ll see stuff like this: [KEGWC] 17:30:15 INFO: Folder “Remote Transfer” (cxvjv-uokny) isn’t making sync progress - retrying in 30m4s.

This retry timeout can eventually escalate to many hours, half-days and beyond. It does not even take that long for this to occur; it can make the jump from 30 minutes or an hour to half a day in a single bound.

I have tried adjusting the Puller Pause (seconds) configuration but it doesn’t help. The time between attempts still escalates until Syncthing is de facto disabled by massive retry waits.

Is there any way to fix this issue?

imsodin · April 11, 2022, 4:46pm

Can you show logs of that? The pause is capped at 60x the configurable puller pause, so 1h by default. And it’s only increased/doubled when the timer fires.
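To make the cap concrete, here is a minimal sketch of the schedule described above: a pause that doubles each time the retry timer fires and never exceeds 60 times the configured puller pause. It is based only on this description, not on Syncthing's source; the function name `backoff_schedule` and the exact clamping behaviour at the cap are assumptions.

```python
def backoff_schedule(base_pause_s, fires, cap_factor=60):
    """Return the pause (in seconds) in effect before each timer fire.

    Assumption: the pause doubles on every fire and is clamped at
    cap_factor * base_pause_s. How Syncthing rounds at the cap is not
    stated in this thread.
    """
    cap = cap_factor * base_pause_s
    pause = base_pause_s
    schedule = []
    for _ in range(fires):
        schedule.append(pause)
        pause = min(pause * 2, cap)  # double, but never past the cap
    return schedule


# With a 60 s base pause the schedule tops out at one hour (3600 s).
print(backoff_schedule(60, 8))
# prints [60, 120, 240, 480, 960, 1920, 3600, 3600]
```

Under these assumptions a 10 s puller pause would top out at 10 minutes, so a multi-hour wait would have to come from something other than the doubling alone.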

Edmon · April 11, 2022, 7:06pm

It appears that the logs erase themselves when the program is closed, so I don’t have any logs for you at the moment. But the delay is definitely doing more than doubling and definitely exceeding 1 hour.

In an ideal world, I’d like to just set the program to try every minute if the connection is lost, with no escalating delay.

imsodin · April 11, 2022, 7:35pm

It always retries a sync immediately whenever a device that has the folder connects or a device sends new metadata. The timer only governs the automatic retry that kicks in when nothing else changes.

Basically my point is that to the best of my knowledge nothing of what you describe should be happening, so I need some logs to see what the exact sequence of events is to figure out what is happening.

Edmon · April 11, 2022, 7:44pm

[KEGWC] 21:29:56 INFO: Connection to 6BTQFTQ-7R7WVSO-VKBOGDA-4UA4Q5H-CSKGLGC-3QP4OLC-QWC2RA4-N72ISAE at 192.168.0.159:52200-192.168.0.253:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256 closed: reading message: read tcp 192.168.0.159:52200->192.168.0.253:22000: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

[KEGWC] 21:29:56 INFO: Folder “Remote Folder” (cxvjv-uokny) isn’t making sync progress - retrying in 3m35s.

This is all there is, plus a list of files that failed.

You will see different times:

[KEGWC] 21:29:56 INFO: Folder “Remote Folder” (cxvjv-uokny) isn’t making sync progress - retrying in 3m35s.

[KEGWC] 21:29:56 INFO: Folder “Remote Folder” (cxvjv-uokny) isn’t making sync progress - retrying in 10.01s.

[KEGWC] 21:30:06 INFO: Folder “Remote Folder” (cxvjv-uokny) isn’t making sync progress - retrying in 10.012s.

[KEGWC] 21:30:16 INFO: Folder “Remote Folder” (cxvjv-uokny) isn’t making sync progress - retrying in 20.013s.

[KEGWC] 21:30:36 INFO: Folder “Remote Folder” (cxvjv-uokny) isn’t making sync progress - retrying in 40.012s.

[KEGWC] 21:43:10 INFO: “Remote Folder” (cxvjv-uokny): Failed to sync 6 items

[KEGWC] 21:43:10 INFO: Folder “Remote Folder” (cxvjv-uokny) isn’t making sync progress - retrying in 13m27s.

As you can see, sometimes it starts at a random delay, sometimes it doubles, sometimes it doesn’t.

This is just after me running the program since my last message. Eventually the numbers will climb until many hours or even half days of delay are added.

imsodin · April 11, 2022, 9:51pm

Right, I forgot one aspect: the pause is always the base pause (the one that doubles) plus the time the failed pull operation took. So it’s longer and not the same every time. Anyway, those lines look good (expected timer intervals, timer fires after about those intervals, no indication it’s missing retries). The interesting part would be why it fails and remedying that. I still don’t see any evidence of the retry logic being problematic.
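A small sketch of that arithmetic may help when reading the log lines: the reported delay is the current pause plus however long the failed pull ran. The helper names `retry_delay` and `format_delay` and the example numbers are hypothetical, chosen for illustration rather than taken from Syncthing or from the logs above.

```python
def retry_delay(pause_s, pull_duration_s):
    """Delay until the next automatic retry, as described in this thread:
    the current (doubling) pause plus the time the failed pull took."""
    return pause_s + pull_duration_s


def format_delay(seconds):
    """Render a delay roughly the way the log lines do, e.g. '12m34s'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m{secs}s" if minutes else f"{secs}s"


# Hypothetical case: a 40 s pause and a pull that fails after 714 s.
print(format_delay(retry_delay(40, 714)))
# prints 12m34s
```

This is why a slow, failing pull over a weak link can produce a delay of many minutes even while the doubling pause itself is still well under two minutes.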

Edmon · April 12, 2022, 1:07am

I don’t understand why this is good logic. Why would you want to wait vast and escalating quantities of time before trying again? Based on what you said (that failed state time is added), I am guessing that the “failed state time” builds over time or is doubled in some way, which would explain why you eventually see stuff like 16h23m wait times (the biggest one I’ve seen).

It all seems very random and arbitrary.

Surely just a set, fixed, retry time makes more sense?

Anyway, if these massive, incomprehensible waits are intended behaviour, then that unfortunately makes this otherwise awesome tool kind of useless if you have an intermittent connection of any kind.

imsodin · April 12, 2022, 7:35am

There’s a misunderstanding here: I am not saying it’s good that it waits that long and doesn’t sync after reconnecting on an intermittent connection. I am saying that shouldn’t happen, no matter what the interval is, because a sync should happen instantly when it reconnects. That holds even if there’s a long pause pending at that moment, because that’s not a pause for all syncing; it’s a reminder to sync again even if nothing happens (no reconnect, no file changes, no remote device changes, …).

So please do give me some information about what’s happening, the errors, connections and timings. Maybe there’s a problem with the reconnect detection. However without having any information/seeing a pattern, I can’t investigate what that problem might be.

As for why we don’t have a short, fixed, retry interval: Syncing isn’t free, it uses resources. In the absence of any other change, there’s not much chance a retry will succeed - most likely it just encounters the same error again. Thus we increase the times between retries, but also retry immediately when there’s a reason to do so.
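The split between the two triggers can be sketched as a single decision: pull right away when something relevant happens, otherwise wait for the fallback timer. This is only an illustration of the behaviour described in this post; the event names and the function `should_pull` are hypothetical and do not correspond to real Syncthing identifiers.

```python
from typing import Optional

# Hypothetical event names standing in for "a device reconnects" and
# "a device sends new metadata".
IMMEDIATE_EVENTS = {"device_connected", "remote_index_update"}


def should_pull(now_s: float, timer_deadline_s: float,
                event: Optional[str] = None) -> bool:
    """Decide whether to start a pull at time now_s.

    A relevant event triggers a pull regardless of the retry timer;
    with no event, the pull waits until the timer deadline.
    """
    if event in IMMEDIATE_EVENTS:
        return True
    return now_s >= timer_deadline_s


# A reconnect 100 s into a 30 minute wait still triggers a pull.
print(should_pull(100, 1800, "device_connected"))  # prints True
# With nothing happening, the same moment does not.
print(should_pull(100, 1800))                      # prints False
```

If a setup shows the remote device as connected but no pull starts until the timer expires, the first branch is the one that would not be behaving as intended, which is what logs of the reconnect would help confirm.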

Edmon · April 12, 2022, 8:08am

I mean from what I can see, the software detects that the link has been restored just fine. It just sits at 0/b/sec transfer speed until the “wait” timer has expired. But it does show that the other PC is connected.

When the connection fails, it marks it as failed due to I/O failure, which makes sense, since the link physically loses connection from time to time.

So the connection would succeed if it tried, but the wait to retry timer is actually stopping any attempt to continue transferring information.

imsodin · April 12, 2022, 8:12am

Could you please share logs of that, with “that” being “detects that the link has been restored” and “sits at 0/b/sec”.

What you describe is not how it is supposed to work. Everything about the connection establishment is independent from that wait timer. You seem to be entirely fixated on it, but for all intents and purposes you can ignore it. It’s just a secondary mechanism to kick off sync operations; the primary ones should get you syncing again before that. And if that doesn’t work, I want to fix that. And for that I need your logs, please.

Edmon · April 12, 2022, 12:12pm

I will see what I can get. When I say the link has been detected as restored, I mean that the GUI is showing the other PC as connected in its little status box.

But that is showing zero transfer and the logs are showing a wait.

system · May 12, 2022, 12:13pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.