Unresponsive Web GUI - pause and resume is not registered

I have experienced a strange problem recently. It started only a few weeks ago, as before that everything had been working properly.

The issue is that after some time, when I try to “Pause” or “Resume” a folder, the command seems not to be registered by Syncthing. What I mean is that, when I press the “Pause” button, the folder state changes to “Paused”, however:

  1. The remote devices, with whom the folder is shared, do not disconnect.

  2. After refreshing the Web GUI in the browser, the folder state comes back to the previous one, e.g. “Syncing”.

In order to recover from such a state, a complete restart of Syncthing is required. Refreshing the browser tab or reopening the page does not help. Also, this only seems to be the case after the Web GUI has been left open in the browser for a while, i.e. right after (re)starting Syncthing everything works fine.

This happens in Chromium, and I have not tested any other Web browsers yet, but I will try to check whether the problem persists in Internet Explorer and possibly other Web browsers too. Right now, this is going on under Syncthing 1.4.2 and Chromium 81 but it seems to have began sometime around Syncthing 1.4.0.

Has anyone else experienced something similar, or do you have any idea what can be causing this issue?

My experience with such things is that syncthing sometimes needs time to process commands. If this is pause or disconnect a peer, the described effect maybe can be observed under various functions. Especially when many peers are connected or when the peers are large, because the database may have to emulate and save the triggered command.

Therefore, triggering a command and reloading the content shortly afterwards leads to such effects. Maybe test the same thing by giving Syncthing more time.

Unfortunately, this does not seem to be the case. No matter how much time I wait, once the GUI gets stuck, the only way to make it work again is to restart Syncthing. I am just trying to pause a folder, which is being synced between two PCs, so there are only two devices involved.

I have now checked in two different browsers, but there is no difference. I have also observed that the config.xml file is not being updated too. The folder is always set to <paused>false</paused>, and does not change regardless of how much I press the button in the GUI or how long I wait.

You can send a SIGQUIT when it gets stuck to get the stacktraces for further investigation.

Well, this is under Windows, and as far as I understand, SIGQUIT is Unix/Linux only.

For now, I have reset the whole database just to see whether the problem persists. Also, it only seems to be happening with one huge folder located on an external HDD. I will do more testing once the whole thing finishes scanning and stabilises.

Perhaps the external hdd just locks up, locking up syncthing.

The HDD itself is responsive though. I can view the contents and perform normal file and folder operations with no problems. It is just the Syncthing’s Web GUI that gets locked. Once locked, neither “Pause” nor “Resume” buttons work, i.e. the config.xml file remains intact. “Restart” does work, but “Shutdown” either does nothing or seems to lead to crashes.

Just a moment ago, I tried to “Shutdown” and received a message about shutting down successfully, but what really happened was a crash with this message on top of the GUI:

and a panic log:

Panic at 2020-04-20T03:31:38+09:00
panic: database is closed

goroutine 129 [running]:
github.com/syncthing/syncthing/lib/db.(*FileSet).Snapshot(0xc000dca6c0, 0xc01b898260)
	C:/Syncthing/lib/db/set.go:163 +0x164
github.com/syncthing/syncthing/lib/model.(*folder).scanSubdirs(0xc000029b00, 0xc01f1bfe60, 0x1, 0x1, 0x0, 0x0)
	C:/Syncthing/lib/model/folder.go:363 +0x34b
github.com/syncthing/syncthing/lib/model.(*folder).serve(0xc000029b00, 0x10f8a20, 0xc000e4d600)
	C:/Syncthing/lib/model/folder.go:185 +0x734
github.com/syncthing/syncthing/lib/util.AsService.func1(0x10f8a20, 0xc000e4d600, 0xc00115a180, 0x0)
	C:/Syncthing/lib/util/utils.go:183 +0x40
	C:/Syncthing/lib/util/utils.go:247 +0x149
github.com/thejerf/suture.(*Supervisor).runService.func1(0xc0001a8f00, 0xc000000008, 0x1b38638, 0xc000029b00)
	C:/go/pkg/mod/github.com/thejerf/suture@v3.0.2+incompatible/supervisor.go:600 +0x57
created by github.com/thejerf/suture.(*Supervisor).runService
	C:/go/pkg/mod/github.com/thejerf/suture@v3.0.2+incompatible/supervisor.go:588 +0x62

panic-20200420-033138.log (115.0 KB)

Is this of any use?


There are some errors in the Chromium console too:

:8384/rest/system/upgrade:1 Failed to load resource: the server responded with a status of 500 (Internal Server Error)
logbar.js:11 TypeError: Cannot read property 'user' of undefined
    at refreshNoAuthWarning (syncthingController.js:449)
    at syncthingController.js:429
    at angular.js:9452
    at processQueue (angular.js:13337)
    at angular.js:13353
    at Scope.$eval (angular.js:14589)
    at Scope.$digest (angular.js:14405)
    at Scope.$apply (angular.js:14694)
    at angular.js:14984
    at completeOutstandingRequest (angular.js:4959)
console.<computed> @ logbar.js:11
:8384/rest/events?since=938:1 Failed to load resource: net::ERR_INCOMPLETE_CHUNKED_ENCODING
completion:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
completion:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
3:8384/rest/events?limit=1:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
:8384/rest/system/status:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
:8384/rest/system/discovery:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
:8384/rest/system/connections:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
:8384/rest/system/error:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
5:8384/rest/events?limit=1:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
:8384/rest/system/status:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
:8384/rest/system/discovery:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
:8384/rest/system/connections:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
:8384/rest/system/error:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
:8384/rest/events?limit=1:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
logbar.js:11 TypeError: Cannot set property 'errors' of undefined
    at syncthingController.js:333
    at Scope.$broadcast (angular.js:14908)
    at eventService.js:27
    at Array.forEach (<anonymous>)
    at successFn (eventService.js:23)
    at angular.js:9452
    at processQueue (angular.js:13337)
    at angular.js:13353
    at Scope.$eval (angular.js:14589)
    at Scope.$digest (angular.js:14405)
console.<computed> @ logbar.js:11
:8384/rest/system/upgrade:1 Failed to load resource: the server responded with a status of 500 (Internal Server Error)

Thanks, the panic log looks helpful and is pointing at a deadlock of some sorts.

I don’t know where but just recently I saw the same trace elements regarding long-lasting compaction errors (among other db routines):

goroutine 10 [select, 221 minutes]:
	C:/go/pkg/mod/github.com/syndtr/goleveldb@v1.0.1-0.20190923125748-758128399b1d/leveldb/db_compaction.go:91 +0xd4
created by github.com/syndtr/goleveldb/leveldb.openDB
	C:/go/pkg/mod/github.com/syndtr/goleveldb@v1.0.1-0.20190923125748-758128399b1d/leveldb/db.go:148 +0x413

I first thought it wasn’t a cause, just the effect of a slow compaction process that is interrupted by shutdown/db closure, but maybe it is actually a problem.

Isn’t that just an always running goroutine that does nothing but wait for compaction errors, or something like that? (I’d Use The Source but I’m heading to bed…)

So these are the ones that caught my eye:

goroutine 58705 [select, 37 minutes]:
github.com/thejerf/suture.(*Supervisor).RemoveAndWait(0xc0001a8f00, 0x200000033, 0x0, 0x0, 0x0)
	C:/go/pkg/mod/github.com/thejerf/suture@v3.0.2+incompatible/supervisor.go:739 +0x1d9
github.com/syncthing/syncthing/lib/model.(*model).stopFolder(0xc000182780, 0xc01be169c0, 0xb, 0xc0115264c0, 0x14, 0x0, 0xc0115264e0, 0x1d, 0x2, 0xc00eb07b00, ...)
	C:/Syncthing/lib/model/model.go:469 +0xde
github.com/syncthing/syncthing/lib/model.(*model).restartFolder(0xc000182780, 0xc01be169c0, 0xb, 0xc0115264c0, 0x14, 0x0, 0xc0115264e0, 0x1d, 0x2, 0xc00eb07b00, ...)
	C:/Syncthing/lib/model/model.go:528 +0x2fa
github.com/syncthing/syncthing/lib/model.(*model).CommitConfiguration(0xc000182780, 0x1e, 0xc009a08a00, 0xf, 0xf, 0xc0099b3800, 0x7, 0x7, 0x1, 0xc01be16d20, ...)
	C:/Syncthing/lib/model/model.go:2477 +0xaf8
github.com/syncthing/syncthing/lib/config.(*wrapper).notifyListener(0xc000416000, 0x10f5860, 0xc000182780, 0x1e, 0xc009a08a00, 0xf, 0xf, 0xc0099b3800, 0x7, 0x7, ...)
	C:/Syncthing/lib/config/wrapper.go:220 +0x131
github.com/syncthing/syncthing/lib/config.(*wrapper).notifyListeners.func1(0xc000416000, 0xc0097e2c00, 0xc0097e3000, 0x10f66a0, 0xc012231a60, 0x10f5860, 0xc000182780)
	C:/Syncthing/lib/config/wrapper.go:211 +0xed
created by github.com/syncthing/syncthing/lib/config.(*wrapper).notifyListeners
	C:/Syncthing/lib/config/wrapper.go:210 +0x1a7
goroutine 42677 [chan send, 39 minutes]:
github.com/syncthing/syncthing/lib/model.(*sendReceiveFolder).copierRoutine(0xc000028f00, 0xc00bd8dbc0, 0xc00bd8db60, 0xc00bd8dc80)
	C:/Syncthing/lib/model/folder_sendrecv.go:1343 +0xc10
github.com/syncthing/syncthing/lib/model.(*sendReceiveFolder).pullerIteration.func2(0xc000028f00, 0xc00bd8dbc0, 0xc00bd8db60, 0xc00bd8dc80, 0x10f66a0, 0xc01b504f20)
	C:/Syncthing/lib/model/folder_sendrecv.go:246 +0x50
created by github.com/syncthing/syncthing/lib/model.(*sendReceiveFolder).pullerIteration
	C:/Syncthing/lib/model/folder_sendrecv.go:244 +0x4bc
goroutine 57546 [select, 44 minutes]:
github.com/syncthing/syncthing/lib/protocol.(*rawConnection).Request(0xc003f860d0, 0x10f8a20, 0xc0010368c0, 0xc0003d1f80, 0xb, 0xc0132c5f80, 0x66, 0x3c0000, 0x20000, 0xc01472eae0, ...)
	C:/Syncthing/lib/protocol/protocol.go:312 +0x31e
github.com/syncthing/syncthing/lib/protocol.wireFormatConnection.Request(0x11046a0, 0xc003f860d0, 0x10f8a20, 0xc0010368c0, 0xc0003d1f80, 0xb, 0xc00a281ce0, 0x66, 0x3c0000, 0x20000, ...)
	C:/Syncthing/lib/protocol/wireformat.go:40 +0x163
github.com/syncthing/syncthing/lib/model.(*model).requestGlobal(0xc000182780, 0x10f8a20, 0xc0010368c0, 0x35479b5ca86aca9a, 0x4fcb3ab03b6fdd2, 0xfb26b2562e7b33b3, 0xe5860f47d562056b, 0xc0003d1f80, 0xb, 0xc00a281ce0, ...)
	C:/Syncthing/lib/model/model.go:2054 +0x499
github.com/syncthing/syncthing/lib/model.(*sendReceiveFolder).pullBlock(0xc000bdec00, 0xc013527400, 0xc01472eae0, 0x20, 0x20, 0x3c0000, 0x5036989b00020000, 0xc00aa22900)
	C:/Syncthing/lib/model/folder_sendrecv.go:1458 +0x3af
github.com/syncthing/syncthing/lib/model.(*sendReceiveFolder).pullerRoutine.func1(0x10f66a0, 0xc016708520, 0xc00345d880, 0x20000, 0xc000bdec00, 0xc013527400, 0xc01472eae0, 0x20, 0x20, 0x3c0000, ...)
	C:/Syncthing/lib/model/folder_sendrecv.go:1405 +0x102
created by github.com/syncthing/syncthing/lib/model.(*sendReceiveFolder).pullerRoutine
	C:/Syncthing/lib/model/folder_sendrecv.go:1401 +0x334
goroutine 32332 [select]:
	C:/Syncthing/lib/protocol/protocol.go:685 +0x2cb
created by github.com/syncthing/syncthing/lib/protocol.(*rawConnection).Start
	C:/Syncthing/lib/protocol/protocol.go:238 +0x8e
goroutine 32282 [semacquire, 37 minutes]:
	c:/go/src/runtime/sema.go:56 +0x49
	c:/go/src/sync/waitgroup.go:130 +0x6b
github.com/syncthing/syncthing/lib/config.(*wrapper).Unsubscribe(0xc000416000, 0x10f5f20, 0xc0014c2380)
	C:/Syncthing/lib/config/wrapper.go:165 +0x229
github.com/syncthing/syncthing/lib/watchaggregator.(*aggregator).mainLoop(0xc0014c2380, 0xc00b466c60, 0xc01503c9c0, 0x1109360, 0xc000416000, 0x10f8ea0, 0xc000198e70)
	C:/Syncthing/lib/watchaggregator/aggregator.go:161 +0x7b5
created by github.com/syncthing/syncthing/lib/watchaggregator.Aggregate
	C:/Syncthing/lib/watchaggregator/aggregator.go:132 +0xf7

I am not sure why unsubscribe even waits for a waiter?

Also, seems the writer loop for that connection is going yet there is a number of requests hanging.

Wonder if there is a race between connection being closed and a request being fired. Not sure how this is all related to config unsubscribes though.

While I do not understand all the technicalities, what I find strange is the fact that this only happens with this one specific folder. If I try to “Pause” any other folders, some of whom are also shared with the exact same devices, the command is registered and the config.xml file is updated accordingly.

I have just tested this with a few other folders to make sure. I can “Pause” and “Resume” them with no issues. However, as soon as I try to do the same with this specific folder, the whole GUI gets stuck, and no new commands seem to be registered any more, no matter whether they are about this folder, or the other folders.

The only difference, which I can think about now, at least on this machine, is that the folder in question is located on the external HDD, while the other folders are located on an internal SSD.

Is the other side of the folder in some funny storage? It seems we ask for data and not getting any responses for ages.

Do you mean the remote device, with whom the folder is being shared?

The folder there is also located on an external HDD connected through USB. The configuration is actually similar on both sides, i.e. the problematic folder on external HDDs, and the other folders on internal SSDs. I can pause the folders located on the SSDs, which are also shared with the same remote device, with no problems.

Yes, we are not getting responses from the remote device, we ask for data and it just seems to hang.

some USB enclosures frequently spin down the disk to save power, maybe this is the reason why response to data request takes longer as the disk needs to spin up

We’re talking 44mins here.

I have checked the situation on the remote device today, but everything seems to be working correctly. The external HDD itself is connected properly, and there are no problems when browsing the contents locally.

Both computers run Windows 10, and turning off discs is disabled in the Power Options. No 3rd party USB enclosures are involved, as one is an old Samsung external HDD, while the other is a new WD external HDD. There are no SMART errors reported in both cases, and no errors in the Event Viewer in Windows.

I can pause the folder properly on the remote device. The connection between the devices is disconnected, and the configuration file is updated properly. Also, the folder is set to “Send & Receive” on the remote device, and to “Receive Only” on my device. Does this matter?

By the way, I can reproduce the panic by trying to shut down or restart Syncthing while in the locked state. This is a log from a moment ago.

Panic at 2020-04-21T07:10:46+09:00
panic: database is closed

goroutine 174 [running]:
github.com/syncthing/syncthing/lib/db.(*FileSet).Snapshot(0xc0003dd080, 0xc0159dd7c0)
	C:/BuildAgent/work/174e136266f8a219/lib/db/set.go:163 +0x164
github.com/syncthing/syncthing/lib/model.(*folder).scanSubdirs(0xc0009b0000, 0xc009733840, 0x2, 0x2, 0x0, 0x0)
	C:/BuildAgent/work/174e136266f8a219/lib/model/folder.go:363 +0x34b
github.com/syncthing/syncthing/lib/model.(*folder).serve(0xc0009b0000, 0x110ea00, 0xc000997180)
	C:/BuildAgent/work/174e136266f8a219/lib/model/folder.go:185 +0x734
github.com/syncthing/syncthing/lib/util.AsService.func1(0x110ea00, 0xc000997180, 0xc00144b1a0, 0xc0000d6d80)
	C:/BuildAgent/work/174e136266f8a219/lib/util/utils.go:183 +0x40
	C:/BuildAgent/work/174e136266f8a219/lib/util/utils.go:247 +0x149
github.com/thejerf/suture.(*Supervisor).runService.func1(0xc0003e1b30, 0xc000000002, 0x1e79530, 0xc0009b0000)
	C:/BuildAgent/work/pkg/mod/github.com/thejerf/suture@v3.0.2+incompatible/supervisor.go:600 +0x57
created by github.com/thejerf/suture.(*Supervisor).runService
	C:/BuildAgent/work/pkg/mod/github.com/thejerf/suture@v3.0.2+incompatible/supervisor.go:588 +0x62

panic-20200421-071046.log (94.2 KB)

Please raise an issue on Github, and attach the latest panic log. There is definately a deadlock here, atleast for the shutdown part.