Web GUI stuck/unresponsive after trying to pause a folder

Today the Web GUI got stuck for about 2 minutes after I tried to pause a folder. It eventually came back to life by itself.

You can see the two-minute gap in the log:

[D4DZU] 2021/03/11 22:20:40 DEBUG: icxvm-vh5ic Snapshot()
[D4DZU] 2021/03/11 22:20:40 DEBUG: icxvm-vh5ic WithHaveSequence(265610)
[D4DZU] 2021/03/11 22:20:40 DEBUG: indexSender@0xc00fba7a00 for icxvm-vh5ic to TLFGVKC at 10.0.0.2:56317-139.162.117.43:22067/relay-server/TLS1.3-TLS_CHACHA20_POLY1305_SHA256: Sending 4 files (<408 bytes)
[D4DZU] 2021/03/11 22:22:15 INFO: Paused folder "xxx" (icxvm-vh5ic) (sendreceive)

As soon as the GUI got stuck, I took a CPU profile, although I’m not sure how useful it will be.

syncthing-cpu-windows-amd64-v1.14.1-dev.4.g1b4c7673-tomasz86-v1.14.0-222014.pprof (13.8 KB)

I had the same problem today on a different device. Again, it took ~2 minutes to pause a folder.

syncthing-cpu-windows-386-v1.14.1-dev.4.g1b4c7673-tomasz86-v1.14.0-170428.pprof (33.5 KB)

A CPU profile is not really the right tool here, as you’re probably looking for something blocked on I/O or a mutex. Taking a goroutine dump, looking through it for goroutines that have been stalled for two minutes, and then figuring out why will get you closer.

The least destructive way to do that is to run with the STPROFILER environment variable set and then browse to (profiler-listen-address)/debug/pprof/ (IIRC).


In the second CPU profile, almost all the time is spent in syscalls triggered by notify; in the first one it’s only about 7 s out of 30 s. It might still be something related to I/O and the filesystem watcher.

STPROFILER was not enabled back then, so I couldn’t take a proper profile. It is now, but I haven’t run into the issue since 😉.

I’m ready though, so once this happens again, I should be able to provide more information.

Also, if this actually turns out to be I/O-related, the culprit may be a flaky RAID array I have here (which I intend to dismantle soon).

I’m finally back with some more data on this one. I just had the GUI get completely stuck after I tried to unpause several folders at once.

This is how it looked in the browser: all the folders were “Unknown”, and the local state had been zeroed.

I downloaded the whole “full goroutine stack dump” from http://localhost:9090/debug/pprof, which I’m attaching below.

goroutine.txt (265.6 KB)

I also tried running go tool pprof http://localhost:9090/debug/pprof/goroutine, but I only managed to do that a few minutes after downloading the goroutine dump above, and by then the GUI had already come back to life. I’m not sure how useful the screenshot below is, but I’m including it anyway.

In short, the GUI was stuck like that for about 5-10 minutes, and then it suddenly refreshed and started working normally again. Does this provide any more information on what the culprit may be?

Not really.

Are there any other logs that could help in this situation?

Without being able to reproduce it: no. Maybe debug logging of filesystem operations, but that’s a very long shot, and you can’t run with that debug logging on all the time. I’d also try disabling filesystem watching, again a long shot given the first profiles, and again not really something you want to do in production. Check system resources when it next happens; maybe there’s something there.

Thank you. I will check the OS monitoring tools next time this happens. Just a quick question though: is there any performance penalty for running with STPROFILER and <gui debugging="true"> all the time?

The debugging option has none. STPROFILER needs some resources for collecting and writing the info, but I wouldn’t expect any noticeable effect on overall performance.

