System deadlock when transferring nontrivial amounts of data

Hi everyone,

I tried Syncthing out for the first time today, but hit a hard blocker that I’m having a hard time making sense of: Even a very basic synchronization setup causes a hard freeze of my whole system as soon as new data from that device is synchronized to other devices. Now, I’m well aware that this is the kind of issue you would usually blame OS setup or hardware issues for, but after trying out things for a couple of hours I believe that there still must be something wrong with Synctool.

As for my setup, I have two devices which are supposed to synchronize against each other:

  • Device A: My laptop, which runs Linux (openSUSE)
  • Device B: My Desktop PC, which runs Windows 7

The issue can be summarized like this:

  • Device B is the one that hardlocks when using Synctool, despite running just fine in everyday usage (mostly programming with Visual Studio, and occasional gaming)
  • Most Synctool tasks work as expected: indexing the Sync folder (Synctool just running on DevB) works fine, and so does fetching new data from DevA (Synctool running on both).
  • Synchronizing small files (~100 KiB) from DevB to DevA works fine, too, although that already causes small hiccups (~1 second) during which the mouse cursor doesn’t move. Note that the mouse cursor usually only freezes when some system resource is under full load (e.g. the CPU or the hard disk). Since 100 KiB is a very small amount of data, this suggests that Synctool somehow generates a huge workload from it.
  • As soon as reasonably large files (~1 MB) are supposed to be transferred from DevB to DevA, the whole system environment of DevB freezes permanently (I’ve waited several minutes to see if it “wakes up” from this state).

I’ve tried several things to see if this issue could be helped, all of these without success:

  • Updating from v0.11.26 to v0.12rc6 on both devices
  • setting the STTRACE env variable to “all”. As far as I could tell this didn’t yield any useful error messages. I’m not sure if the console output manages to “catch up” before the deadlock happens at all, though.
  • setting GOMAXPROCS to 1
  • In config.xml, setting “hashers”, “pullers”, and “copiers” to 1 instead of 0.
  • Using a tool called Process Hacker, I forced the CPU and I/O priorities of the syncthing process to the lowest possible values.
  • Checking RAM usage and disk I/O when the deadlock happens. RAM usage doesn’t seem to be an issue. However, I was able to tell that the deadlock seems to happen instantaneously as soon as the I/O workload of the Syncthing process jumps from a few KiB per second to something like 400 KB/s.
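For anyone wanting to repeat the config.xml change from the list above, it could be scripted roughly like this. This is a minimal Python sketch, assuming each `<folder>` element holds `<copiers>`, `<pullers>`, and `<hashers>` child elements; the sample XML, the folder path, and the function name `limit_folder_workers` are made up for illustration, and a real config.xml contains much more.

```python
import xml.etree.ElementTree as ET

# Assumed, heavily simplified config.xml fragment (hypothetical path and layout).
SAMPLE_CONFIG = """<configuration>
    <folder id="default" path="C:/Sync">
        <copiers>0</copiers>
        <pullers>0</pullers>
        <hashers>0</hashers>
    </folder>
</configuration>"""


def limit_folder_workers(config_xml: str, value: int = 1) -> str:
    """Return config_xml with copiers/pullers/hashers set to `value` in every folder."""
    root = ET.fromstring(config_xml)
    for folder in root.iter("folder"):
        for name in ("copiers", "pullers", "hashers"):
            element = folder.find(name)
            if element is None:
                # Create the element if this folder does not define it yet.
                element = ET.SubElement(folder, name)
            element.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(limit_folder_workers(SAMPLE_CONFIG))
```

Stop Syncthing and keep a backup of the original file before writing anything back to the real config.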

Note that I also double-checked the health of my hard disk using CrystalDiskInfo and CrystalDiskMark, the latter of which measures the sequential/non-sequential read/write speeds of the HDD (sequential reads: ~100 MB/s, non-sequential reads: ~0.5 MB/s; I don’t remember the values for write operations, but they were of similar orders of magnitude). The CPU and HDD temperatures of my system are also fine as far as I can tell.

I’m not sure what else to try, really. I figured that maybe, even despite all the options I forced to “1” (GOMAXPROCS, hashers, pullers, copiers), Syncthing might still be internally using too many threads that simultaneously access the HDD. Does anyone have any suggestions that I could try?

Any help would be appreciated a lot, so thanks in advance :wink:

There’s something wrong with your system… Syncthing (not “Synctool” ;)) doesn’t do anything low level that would cause a freeze - it just reads and writes files and sends data over the network, much like anything else.

If you got an actual blue screen, that might have had a pointer or two towards an actual culprit. As it is, I’d suggest a memory test to start with. Besides testing the actual memory, it puts some load on the system and does have a tendency to flush out issues with the CPU and motherboard as well.

Hi calmh, thanks a lot for your reply!

Syncthing (not “Synctool”)

Amazing how I got this wrong multiple times even.

As it is, I’d suggest a memory test to start with

I’ve tried doing so, and all seems to be fine. I’m still not quite sure how Syncthing could stress my hardware more than e.g. multithreaded compiling of a reasonably big software project, though. I mean, I get that this shouldn’t happen on working hardware, but at the same time something must be special about Syncthing which triggers this behavior so quickly and reproducibly.

What’s particularly startling is that syncing from DevA to DevB works perfectly fine for any amount of data while the other direction causes the deadlock almost immediately - how do these two directions even differ to begin with? Don’t both directions involve scanning DevB’s directory tree and hash computations anyway?

Syncthing doesn’t do anything low level that would cause a freeze - it just reads and writes files and sends data over the network, much like anything else.

This is what I was curious about - is there any chance that Syncthing is accidentally (or even intentionally) spawning a large number of threads even when the transfer size is just a few MB?

To be clear, I neither suspect that my hardware is broken nor that Syncthing has an actual bug. My (possibly very uneducated) guess is that Syncthing might be a bit overoptimistic with regards to the number of threads it can run simultaneously and ends up starving out all system resources. Are there maybe some more configuration options I could try to see if this is the case, or if need be some source code modifications?

Syncthing will spawn one OS thread per GOMAXPROCS, and potentially one per blocking system call (disk access, essentially). You might get a number of those from the database layer, plus the number of simultaneous outstanding requests from the other side, which is the “pullers” config setting on that side.
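To make the effect of a “pullers”-style limit concrete, here is a small Python analogy (not Syncthing’s Go code, and not a model of the Go runtime): a bounded worker pool caps how many simulated blocking reads are in flight at once. The function name `run_with_limit` and the 10 ms sleep standing in for a disk read are assumptions for illustration only.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(num_requests: int, pullers: int) -> int:
    """Run num_requests simulated blocking reads; return the peak number in flight."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def blocking_read(_index: int) -> None:
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)  # stands in for a blocking disk read
        with lock:
            in_flight -= 1

    # The pool size plays the role of the "pullers" limit on the requesting side.
    with ThreadPoolExecutor(max_workers=pullers) as pool:
        list(pool.map(blocking_read, range(num_requests)))
    return peak


if __name__ == "__main__":
    for limit in (1, 4, 16):
        print(limit, run_with_limit(num_requests=32, pullers=limit))
```

The peak never exceeds the limit, which is why lowering “pullers” on the requesting device reduces the number of simultaneous reads the serving device has to handle.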

What’s the last thing printed when running with STTRACE=all? Is it consistent?

You might get a number of those from the database layer

Speaking about the log messages relating to that or speaking about the actual source code?

What’s the last thing printed when running with STTRACE=all? Is it consistent?

It does look somewhat consistent. Starting with STTRACE=all syncthing --verbose --logflags=18 --logfile=somefile.log, stdout seems to stop at

protocol.go:418: DEBUG: message data:

[0x8c bytes of file data]

events.go:194: DEBUG: poll 1m0s

protocol.go:663: DEBUG: wrote 131088 bytes on the wire

protocol.go:650: DEBUG: write uncompressed message; {0 8 3 false} (len=131080)

protocol.go:663: DEBUG: wrote 131088 bytes on the wire

protocol.go:650: DEBUG: write uncompressed me [cut off?]

protocol.go:650: DEBUG: write uncompressed me [cut off?]

Unfortunately, I can’t copy-paste the full log for obvious reasons. The logfile stops semi-consistently at

[E2J4O] 13:45:25 leveldb_transactions.go:74: DEBUG: insert; folder=“default” device= File{Name:"", Flags:0100666, Modified:1327588948, Version:[{2779784379058205879 3}], Size:158527, Blocks:<…>

The logfile is being filled with an inconsistent number of NUL bytes following that (usually interrupting the last log message), which is interesting to say the least. It does seem like stdout is “further” along anyway, since I couldn’t find the particular “protocol.go” log message last printed to stdout in the logfile.

(line numbers referring to the 0.12.0-rc6, although do note that I tested 0.11.26 initially, too)
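Since the NUL padding makes such a logfile awkward to read in an editor, a small helper can pull out the last complete messages. This is a minimal Python sketch, assuming the file consists of newline-terminated log lines, possibly followed by one interrupted line and then NUL bytes; the function name `last_complete_lines` is made up for illustration.

```python
from typing import List


def last_complete_lines(raw: bytes, count: int = 5) -> List[str]:
    """Return the last `count` newline-terminated lines before any NUL padding."""
    nul_at = raw.find(b"\x00")
    if nul_at != -1:
        # Everything from the first NUL byte onwards is treated as padding.
        raw = raw[:nul_at]
    end = raw.rfind(b"\n")
    if end == -1:
        return []  # no complete line survived
    # Anything after the last newline is an interrupted message and is dropped.
    lines = raw[:end].decode("utf-8", errors="replace").splitlines()
    return lines[-count:]


if __name__ == "__main__":
    sample = b"first message\nsecond message\ninterrupted mess\x00\x00\x00"
    for line in last_complete_lines(sample):
        print(line)
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes in.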

I’ve reread your previous post and missed this remark before:

It turns out that setting the “pullers” value to 1 on my laptop (DevA, the one which doesn’t crash) makes DevB work fine during synchronization (still with hiccups happening, but no permanent mouse-cursor freezing… EDIT: see below) - so that’s at least a workaround for my issue, making Syncthing (more) usable :slight_smile:

EDIT: well, derp. Turns out after having it running for another while, it does freeze even with this new configuration… It did manage to synchronize a fair amount of data (1 GB) though.

Sounds like sending stuff over the network causes your system to freeze. Can you upload a large file quickly from B to A via FTP or SCP or a file share or something without issues?
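If no FTP or SCP server is at hand, a plain TCP transfer can serve as the same kind of test. The following is a minimal Python sketch under stated assumptions: the function names, the 64 KiB chunk size, and the all-zero payload are illustrative choices, and the built-in `loopback_transfer` only checks that the code works on one machine. To exercise the actual network path, the receiving part would have to run on A (bound to an address reachable from B) and `send_bytes` on B.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call


def receive_all(server: socket.socket, result: dict) -> None:
    """Accept one connection and count the bytes received until the peer closes."""
    conn, _addr = server.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def send_bytes(host: str, port: int, size: int) -> float:
    """Send `size` zero bytes to host:port and return the elapsed seconds."""
    payload = bytes(CHUNK)
    start = time.monotonic()
    with socket.create_connection((host, port)) as sock:
        remaining = size
        while remaining > 0:
            n = min(remaining, CHUNK)
            sock.sendall(payload[:n])
            remaining -= n
    return time.monotonic() - start


def loopback_transfer(size: int) -> int:
    """Run sender and receiver in one process over 127.0.0.1; return bytes received."""
    result = {}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        receiver = threading.Thread(target=receive_all, args=(server, result))
        receiver.start()
        send_bytes("127.0.0.1", port, size)
        receiver.join()
    return result["bytes"]


if __name__ == "__main__":
    print(loopback_transfer(8 * 1024 * 1024))
```

If the sending machine freezes during a transfer like this as well, Syncthing is not involved at all and the network stack or driver becomes the prime suspect.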

Indeed, an FTP transfer like this also causes a similar deadlock.

For reference, I’ve searched around for a solution to this a bit, and it seems that https://www.petri.com/improve_windows_vista_network_performance provides exactly the command needed to resolve the issue for me. Still not entirely sure what’s wrong there and if that’s a proper fix, but at least Syncthing is now working fine for my whole 3 GB folder even with default settings.

So… good enough for me I guess. Thanks again for helping me out there :smile:
