[SOLVED] Syncthing stuck, GUI seemingly responsive, but nothing is happening

In the first report, the goroutines pointed at goleveldb compaction, which happens after the actual GC. If you enable db debug logging, you will also get a line starting with “Finished GC, starting compaction”. That would show how much time is spent on each part. I wouldn’t expect compaction to be CPU-intensive at all.

I have enabled db debug logging and will now wait for the next round of GC to see the logs. I have a feeling that I will have to make GC run less often anyhow, as the device has just gone through GC once more, and it again took 29 minutes. That indicates the long processing time is real, and it makes the device essentially pause syncing for ~1h per day (given the default 13h interval).
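As a rough check of that "~1h per day" figure, here is a minimal Python sketch. The function name is made up for illustration, and it assumes every GC run takes about as long as the 29-minute one mentioned above.

```python
def gc_minutes_per_day(gc_minutes: float, interval_hours: float) -> float:
    """Average minutes per day spent in GC, with one run every interval_hours.

    Assumes each run takes roughly the same time (gc_minutes).
    """
    runs_per_day = 24 / interval_hours
    return gc_minutes * runs_per_day


# 29-minute GC at the default 13h interval
print(round(gc_minutes_per_day(29, 13), 1))  # 53.5
```

So a bit under an hour per day on average, which matches the estimate.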

Here are more logs from two different machines.

  1. Intel Atom x5-Z8350 / 5400 rpm USB 2.0 HDD

    [HTNCZ] 2021/05/27 01:02:43 INFO: Database GC started - many Syncthing operations will be unresponsive until it's finished
    [HTNCZ] 2021/05/27 01:13:30 INFO: Database GC done (discarded/remaining: 1156/23804 blocks, 0/0 versions)
    [HTNCZ] 2021/05/27 14:13:39 INFO: Database GC started - many Syncthing operations will be unresponsive until it's finished
    [HTNCZ] 2021/05/27 14:43:43 INFO: Database GC done (discarded/remaining: 1393/24365 blocks, 0/0 versions)
    

    The second GC took a little less time than on the Atom Z3740 above, even though here both the data and the database are located on a slow external USB 2.0 hard drive. This device also syncs much more data than the other one, so the DB is much larger, but for some reason it still manages to finish GC more quickly.

  2. Intel Core 2 Duo T7700 / SATA SSD

    [6OTUH] 2021/05/27 07:59:16 INFO: Database GC started - many Syncthing operations will be unresponsive until it's finished
    [6OTUH] 2021/05/27 08:00:29 INFO: Database GC done (discarded/remaining: 3270/22082 blocks, 0/0 versions)
    [6OTUH] 2021/05/27 21:00:39 INFO: Database GC started - many Syncthing operations will be unresponsive until it's finished
    [6OTUH] 2021/05/27 21:03:10 INFO: Database GC done (discarded/remaining: 3287/22118 blocks, 0/0 versions)
    

    This one is strange, because GC took only a few minutes or so, while the CPU here is also old and slow (although still faster than the Atoms, especially regarding the single-core speed).
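For anyone who wants to pull the GC durations out of their own logs, here is a small Python sketch. It only assumes the "Database GC started" / "Database GC done" line format shown in the logs above. The regex and the `gc_durations` helper are my own and not part of Syncthing.

```python
import re
from datetime import datetime

# Matches lines like:
# [HTNCZ] 2021/05/27 01:02:43 INFO: Database GC started - ...
LOG_RE = re.compile(
    r"\[(?P<dev>\w+)\] (?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"INFO: Database GC (?P<event>started|done)"
)


def gc_durations(lines):
    """Return (device, start_time, duration) for each started/done pair."""
    started = {}
    results = []
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y/%m/%d %H:%M:%S")
        if m["event"] == "started":
            started[m["dev"]] = ts
        elif m["dev"] in started:
            start = started.pop(m["dev"])
            results.append((m["dev"], start, ts - start))
    return results


sample = [
    "[HTNCZ] 2021/05/27 01:02:43 INFO: Database GC started - many Syncthing operations will be unresponsive until it's finished",
    "[HTNCZ] 2021/05/27 01:13:30 INFO: Database GC done (discarded/remaining: 1156/23804 blocks, 0/0 versions)",
    "[HTNCZ] 2021/05/27 14:13:39 INFO: Database GC started - many Syncthing operations will be unresponsive until it's finished",
    "[HTNCZ] 2021/05/27 14:43:43 INFO: Database GC done (discarded/remaining: 1393/24365 blocks, 0/0 versions)",
    "[6OTUH] 2021/05/27 07:59:16 INFO: Database GC started - many Syncthing operations will be unresponsive until it's finished",
    "[6OTUH] 2021/05/27 08:00:29 INFO: Database GC done (discarded/remaining: 3270/22082 blocks, 0/0 versions)",
    "[6OTUH] 2021/05/27 21:00:39 INFO: Database GC started - many Syncthing operations will be unresponsive until it's finished",
    "[6OTUH] 2021/05/27 21:03:10 INFO: Database GC done (discarded/remaining: 3287/22118 blocks, 0/0 versions)",
]

for dev, start, duration in gc_durations(sample):
    print(dev, start, duration)
```

Feeding it a whole log file should work the same way, e.g. `gc_durations(open(path, encoding="utf-8"))`, where `path` is wherever your Syncthing log is written.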

I think this boils down to I/O much more than CPU.


Yeah, considering all the data, this indeed seems to be the case.

  1. Ryzen 4350G + SSD = ~2 minutes
  2. Core 2 Duo T7700 + SSD = ~3 minutes
  3. Atom X5-Z8350 + HDD = 10-30 minutes
  4. Atom Z3740 + eMMC = 30+ minutes

In case 3, I could potentially move the database to the internal storage, which is very limited in size but should still be enough for just the DB. It is an SSD with decent speeds. The biggest problem is case 4, where the only available storage is the eMMC. The device also has only 2GB of RAM, which makes it difficult or impossible to store the DB on a RAM disk.

Edit: I still haven’t tried setting --debug-db-indirect-gc-interval to a higher value, mainly because I worry that once GC triggers after such a long time, it will take proportionally longer to finish. Basically, if GC takes 30 minutes every 13h right now, will it take 10 times longer if the value is set to 130h? Does it scale like that, or are my worries baseless here?

I wouldn’t expect it to scale. The GC always iterates over the entire DB, and for the compaction you have a lot more deletions than DB tables. I’d therefore expect a lot of tables to be affected by compaction after a reasonably short amount of time, and that number to increase very little, if at all, after that.
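To put numbers on the two scenarios, here is a rough Python sketch. Both models are simplifications made up for illustration: "linear" is the worry from the question (GC time grows with the interval), and "constant" is the expectation that a run costs about the same regardless of the interval. The 30 minutes / 13h baseline is taken from the question.

```python
BASE_GC_MIN = 30      # observed GC duration, in minutes
BASE_INTERVAL_H = 13  # default GC interval, in hours


def linear_gc_minutes(interval_hours: float) -> float:
    """Worst-case assumption: GC time grows in proportion to the interval."""
    return BASE_GC_MIN * interval_hours / BASE_INTERVAL_H


def constant_gc_minutes(interval_hours: float) -> float:
    """Assumption that GC time does not depend on the interval."""
    return BASE_GC_MIN


def blocked_fraction(gc_minutes: float, interval_hours: float) -> float:
    """Share of wall-clock time spent in GC."""
    return gc_minutes / (interval_hours * 60)


for interval in (13, 130):
    for name, model in (("linear", linear_gc_minutes), ("constant", constant_gc_minutes)):
        minutes = model(interval)
        share = blocked_fraction(minutes, interval)
        print(f"{interval:>4}h {name:<8} {minutes:6.0f} min per GC, {share:.2%} of time blocked")
```

Under the linear model a longer interval buys nothing overall, since the blocked share stays the same and only the individual pause gets longer. Under the constant model the blocked share drops by the same factor the interval grows.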

There are some ideas being considered to make this part (GC) less painful - there might be improvements here in the future.
Opened an issue to track that: “Database GC can block for a long time” (Issue #7722, syncthing/syncthing on GitHub).


Just for the record, what kind of SSD and OS is this on?

I have gathered data only from my Windows devices for now, and even in the fastest case, the GC still takes 2-5 minutes. The specific hardware is a SanDisk x110 SATA SSD. I also have two other machines that use the exact same SSDs, and there it also takes at least 2 minutes to do the GC. Everything runs various versions of Windows 10 only though, with the storage formatted to NTFS (encrypted and compressed).

I will have data from Android later on too, but there I don’t sync that much to begin with, so I also don’t expect the GC to take very long regardless.

Linux/debian, ext4 on LVM on Samsung SSD 850 in a thinkpad t480.

Anyway, don’t spend too much time on this, a possible solution “has been thought up”. It’ll just take a moment for it to become reality (or to hit a blocking problem, hopefully not).


Yeah, it’s more out of my curiosity than anything else right now. I have followed @calmh’s advice and increased the --debug-db-indirect-gc-interval, although not to 720h, but rather to 168h (aka a week), mainly due to the one device that needs 30+ minutes for the GC.
