Can I change the log file size and rotation policy at runtime? With full debug I can have all 3 * 10 MB logs rotate in a few seconds…
You can tweak those with command line switches (see https://docs.syncthing.net/users/syncthing#cmdoption-log-level and below).
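For the record (from memory, so double-check `syncthing --help`): the relevant switches should be `--log-max-size`, which sets the maximum size in bytes of one log file before it rotates, and `--log-max-old-files`, which sets how many rotated files to keep. These are startup options, so changing them means restarting Syncthing rather than adjusting a running instance.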
If the DB is growing, there must be activity, right?
I checked the profiles of the support bundles you sent:
- support-bundle-EBBXITJ-2025-09-05T161759: Scanning, index sending and receiving. Lots of time spent reading files, even more than inserting; resp. inserting also spent most of its time reading from the DB, and in recalcGlobalForFile.
- support-bundle-EBBXITJ-2025-09-05T222902: Little CPU activity, just a bit of network work and some incoming block requests.
- support-bundle-NOEMNLL-2025-09-05T151803: Most time spent reading from the FS for scanning.
There’s nothing conclusive there as far as I can see; resp. I don’t really have a good picture of what’s happening here.
Looking at recalcGlobalForFile might be worthwhile, resp. getting CPU profiles while running benchmarks (on a slow enough FS) to get more, repeatable info on where the bottleneck is.
NOEMNLL is at 1-hour rescans, so yeah, it scans. It’s OK.
EBBXITJ was feeding a new node that had long been absent; but all these stats are for the period before the growth started, they are all in the first bundle, as it seems to me. But most of the excess growth and resource thrash is also from before this bundle, so it’s hard to say whether this is it or not.
Thanks for analyzing! I keep observing.
For now I’ve found the following. Somebody mentioned it in this chat already, and now I can confirm: low process priority in Windows leads to catastrophically low performance in Syncthing 2. I was running with the setting setLowPriority=true, and if I override it with normal priority, it boosts performance about x10, with correspondingly higher CPU/disk usage. Maybe the new DB just doesn’t run well on starved resources, while the old DB was fine with them.
When I discovered this and adjusted priorities, things started moving normally. No more 30-minute waits for the GUI, etc. Syncthing.exe is kind of the king of my resource usage now, but that’s okay for the moment: at least once it finished all the pending tasks, it started acting normally, processing change notifications and syncing files fast, as it should. Maybe priority affects latency, and that has a much worse effect on SQLite than on LevelDB.
Still, I had to disable scans entirely (set a 90-day interval) because scans are a pain performance-wise now. But it seems OK, as inotify works fine.
And the issue with the DB growing is ongoing: +500 MB on the largest folder DB in a few hours (my v2 DB is already 7x the size of the v1 DB). And maybe the excess resource usage and requirements on larger datasets are due to this DB getting bigger and fragmented.
So just to repeat: “v2 is less efficient, we know it”, but how much less? At the moment I just can’t keep up with this, but I will keep trying. Not sure whether it is a showstopper for me or not. The DB size is getting close to unrealistic, but maybe I can live with it.
I will keep an eye on the DB, and tomorrow I will also try some DB debugging to find out why it degrades so much so fast. This is under the assumption that things are otherwise fine; if not, that will also become clear soon enough, if the DB keeps growing without limit.
I appreciate your support and attention.
Maybe the effect is amplified by Win11 big/little scheduling, where low-priority processes run on a different CPU cluster.
First I must state that Syncthing is very good and useful software. I love it.
I’ve added some info here: Generic scalability / memory overusage degradation · Issue #10357 · syncthing/syncthing · GitHub, but in a nutshell:
- 38 shares with 42 remote devices, on a decent Synology (quad-core) with only 2GB of memory
- running on 2.0.7
- configured with the settings from the « Low Resources » section of Configuration Tuning — Syncthing documentation (on 1.30.0 I didn’t have all of these settings set, and it was running okay regardless, most probably because max folder concurrency set to 1 was a good enough setting)
- launching all shares at startup makes the load go up to 50-60
- launching one share after the other keeps the load at ~4-8, with folder concurrency set to 1
But add a share while another is running and the load jumps from 4 to 8. So even if the share is « Waiting to sync » or « Waiting to scan », it’s using threads and CPU, which it shouldn’t (or at least not as much).
For instance:
- load average: 5.67, 8.48, 8.06 [IO: 4.37, 7.06, 6.66 CPU: 1.30, 1.41, 1.38]
- started one more share
- load average: 18.31, 13.19, 10.10 [IO: 16.57, 11.66, 8.62 CPU: 1.74, 1.52, 1.46]
- the GUI is no longer responsive (even changing the language takes a minute)
On the GUI side: it’s slow and unresponsive (even on 1.30.0, but worse now), and it’s clearly because the calls are piling up (I can’t find the issue on GitHub, but I know there was one, or perhaps a discussion on this very forum). Calls to the API for « status, discovery, connections, error » are sent even if the previous calls haven’t finished. We should wait for the previous ones to end, then wait a few seconds, and only then launch new calls. This most likely overloads the API and explains some slow config saves (as there may already be 50 pending calls to the API).
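A minimal sketch of that serialization idea, in Go for brevity (the real GUI is JavaScript; the endpoint and interval below are just placeholders):

```go
package main

import (
	"io"
	"net/http"
	"time"
)

// poll issues one request at a time: it waits for the previous call to
// complete, pauses a few seconds, and only then fires the next one, so slow
// responses can never pile up into dozens of pending API calls.
func poll(url string) {
	for {
		if resp, err := http.Get(url); err == nil {
			io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
			resp.Body.Close()
		}
		time.Sleep(5 * time.Second)
	}
}

func main() {
	// Hypothetical endpoint; the real GUI polls several REST endpoints,
	// each of which could be serialized the same way.
	poll("http://localhost:8384/rest/system/status")
}
```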
If the system itself is running low on memory and swapping, everything will become a slog. Adding a folder, depending on what that means precisely, might mean it starts exchanging index info with other devices, which is not regulated by folder concurrency. I suspect this may just be too large an installation, mostly in terms of remote devices, for a system with 2GB of RAM (and whatever else is also going on on the system).
@dmih Trying to follow/summarise the discussions from yesterday, it was never clear to me if you have the database(s) on SSD or spinning disks? Apart from the obvious performance advantages of SSDs, I think this can also affect the WAL growth.
Effectively, the WAL grows for every write transaction as long as there is a read transaction open for an older database state. We mitigate this by avoiding long-running read transactions wherever possible. But, if you’re on spinning disks of course everything takes longer, and I can see even a “short” index read transaction could take a while if it’s competing with lots of incoming index writes on a spinning disk – especially if this is going on for multiple databases at a time.
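To illustrate the mechanism, here’s a minimal sketch (assuming the mattn/go-sqlite3 driver; Syncthing’s actual driver and schema differ): while a read transaction is open, checkpointing cannot move past its snapshot, so concurrent writes keep growing the WAL file until the reader finishes.

```go
package main

import (
	"database/sql"
	"fmt"
	"os"

	_ "github.com/mattn/go-sqlite3"
)

func walSize() int64 {
	fi, err := os.Stat("demo.db-wal")
	if err != nil {
		return 0
	}
	return fi.Size()
}

func main() {
	db, err := sql.Open("sqlite3", "demo.db?_journal_mode=WAL")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	db.SetMaxOpenConns(2) // one reader, one writer

	db.Exec(`CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, data TEXT)`)

	// Open a read transaction and touch the database, pinning the WAL at
	// this snapshot for as long as the transaction stays open.
	reader, err := db.Begin()
	if err != nil {
		panic(err)
	}
	var n int
	reader.QueryRow(`SELECT COUNT(*) FROM t`).Scan(&n)

	// Writes on the other connection keep appending to the WAL; none of it
	// can be checkpointed back into the main .db while the reader is open.
	for i := 0; i < 5000; i++ {
		db.Exec(`INSERT INTO t (data) VALUES (hex(randomblob(256)))`)
	}
	fmt.Println("WAL size with reader open:", walSize())

	reader.Rollback() // release the snapshot
	db.Exec(`PRAGMA wal_checkpoint(TRUNCATE)`)
	fmt.Println("WAL size after checkpoint:", walSize())
}
```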
Performance-wise, the key is Win11 P/E cores and process priority. I am running a 12700H and a 990 PRO NVMe, so no hardware problems are possible. However, with setLowPriority=true, Win11 schedules things in a way that runs at HDD speed. E-core background scheduling in Win11 includes additional penalties such as reduced timer resolution etc., which might be the key. With low priority, initial scan physical I/O is around 10 MB/s; with normal, up to 300 MB/s. And with normal priority it is doing fine. Worse than v1, but doing the job 100% correctly. No complaints here.
As for DB growth, I am NOT talking about the WAL; I have 20 years of DBA experience and understand how journaling works. I am talking only about the .db file itself. Look at my folder.0010-kdrx2nvc.db: it was 1 GB after migration while running at low priority, then it went to 3.5 GB and things got critically slow due to the priority issue, which is now fixed. However, it is still growing: it was 3.5 GB yesterday and is already 4.7 GB now. The total size of the *.db files is 7 GB, compared to 800 MB for the v1 DB. This size also naturally hurts overall resource usage, it seems; queries against the file just get more and more expensive. And what is the expected v1-to-v2 size ratio? As of this morning I am at 10x the size already, NOT counting WAL files, which I exclude.
Now I am waiting for the moment when I can either say “OK size settled” or “It seems it will just grow forever”.
Gotcha. Interesting note on the low priority setting. I wonder what’s the most appropriate thing there. On the one hand, running at background priority on the E cores is what I think we should do by default. On the other hand, having a default performance that’s slow as molasses is also not a fantastic experience.
I suspect database growth is incoming index updates from other devices, but hard to say from here.
Yes, I’ve since read, while googling, that you say “nodes will exchange indexes anew”, so maybe they are doing this. I have a couple of nodes. But I am not observing any traffic, and about 5 days have now passed in total; it should have finished long ago if that were the cause. Still waiting, and in my spare time I will use some DB explorer to look closer.
I’m not sure this is the general experience; e.g., I run with setLowPriority on all my Windows systems with no real issues. However, what I also used to do before was to set the I/O priority of the syncthing.exe process to low (in Windows, not in Syncthing), and that did make the GUI extremely slow. For the record, Windows allows setting priorities separately for CPU, I/O, and memory. As far as I’m aware, setLowPriority only affects the CPU priority, and the other two are still controlled solely by the OS.
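For reference, a minimal Windows sketch of that distinction (using golang.org/x/sys/windows; whether Syncthing’s setLowPriority maps exactly to the first call is my assumption, not verified against the source):

```go
//go:build windows

package main

import "golang.org/x/sys/windows"

// PROCESS_MODE_BACKGROUND_BEGIN from the Win32 documentation: entering
// background mode lowers CPU, I/O, and memory priority together. The constant
// isn't exported by x/sys/windows as far as I know, so it's defined here.
const processModeBackgroundBegin = 0x00100000

func main() {
	self := windows.CurrentProcess()

	// Lower only the CPU priority class, which is what setLowPriority
	// reportedly does:
	if err := windows.SetPriorityClass(self, windows.BELOW_NORMAL_PRIORITY_CLASS); err != nil {
		panic(err)
	}

	// By contrast, background mode also demotes I/O and memory priority,
	// roughly what setting the I/O priority by hand approximated:
	// windows.SetPriorityClass(self, processModeBackgroundBegin)
}
```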
Windows with this new E/P-core code has even more settings now, such as the additional Eco mode, which is not a priority but a bunch of additional hints, like latency penalties. Apps are demoted to this mode automatically, and process priority is one of the hints to this mechanism. Maybe the overall verdict on my system is “yes, this is Eco” when running at low priority.
You could check in the Task Manager if “efficiency mode” is applied to Syncthing. I’ve just looked it up on my Windows 11 system, and it is not.
Yes, I checked these; it’s sometimes applied, sometimes not. It is also driven by ACPI hints on the particular hardware, so things may differ greatly depending on the (laptop, if any) firmware. For example, my Huawei D16 has some additional ACPI routines running that change the overall performance characteristics of the system at 1 s, 10 s, and 60 s idle intervals. Sometimes I use software that emulates key presses to override this when I am rendering. Also, google how the Intel Thermal Framework works; that is also a resident beast of 4+ services which adjusts everything under the hood, etc.
Modern Windows on advanced mobile hardware is very complicated in this regard. But that doesn’t matter specifically here. It is no longer really my problem anyway, but an observation that you and I have both now confirmed: SOME priority adjustments (no matter which exactly) hurt v2 a lot more than v1. That is the outcome, as it seems.
Maybe it can/should be addressed and analyzed, maybe not. Or at least documented.
For me the same slowdown due to low I/O priority was also visible on v1. I think it may be that because the new database is more I/O-heavy (and larger in size), it makes the slow I/O access more pronounced.
Also, I am waiting for a version where I can adjust the cache size, because one of the theories is that syscall traffic got massively worse, and the default cache is really small.
I would like to play with 1) maximum connections per folder, 2) cache size per connection, and also maybe 3) the total number of SQLite connections for the app, to bring userland/kernel I/O back to a reasonable level; maybe this will mitigate the priority issue (see the sketch below).
In other words, we need something like a “total cache” knob to tune these things. Right now user/kernel I/O is excessive. Working OK, but not well.
ADD: digging through a 5 GB file with such a small cache just cannot be efficient. And having 16 small caches in parallel doesn’t help either. Maybe it would run a lot better with max 1 connection and a larger single cache.
ADD2: what argues against this theory is that 2.0.6, with its larger cache, was also affected by performance degradation compared to v1; but that was not so much one “large” cache as “many” (still small) caches. Not the same thing.
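To make the knobs concrete, a sketch of what I mean (assuming the mattn/go-sqlite3 driver and database/sql; the pragma is standard SQLite, everything else is illustrative, not Syncthing’s actual configuration):

```go
package main

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "folder.db")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Knobs 1/3: one connection instead of many, so there is a single
	// large cache rather than 16 small ones.
	db.SetMaxOpenConns(1)

	// Knob 2: the page cache is per connection. A negative value is in KiB,
	// so -262144 asks for a 256 MiB cache instead of SQLite's ~2 MiB default.
	// With max 1 connection, the single pooled connection gets this pragma.
	if _, err := db.Exec(`PRAGMA cache_size = -262144`); err != nil {
		panic(err)
	}

	// With a larger cache, hot pages of a multi-GB .db file stay in userland
	// and syscall/kernel I/O traffic drops accordingly.
}
```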
Do you have P/E hardware? Just to compare things better.
… and anyway, it is obvious that with smaller DBs of “normal” size, especially with lower file counts (as there are fewer “transactions” there), priority does not matter much; so this side of the topic is biased towards scalability rather than basic use cases.
UPD: largest-file-sqlite-analyzer.log (45.6 KB) - this is the largest file, the one that keeps growing. It seems that background vacuum (if any) is not making any progress here, though it would help. What is the schedule for a full explicit vacuum, by the way, if there is one?
UPD2: some other files have even worse page utilization. On average I’d say 30% total utilization in my case. 50% I’d call okay (not good, but not critical), but 30% seems a bit low.
UPD3: as a reference benchmark, vacuuming this DB on my PC is not fast. Using sqltools (open db, vacuum;), it took many minutes at about 20 MB/s over the ~5 GB file, so an explicit vacuum is probably not a good option…
UPD: my bad; I had never used SQLite before. It seems page utilization is actually good here: I was looking at the wrong metrics, the data part only, but with indexes included, utilization is tight, so it’s not a vacuum issue. It just grows. So I keep watching.
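For anyone who wants to double-check the same thing without sqlite3_analyzer: the freelist tells you how much an explicit VACUUM could reclaim. A quick sketch using standard pragmas (driver assumed as above):

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "folder.db")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	var pages, free, pageSize int64
	db.QueryRow(`PRAGMA page_count`).Scan(&pages)
	db.QueryRow(`PRAGMA freelist_count`).Scan(&free)
	db.QueryRow(`PRAGMA page_size`).Scan(&pageSize)

	// Free pages are what VACUUM would give back to the OS; if this is near
	// zero, the file is dense and really does just hold that much data.
	fmt.Printf("%d of %d pages free (%.1f%%), ~%d MiB reclaimable\n",
		free, pages, 100*float64(free)/float64(pages),
		free*pageSize/(1<<20))
}
```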
v2.0.7 reduced idle RAM usage on my device with 64 folders to <300 MB, from 2 GB before the upgrade, thx.
Maybe it’s time to upgrade my Raspberry Pi 1 with 512 MB RAM to v2 soon; its 22 folders probably would not have worked before v2.0.7.