Performance issues under 1.4.0 rc4

imsodin · February 18, 2020, 9:44am

Why though, did you reset the database? If it needs to rehash everything on upgrade something is wrong, that shouldn’t happen. I didn’t see any indication of that happening in the log you sent me previously either.

Can you please take and share a cpu profile: Profiling — Syncthing documentation

terry · February 18, 2020, 9:52am

When I downgraded to 1.3.4 it wouldn’t talk to the upgraded indexes so I renamed the folder and let it reindex. Maybe i’m using the word hashing out of context. I mean scanning / indexing.

After a day or so I had almost all the jobs reindexed and took it all back to 1.4. The upgrade took around 20 minutes. It started at concurrency 4, but the indexing was far too slow with some jobs reporting 4 days to scan (with barely any disk access), so I set it to -1 and got all the jobs scanning, but this results in St getting overloaded.

terry · February 18, 2020, 9:57am

I’ve killed the current sync to enable debugging and here is the profile. Do you want one when it’s all locked up or at random times?

syncthing-cpu-windows-amd64-v1.4.0-rc.5-095450.pprof (120.1 KB)

terry · February 18, 2020, 10:12am

Another thing i’ve started to see, and don’t know if this is related or is a bug in itself, is the jobs say that are up to date, yet the local state differs to the global state. If I try a rescan on that job it comes straight back to being up to date.

it’s not all the jobs,

and not related to large jobs…

it’s just some folders, the majority are correct…

imsodin · February 18, 2020, 11:54am

Thanks for the cpu profile: Almost literally all accounted cpu time is spent generating folder summaries. Cpu usage should go down drastically if you close the web UI/Synctrayzor. I will file a PR shortly addressing this problem. As to why it seems worse than with 1.3.4 I have no idea.

I can’t say for sure, but the local vs global state discrepancy looks like a different issue. I’ll open a separate topic.

imsodin · February 18, 2020, 12:02pm

A post was split to a new topic: Global state doesn’t match local state on up to date folder

terry · February 19, 2020, 10:08am

This is now revolved, see Global state doesn't match local state on up to date folder

terry · February 20, 2020, 10:34am

Not out of the woods. PC was dead in the water this morning, low CPU %, St on 3Gb ram, disk queue back in 60-70. No matter what I clicked on to open it, nothing happened, even outlook wasn’t syncing emails. Wasn’t able to get a cpu profile. After an hour trying I killed St and literally everything loaded onto the screen that I had clicked on.

Syncthing was running, I would see in the resource monitor / disk that files were being indexed. But why would the computer be so unresponsive after a period of time?

Concurrency was set to -1 as I still have not completed a full scan of the folders since 1.4 came out.

Catfriend1 · February 20, 2020, 12:23pm

@terry Did Syncthing use a lot of swap memory ? Virtual memory makes pc’s very slow especially on Windows where we have to deal with gui responsiveness.

AudriusButkevicius · February 20, 2020, 1:08pm

To be honest, I think you should stop pointing fingers at syncthing and start pointing fingers at your drive.

Sure, syncthing performs disk operations, but the fact that the rest of your PC locks up is usually a sign of a slow drive with high seek times. OS is usually smart enough to schedule IO reasonably between different processes, but it seems the drive is so slow that no sensible scheduling is possible.

Suggest you run io benchmarks on your drive and see if you are seeing sensible numbers, as I suspect you won’t be.

imsodin · February 20, 2020, 1:08pm

Where do you store the database - spinning disk or SSD?
The timing makes me suspect the blocks GC as a culprit. On the other hand I don’t see why it would lock up your system. I checked the expected memory usage for your 1.5 million files with bloom package - github.com/willf/bloom - Go Packages and got less than 2MB as a result.
If you do the following, be prepared to kill Syncthing - it will deliberately run blocks GC continuously to check if this is the problem: Can you run Syncthing with environment variable STGCBLOCKSEVERY=1s and see whether it you get the same symptoms more or less immediately.

Thing is the issue is present on 1.4.0 only, 1.3.4 works fine.

AudriusButkevicius · February 20, 2020, 1:20pm

I am still sceptical.

I think it’s just a dying drive that happens to be more exposed/visible because we do wider scans every now and then. There is nothing fundamentally wrong it what we do, if there was there would be more people talking about it.

Again, run IO benchmarks, see if the numbers line up with what the vendor advertises.

terry · February 20, 2020, 5:53pm

I’ve been out all day. To catch up, I have 16Gb ram, there is a small amount of swapping, but the OS, database and swapfile are on an SSD

I declared that the data drives are USB, but as mentioned I have never had St lock up this bad before. I know the io is poor, but that would in no way stop the PC from functioning.

i’m currently running concurrency at 10, and disk queue is 11. I have had it at -1 and with everything scanning would plod along at around 16 for several hours then something happens and the queue goes to 60.

The drive is not a year old, it’s WD 8Tb in a powered enclosure, so 3.5", there’s nothing to indicate any issues. It’s also USB3. Disk read io is

I will kill St, set concurrency to -1 and set the STG and see how it goes

terry · February 20, 2020, 6:19pm

STGCBLOCKSEVERY hasn’t made any difference, it’s not made it lag or queue go up. I will kill, remove the setting and try again and see if I can record the resource monitor somehow to see what happens

terry · February 21, 2020, 10:31am

2.avi (11.5 MB)

Here’s an edited review of last nights sync. Key events (im using the time on the original video, not the time on the file).

9:04:38 Syncthings uptime jumps from 6h 8m to 14h 35m

I scroll the video towards the end but watch the disk que

9:21:27 The disk que comes from high to ‘normal’

9:22:16 Indexes are being read / written, disk que goes up

9:23:52 Indexes activity increases, Disk IO ramps up to 101MB/sec (top of resource monitor) que increasing

9:24:13 Synctrazor gui no longer responding

9:25:00 Its just indexing

At this point im trying to run a cpu profile, but the system is starting to go slow so I save the video.

I have reapplied the STGCBLOCKSEVERY variable and will see if that helps. It seems that when it indexes in chunks is where it struggles, but only after a period of time ( memory related, high mem - index issue?). If the variable is updating the index more often, then in theory I should not get locked up.

imsodin · February 21, 2020, 11:09am

Please don’t!
As I wrote before, that was only an experiment to forcefully trigger GC to check if that immediately locks freezes Syncthing. Running GC continuously is not helping/mitigating/doing anything good.

Please share the logs from that time and ideally include the folder side into screenshots/-casts (starting around 22.16 the download rate becomes really high, so I’d expect folders are syncing the two files added to local state).

terry · February 21, 2020, 11:17am

Stopped and removed.

Can you advise what logs you need

imsodin · February 21, 2020, 11:44am

I should use “the log” not “logs”. Unless otherwise stated, the required log is always syncthings own log. If they get archived after some time (they are on Windows i think) then the file with entries from the time in question. If unclear which, just take all.

terry · February 21, 2020, 11:49am

They’ve been overwritten as I have restarted a few times. I will pull them next time it happens.

AudriusButkevicius · February 21, 2020, 3:38pm

Task manager shows seek times/duration which is more interesting metric.

Hard drive issues usually manifest not in dropped throughput but in longer seek times.