Does the backoff algorithm take note of file size?

syncthing seems to use quite a bit of CPU when watching a folder into which big files are being downloaded. Does it take note of file size before looking at content?

It looks primarily at modification time to see whether a file seems to have changed. If it has, the change is fed through the suppression algorithm, which decides whether the file should be rehashed or the change should be ignored for the time being.

The suppression algorithm calculates the average “change bandwidth” of a file, that is, the file size divided by the time between changes. A 1 MiB file that changes every 60 seconds gets a bandwidth of about 17 KiB/s. If the bandwidth exceeds the configured “max change bandwidth”, the file is not rehashed and the change is ignored. If the bandwidth is still under the limit, the file is rehashed and the average bandwidth is updated.
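For illustration, here's a minimal sketch of that heuristic in Go (the names, averaging, and bookkeeping are hypothetical, made up for the example; this is not Syncthing's actual code):

```go
// A minimal sketch of the suppression heuristic; names and details
// are hypothetical, not Syncthing's actual implementation.
package main

import (
	"fmt"
	"time"
)

type fileStats struct {
	lastChange time.Time
	avgBW      float64 // running average change bandwidth, KiB/s
}

// shouldSuppress reports whether a change to a file of the given size
// should be ignored because the file is changing too fast.
func (s *fileStats) shouldSuppress(sizeBytes int64, maxBW float64, now time.Time) bool {
	elapsed := now.Sub(s.lastChange).Seconds()
	if elapsed <= 0 {
		elapsed = 1 // guard against clock oddities
	}
	bw := float64(sizeBytes) / 1024 / elapsed // KiB/s for this change

	if bw > maxBW {
		return true // changing too fast: skip the rehash, ignore the change
	}

	// Under the limit: update the running average and allow the rehash.
	s.avgBW = (s.avgBW + bw) / 2
	s.lastChange = now
	return false
}

func main() {
	s := &fileStats{lastChange: time.Now().Add(-60 * time.Second)}
	// A 1 MiB file that last changed 60 s ago: 1024 KiB / 60 s ≈ 17 KiB/s,
	// well under a 10,000 KiB/s limit, so it gets rehashed.
	fmt.Println(s.shouldSuppress(1<<20, 10000, time.Now())) // prints: false
}
```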

This is still quite a blunt instrument. To start with, it’s per file, so syncthing will gladly rehash a thousand 1 MiB files that are each under the limit, while a single 1 GiB file would be suppressed. The original use case was to avoid constantly rehashing VM images.

In my case I was downloading eight big files (so they were growing in size, with constantly updated mtimes), and syncthing was using a lot of CPU: the fan was going crazy. For this case I’d have preferred that syncthing wait until a file stopped growing (not ideal) or serialise the scanning of changed files (which I guess it’s not doing, given the CPU usage).

It’s serialized within a repo, so CPU usage should have been limited to a single core (approximately). And it should have backed off. But the default max change bandwidth is currently 10,000 KiB/s, so a file can be ~585 MiB and still be rehashed every minute. If you have many such files, it gets heavy. You might want to tune that setting, and I guess something a bit smarter in general wouldn’t hurt. Per-repo bandwidth instead of per-file?
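The arithmetic behind that ~585 MiB figure, in case anyone wants to check it:

```go
// Back-of-the-envelope check: at a 10,000 KiB/s limit, a file rewritten
// once per minute stays under the limit as long as size/60 s <= 10,000 KiB/s.
package main

import "fmt"

func main() {
	const maxBW = 10000.0 // KiB/s, the default max change bandwidth
	const interval = 60.0 // seconds between changes
	maxKiB := maxBW * interval
	fmt.Printf("%.0f KiB ≈ %.1f MiB\n", maxKiB, maxKiB/1024) // 600000 KiB ≈ 585.9 MiB
}
```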

Edit: it’s worse than that, CPU-usage-wise. What happens in your case would be something along the following lines:

  1. The change is detected, the file gets rehashed. That uses as much CPU as we can on a single thread.
  2. The change is published to other nodes.
  3. The other nodes start pulling the file. Depending on how much data they need to copy and how fast your network is, that can use a fair amount of CPU as well (compression + encryption). But by that point the file has changed again, so when they have pulled the changed blocks, the file fails the hash check.
  4. Goto 3…
  5. After 60 seconds, goto 1…

Once the file gets tagged as changing too often, all of this stops, but that requires reaching the limit.
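To make the loop in steps 3 and 4 concrete, here is a rough Go sketch of the pull-and-verify step; the function names are illustrative assumptions, not Syncthing's internals:

```go
// Rough sketch of the pull-and-verify step from points 3-4 above: the puller
// assembles the needed blocks, verifies against the hash announced at scan
// time, and retries if the source changed meanwhile. Names are illustrative.
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// pullOnce fetches the file contents (the CPU-heavy network copy) and
// verifies the result against the hash from the announced index entry.
func pullOnce(fetchBlocks func() []byte, expected []byte) bool {
	data := fetchBlocks()
	sum := sha256.Sum256(data)
	return bytes.Equal(sum[:], expected) // false if the file changed mid-transfer
}

func main() {
	announced := sha256.Sum256([]byte("contents at scan time"))
	current := []byte("file has grown since then") // source keeps changing

	for try := 1; try <= 3; try++ {
		if pullOnce(func() []byte { return current }, announced[:]) {
			fmt.Println("hash check passed, renaming temp file into place")
			return
		}
		fmt.Println("hash check failed, goto 3: pull the changed blocks again")
	}
}
```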

Thanks for the explanations :smile:

So in this specific case, it might have made more sense to notice that the file was growing and continue hashing from where it left off, rather than rehashing the whole file each time, then do a final rehash at the end before the rename.

Is the same temporary file used the whole time on the receiver side?

Currently I have a new directory syncing which contains a single large file. syncthing is using 200% CPU, plus kernel_task is at 70% :confused:

This happens only for the first scan of the file, while syncthing adds it to the index.

okay, I thought it was because of syncing to two nodes.

Could be either. Hashing files uses a lot of CPU (the hashing). Syncing files at high bitrates uses a lot of CPU too (compression + encryption).
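If you want a feel for the hashing cost on your own hardware, a small standalone benchmark along these lines shows how much one core can hash (the block and data sizes here are arbitrary choices for the test):

```go
// A tiny standalone benchmark: hash 512 MiB of zeros in 128 KiB blocks with
// SHA-256 to see what one core manages. Sizes are arbitrary test choices.
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

func main() {
	const blockSize = 128 << 10 // 128 KiB per block
	const total = 512 << 20     // 512 MiB in total
	block := make([]byte, blockSize)

	start := time.Now()
	for n := 0; n < total/blockSize; n++ {
		sha256.Sum256(block) // one block hash; result discarded
	}
	elapsed := time.Since(start)
	fmt.Printf("hashed %d MiB in %v (%.0f MiB/s on one core)\n",
		total>>20, elapsed, float64(total>>20)/elapsed.Seconds())
}
```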

I have a Linux server with a 3 GHz Celeron processor. Syncthing is killing the poor server (while downloading a repo from another LAN node).

What if we used “cpulimit”? Would that cause trouble if it delays the process for too long?

Would it help to set the “scan” interval to something larger than 60 seconds (like 300 seconds) to help with the “After 60 seconds, goto 1…” step? Lastly, which is it: “after 60 seconds, abort and goto 1”, “after 60 seconds, start goto 1 in a parallel process”, or “after the operation is done and t > 60 seconds, goto 1”?

No issues here. Limit the CPU usage however you like, the only ill effect is that things take longer. Or just let it sync and you should be fine thereafter.

I don’t really get that, but yes, by all means do increase the scan interval for large repositories (a high number of files; scanning a few large files costs nothing unless they’ve changed). Especially if this is someplace that is mostly the receiving side, i.e. you don’t expect many updates from that direction. I run syncthing with an 86400 s (24 hour) rescan interval in a few places, because things aren’t supposed to change very often there.

[quote=“calmh, post:11, topic:243”] I don’t really get that[/quote]

Sorry, my bad! I was cryptically asking:

  1. Does the code abort the currently running scan/hashing/syncing operation when it hits the rescan interval?
  2. Does it keep running the current operation but start a new scan/hashing/syncing operation in parallel when it hits the rescan interval?
  3. Does it complete the current operation and then immediately start a rescan if the interval has already elapsed?
  4. Does it complete the operation and only then start counting the 60 seconds again?

I believe you understand that some of these cases would cause trouble if some operation takes very long and the system keeps starting concurrent processes or adding to the queue of processes to complete. (I believe 3 or 4 is what you would be using.)

PS. I only found this open source sync project yesterday and it is amazing what you have accomplished!

Ah. Your point 4 is correct: the next rescan is started rescan_seconds after the previous scan completed.
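In other words, the scheduling behaves roughly like this sketch (not the actual code): the interval countdown starts only after a scan completes, so slow scans never overlap or pile up.

```go
// The rescan scheduling (option 4) behaves like this loop: the next interval
// only starts counting after the previous scan has completed, so long scans
// never overlap or queue up. A sketch, not the actual implementation.
package main

import (
	"log"
	"time"
)

func scanLoop(scan func(), interval time.Duration) {
	for {
		scan()               // may take arbitrarily long on a slow CPU
		time.Sleep(interval) // countdown starts only after completion
	}
}

func main() {
	go scanLoop(func() { log.Println("rescanning repo") }, 60*time.Second)
	select {} // block forever for the demo
}
```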

Good! So having a very slow CPU or low bandwidth would not be a problem.