What is the difference between copier, puller, and hasher?

bcauldwell · May 4, 2015, 5:55pm

This is a two part question.

- What are the responsibilities of the copier, puller, and hashers?

I assume the hasher is just busy computing file hashes and the copier is busy doing I/O, but what about the puller?

- What would be an ideal starting setup in general, and in the number of copiers, pullers, and hashers in particular, for an application that consists only of a local Syncthing instance and a Syncthing instance in a Docker container on the same host machine? So just one-to-one, two-way syncing of files being worked on in real time.

Thank you in advance,

Benjamin

calmh · May 4, 2015, 10:07pm

These are different routines that take part in the file handling. “Copiers” and “pullers” are used when syncing files from the network. “Copiers” copy data blocks that we already have from one file to another (say, because some blocks in a file changed but not others, or a file was actually copied on the source), while “pullers” request missing blocks from the network. Basically, the number of pullers is the number of outstanding requests we have at any given time. There is seldom any reason to have more than one copier, while a larger number of pullers is necessary to get good network utilization. We default to one copier and 16 pullers. You might experiment with increasing the number of pullers for fast or high-latency networks, or decreasing it to lower resource utilization (memory, CPU) when syncing.
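To illustrate "the number of pullers is the number of outstanding requests", here is a minimal sketch of a fixed-size worker pool. It is not Syncthing's actual code: `pullBlocks`, the block count, and the sleep that stands in for a network round trip are all made up for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// pullBlocks is a hypothetical helper: it "requests" each block using a
// fixed pool of puller goroutines and returns the peak number of requests
// that were in flight at the same time. The sleep is an assumed stand-in
// for one network round trip.
func pullBlocks(blocks, pullers int, latency time.Duration) int {
	work := make(chan int)
	var inFlight, peak int64
	var wg sync.WaitGroup

	for i := 0; i < pullers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range work {
				cur := atomic.AddInt64(&inFlight, 1)
				// Record the highest concurrency seen so far.
				for {
					p := atomic.LoadInt64(&peak)
					if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
						break
					}
				}
				time.Sleep(latency)
				atomic.AddInt64(&inFlight, -1)
			}
		}()
	}

	// Hand out one block per request; this blocks while all pullers are busy.
	for b := 0; b < blocks; b++ {
		work <- b
	}
	close(work)
	wg.Wait()
	return int(atomic.LoadInt64(&peak))
}

func main() {
	for _, n := range []int{1, 16} {
		start := time.Now()
		peak := pullBlocks(64, n, 5*time.Millisecond)
		fmt.Printf("pullers=%d peak in flight=%d elapsed=%v\n", n, peak, time.Since(start))
	}
}
```

With a nonzero latency, the run with more pullers should finish sooner because more requests overlap. With latency near zero there is little to overlap, so extra pullers mostly add overhead.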

The “hashers” are used to calculate file hashes when we detect a change. The default is zero, which actually means to use as many as there are CPU cores (i.e. 2, 4, 8, etc.). Basically, this is how many files we calculate hashes for in parallel. With slow disks, increasing the number of hashers will slow things down due to unnecessary seeking. With fast disks (SSDs or RAID) the CPU is the bottleneck and the default is optimal. The number of hashers is per folder, unless it’s at the default, in which case we divide the number of CPU cores by the number of folders and then round up to one if needed…
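The default rule described above could be sketched roughly like this. `effectiveHashers` is a hypothetical helper, not the real implementation, and the use of integer division before applying the minimum of one is an assumption on my part.

```go
package main

import "fmt"

// effectiveHashers is a hypothetical helper, not Syncthing's actual code.
// configured is the per-folder hashers setting, where 0 means "default".
func effectiveHashers(configured, cpuCores, folders int) int {
	if configured > 0 {
		// An explicit setting applies to each folder as-is.
		return configured
	}
	if folders < 1 {
		folders = 1
	}
	// Assumption: the cores are shared using integer division.
	n := cpuCores / folders
	if n < 1 {
		// Never go below one hasher per folder.
		n = 1
	}
	return n
}

func main() {
	const cores = 8 // example value only
	for _, folders := range []int{1, 3, 16} {
		fmt.Printf("cores=%d folders=%d hashers per folder=%d\n",
			cores, folders, effectiveHashers(0, cores, folders))
	}
}
```

The point of dividing by the folder count is that several folders scanning at once do not together start far more hashers than there are cores.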

The defaults should be fine. If the machine seems overloaded during scanning, when a lot of files have changed, set the number of hashers to one or so. To decrease CPU utilization during pulling, perhaps decrease the number of pullers: in your setup both instances are on the same host, so the network latency is effectively zero and we don’t really need to pipeline a lot of requests.

You can also reduce CPU usage in general by setting the environment variable GOMAXPROCS to the maximum number of CPU cores Syncthing is allowed to use (this also affects the default for hashers=0 above). Of course, “reduce CPU usage” really just means “do the same amount of work, slower, over longer time” so adjust to taste.

Zillode · May 5, 2015, 6:43am

@bcauldwell, please add this to the wiki if you like. Thanks!