Syncing Algorithm

Hi everyone!

I was going through the syncing algorithm here and had a doubt. The documentation says that files are split into blocks/chunks and the hashes of these blocks are compared. Suppose the first block of the file is edited; then all the subsequent blocks of the file also change, and thus the hashes of all the blocks change. So when comparing hashes, all of them will differ from the previous version and all the blocks, i.e. the whole file, will be transferred. This seems very inefficient.

Can someone tell me if I am understanding the algorithm correctly or not?

That is not how I interpret the doc. Modifying data in a block only affects the hash of that particular block, not subsequent blocks.

If you insert or delete bytes, on the other hand, then I guess what you described is what happens.

I would be happy to have someone more knowledgeable confirm this.
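To illustrate the distinction, here's a minimal sketch (in Go; not Syncthing's actual code, and the block size and helper names are made up) that hashes a byte slice in fixed-size blocks and then compares the block hashes after an in-place edit versus an insertion at the start:

```go
// Minimal sketch of fixed-size block hashing, not Syncthing's actual code.
package main

import (
	"crypto/sha256"
	"fmt"
)

const blockSize = 8 // tiny block size just for the demo; Syncthing uses much larger blocks

// blockHashes splits data into fixed-size blocks and hashes each one.
func blockHashes(data []byte) [][32]byte {
	var hashes [][32]byte
	for i := 0; i < len(data); i += blockSize {
		end := i + blockSize
		if end > len(data) {
			end = len(data)
		}
		hashes = append(hashes, sha256.Sum256(data[i:end]))
	}
	return hashes
}

// countChanged counts how many block hashes differ between two versions.
func countChanged(a, b [][32]byte) int {
	n := 0
	for i := 0; i < len(a) && i < len(b); i++ {
		if a[i] != b[i] {
			n++
		}
	}
	return n
}

func main() {
	orig := []byte("aaaaaaaabbbbbbbbccccccccdddddddd")
	before := blockHashes(orig)

	// In-place edit in the first block: only that block's hash differs.
	edited := append([]byte(nil), orig...)
	edited[0] = 'X'
	fmt.Println("changed blocks after in-place edit:", countChanged(before, blockHashes(edited)))

	// Insertion at the start: all following data shifts, so every block hash differs.
	inserted := append([]byte{'X'}, orig...)
	fmt.Println("changed blocks after insertion:", countChanged(before, blockHashes(inserted)))
}
```

With the in-place edit only the first block's hash differs; with the insertion every block boundary shifts, so every block hash differs.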

There’s an additional “rolling hash” built into the code which attempts to detect shifted blocks. There’s even a metric on data.syncthing.net that shows how much data transfer was saved by it (judging by that data, it’s rarely relevant in practice).
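For intuition, here's a minimal sketch of the rolling-hash idea (an Adler-32-style weak checksum; not Syncthing's actual implementation, and the window size is arbitrary). The point is that the weak hash of a window can be updated in constant time when the window slides by one byte, which makes it cheap to look for old blocks at shifted offsets in a changed file:

```go
// Sketch of a rolling (weak) checksum, rsync/Adler-32 style.
package main

import "fmt"

const mod = 65521

// weakHash computes the two checksum components of a window from scratch.
func weakHash(window []byte) (a, b uint32) {
	for i, c := range window {
		a = (a + uint32(c)) % mod
		b = (b + uint32(len(window)-i)*uint32(c)) % mod
	}
	return
}

// roll updates the checksum when the window slides one byte to the right:
// out leaves the window, in enters it, n is the window length.
func roll(a, b uint32, out, in byte, n int) (uint32, uint32) {
	a = (a + mod - uint32(out) + uint32(in)) % mod
	b = (b + mod - (uint32(n)*uint32(out))%mod + a) % mod
	return a, b
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	const n = 16

	// Roll the checksum across the whole buffer and verify that it always
	// matches a from-scratch computation of the same window.
	a, b := weakHash(data[:n])
	for i := 1; i+n <= len(data); i++ {
		a, b = roll(a, b, data[i-1], data[i-1+n], n)
		fa, fb := weakHash(data[i : i+n])
		if a != fa || b != fb {
			fmt.Println("mismatch at offset", i)
			return
		}
	}
	fmt.Println("rolling checksum matches from-scratch hashes at every offset")
}
```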


Can you share the documentation on the “rolling hash” Syncthing uses?

There’s this:

https://docs.syncthing.net/users/tuning#tuning-for-high-performance

Syncthing will by default look for rolling (weak) hash matches to detect data shifted in a file if a lot of data has changed in the file. If your use case doesn’t cause data to shift in a file, and if the files are large (movies, VM images, …) it is unnecessary to spend time checking weak hashes. Set the threshold to 101% to disable use of weak hashes.

Otherwise, the real source of truth is always the code 🙂
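If I'm reading the config docs right, the corresponding knob is the per-folder weakHashThresholdPct advanced setting; here's a sketch of what that looks like in config.xml (the folder id and path are placeholders):

```xml
<!-- Placeholder folder entry; only weakHashThresholdPct is the point here.
     Values above 100 (e.g. 101) disable weak-hash scanning for the folder,
     per the tuning page linked above. -->
<folder id="example" path="/data/example" type="sendreceive">
    <weakHashThresholdPct>101</weakHashThresholdPct>
</folder>
```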

This is a small optimization that is usually not relevant, because it only applies when data is inserted (not changed in place) at the beginning or in the middle of a file.
