How well does the block re-use work for other/existing files?

Hello all, I’m really impressed with what I’m reading and seeing with Syncthing. I’ve got a use case where we move a large number of fairly large files over a high-latency, expensive link from a remote location to a central location.

Since these files measure a repeating physical phenomenon, it stands to reason that, over time, the files on the central server will contain a significant number of blocks that could be reused to re-create new files that need to be transferred.

I saw mention elsewhere in this forum that Syncthing is able to reuse blocks from other files, so I’m curious whether we could theoretically see a significant reduction in our data traffic with Syncthing, if it could say “grab this block from file x, this block from file y, this block from file z, and only send this delta”.
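The “grab this block from file x” idea can be modelled with a minimal Python sketch. It assumes fixed-size blocks identified by their SHA-256 hash (Syncthing does hash blocks with SHA-256), but the function names, the tiny 4-byte block size, and the copy/fetch plan format are hypothetical and purely illustrative, not Syncthing’s code. The receiver indexes the blocks it already has, and only blocks missing from that index need to cross the link.

```python
import hashlib

# Hypothetical, illustrative block size; real Syncthing blocks are far larger.
BLOCK_SIZE = 4

def blocks(data: bytes, size: int = BLOCK_SIZE):
    """Yield (offset, block) pairs at fixed offsets 0, size, 2*size, ..."""
    for offset in range(0, len(data), size):
        yield offset, data[offset:offset + size]

def build_index(files: dict[str, bytes]) -> dict[bytes, tuple[str, int]]:
    """Map each block's SHA-256 digest to one (filename, offset) where it already exists."""
    index = {}
    for name, data in files.items():
        for offset, block in blocks(data):
            index.setdefault(hashlib.sha256(block).digest(), (name, offset))
    return index

def plan_transfer(new_data: bytes, index: dict[bytes, tuple[str, int]]) -> list[tuple]:
    """For each block of the new file, copy it from a local file if known, else fetch it."""
    plan = []
    for offset, block in blocks(new_data):
        source = index.get(hashlib.sha256(block).digest())
        if source is not None:
            plan.append(("copy", offset, source))      # reuse a block already on disk
        else:
            plan.append(("fetch", offset, len(block)))  # must be sent over the link
    return plan

existing = {"x.dat": b"AAAABBBB", "y.dat": b"CCCCDDDD"}
plan = plan_transfer(b"AAAACCCCEEEE", build_index(existing))
# plan == [("copy", 0, ("x.dat", 0)), ("copy", 4, ("y.dat", 0)), ("fetch", 8, 4)]
```

In this toy case only the last 4 bytes would be sent; the rest is assembled from blocks already on the receiving side. The catch is that a block only matches if exactly the same bytes fall within one block boundary.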

I know this is very generalized, but I’m curious whether this is conceptually how I could expect Syncthing to work. If there are constraints I should know about, for example that the files would have to be in the same folder as the target destination, please share.

Thanks and Best Regards, Jeff


The files can be in different (Syncthing) folders. However, blocks have fixed boundaries: if you insert data at the front of a file, the content shifts relative to those boundaries, so from Syncthing’s perspective every block has changed. If you are syncing things like block-device images or ISOs, where data changes in place rather than shifting, it shouldn’t be a problem.
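The effect of shifting data against fixed boundaries can be shown with a short Python sketch. It assumes a hypothetical 4-byte block size for readability (real Syncthing blocks are at least 128 KiB) and simply counts how many of the new file’s block hashes already exist in the old file; the helper names are illustrative only.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, illustrative block size

def block_hashes(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """SHA-256 of each fixed-size block; boundaries sit at multiples of size."""
    return [hashlib.sha256(data[i:i + size]).digest() for i in range(0, len(data), size)]

def reusable_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the new file's blocks that already exist somewhere in the old file."""
    known = set(block_hashes(old))
    new_hashes = block_hashes(new)
    return sum(h in known for h in new_hashes) / len(new_hashes)

old = b"AAAABBBBCCCCDDDD"
appended = old + b"EEEE"                 # data added at the end: boundaries untouched
overwritten = old[:4] + b"ZZZZ" + old[8:]  # in-place change, as on a disk image
prepended = b"X" + old                   # one byte at the front: every boundary shifts

print(reusable_fraction(old, appended))     # 0.8
print(reusable_fraction(old, overwritten))  # 0.75
print(reusable_fraction(old, prepended))    # 0.0
```

Appending or overwriting in place leaves most blocks reusable, while a single prepended byte means none of them match.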


It depends on the structure of your data. As @AudriusButkevicius mentioned, matching data has to line up within the block boundaries. If you are hoping new files will have matching blocks, that is unlikely unless they all start with the same information, have a set structure and fixed field sizes, etc. Also, if there is any kind of compression (JPEG, MP3, etc.), it is even less likely that identical or near-identical content will produce matching blocks.

Don’t forget that there will be a bandwidth overhead for sending and receiving indexes as well. The more data Syncthing handles, the bigger the indexes become, so this is something to factor into your considerations.
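A rough back-of-envelope estimate of how the index grows assumes one entry per block. The per-entry cost below (50 bytes, meant to cover a 32-byte SHA-256 hash plus offset and size fields) and the two block sizes are assumptions for illustration, not figures taken from Syncthing, and per-file metadata is ignored.

```python
GIB = 1024 ** 3
KIB = 1024

def index_overhead_bytes(file_sizes: list[int], block_size: int, bytes_per_block: int = 50) -> int:
    """Rough index size: one entry per block, each ASSUMED to cost bytes_per_block bytes."""
    # Ceiling division: a partial last block still needs its own entry.
    blocks = sum(-(-size // block_size) for size in file_sizes)
    return blocks * bytes_per_block

# 100 files of 1 GiB each, at two assumed block sizes:
small_blocks = index_overhead_bytes([GIB] * 100, 128 * KIB)   # 819,200 blocks -> 40,960,000 bytes
large_blocks = index_overhead_bytes([GIB] * 100, 1024 * KIB)  # 102,400 blocks -> 5,120,000 bytes
```

Larger blocks shrink the index, but they also make a match less likely, since a whole larger block has to be identical to be reused.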

It shouldn’t take long to set up a pair of test instances to see how much data you’re using and how much is saved and it’ll give you a feel for how Syncthing works, too.


Thanks, guys, for the informative answers. I will definitely give it a try, and I appreciate you explaining how data and block alignment can limit block reuse. It all makes sense.

I’m impressed with what a strong and active community this is - thanks again for the guidance.


Do report back; it’s nice to hear if things work out, and to get an idea of what can be improved.