Noob question - incremental sync?

Couldn’t find any info, so I’ll ask it straight: is synchronization incremental? I.e. if my 2 GB Outlook file receives a new e-mail, does the whole 2 GB + 1 KB get synced? Thanks, iM

Depends on how the Outlook file format works. Hopefully not.

Thanks for the quick reply. My question doesn’t concern Outlook as such, it’s more general. Does Syncthing work along the same lines as rsync, i.e. does it only transfer the changes to any given file? iM

Yes, but it depends on how those changes are done. If data is appended to the file or changed in the middle of the file, it’s fully incremental. If stuff is added to the beginning of the file and everything is shuffled backwards, it’s not. But then that requires a full rewrite of the file to do the change as well, so it’s a pretty stupid file format if that’s the case.

OK, good to know. Thanks! iM

Have you given any thought to using something rsync-like for synchronizing large files in the future?

No, not really. This was obvious and simple and I haven’t heard a compelling use case or seen patches for anything else. :wink:

Is there any update on this topic?

I’m desperately looking for incremental sync.

The answer is as it was: take a look at your file, and split it into 128 KiB blocks. Now change the file. Only the blocks that have been altered will be synced.

This means that appends and in-file modifications will be synced in an incremental way. Changes which move data around across the whole file will not be synced in an incremental way.
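For the curious, here is roughly what that looks like in code. This is a minimal sketch in Go with made-up names (blockHashes, changedBlocks, the example file names), not Syncthing’s actual implementation: hash the file in fixed 128 KiB blocks, then diff the two hash lists to find the blocks that need transferring.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

const blockSize = 128 << 10 // 128 KiB, the fixed block size discussed above

// blockHashes returns one SHA-256 hash per 128 KiB block of the file.
func blockHashes(path string) ([][sha256.Size]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var hashes [][sha256.Size]byte
	buf := make([]byte, blockSize)
	for {
		n, err := io.ReadFull(f, buf)
		if n > 0 {
			hashes = append(hashes, sha256.Sum256(buf[:n]))
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break // last (possibly partial) block read
		}
		if err != nil {
			return nil, err
		}
	}
	return hashes, nil
}

// changedBlocks reports which block indexes differ between two scans.
// Data that merely shifted by a few bytes lands in different blocks,
// hashes differently, and so counts as changed -- exactly the
// limitation described above.
func changedBlocks(prev, cur [][sha256.Size]byte) []int {
	var changed []int
	for i, h := range cur {
		if i >= len(prev) || prev[i] != h {
			changed = append(changed, i)
		}
	}
	return changed
}

func main() {
	prev, _ := blockHashes("file.before") // hypothetical example files
	cur, _ := blockHashes("file.after")
	fmt.Println("blocks to transfer:", changedBlocks(prev, cur))
}
```

Note how a one-byte insertion at the front of the file shifts every block boundary, changes every hash, and forces a full transfer, while an append or in-place edit touches only the affected blocks.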

That’s quite disappointing.

Rsync and Dropbox can detect where in a file a change occurs and can insert it at the right position when syncing. That’s much more effective. Think of a TrueCrypt container that needs to be synced: the current algorithm will have to sync the whole file in almost every case :frowning:

IIRC TrueCrypt containers completely change on any save? That’s kind of the point of a secure container. So not even Dropbox/rsync would be able to cope…

That’s not how Truecrypt behaves. It’s a disk image like any other, except encrypted. If you write a 1 MB file to it, you’ll get 1 MB of changed data (plus some blocks of filesystem metadata etc) which we’ll transfer as efficiently as anything else.

I still see this as a theoretical thing, with only Photoshop files so far being mentioned as an example of something real world that would benefit from rolling checksums, and then only in corner cases.

And rolling checksums are by their nature something peer-to-peer, not really easily implemented for a cluster such as syncthing.

@GeeGee What’s the actual use case that you are desperately looking to solve?

As I moved from TrueCrypt to EncFS due to the problem mentioned here, it now only applies to PST files from Outlook. The rsync algorithm is publicly available. Why isn’t Syncthing adopting something that works very well?

Because rsync uses rolling hashes, which are not cryptographically secure, and cryptographic strength is one of the properties Syncthing requires in order to prevent spoofing/DoS. We’d need a cryptographically secure rolling hash, which most likely means inventing our own crypto, which is never a good thing.

Plus, rsync relies on a variable block size, which would make it very hard for Syncthing to maintain an index. There is already a large discussion about this in another thread; I am sure if you search for rolling hash you’ll be able to find it.
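To illustrate the trade-off being discussed: below is a sketch, in Go and with hypothetical names, of an rsync-style weak rolling checksum (the Adler-32-inspired one from the rsync paper, simplified). The O(1) roll step is what lets rsync test for a block match at every byte offset; the flip side is that the sum is 32 bits and linear, so colliding blocks are easy to construct, hence not cryptographically secure.

```go
package main

import "fmt"

// rollsum is a simplified rsync-style weak rolling checksum.
// A sketch for illustration, not rsync's exact code.
type rollsum struct {
	a, b uint16
	n    int // window length in bytes
}

// reset computes the checksum of an initial window from scratch.
func (r *rollsum) reset(window []byte) {
	r.a, r.b, r.n = 0, 0, len(window)
	for i, c := range window {
		r.a += uint16(c)
		r.b += uint16(len(window)-i) * uint16(c)
	}
}

// roll slides the window one byte: drop out, take in. This O(1)
// update is the whole point: a match can be tested at every byte
// offset without rehashing the window.
func (r *rollsum) roll(out, in byte) {
	r.a += uint16(in) - uint16(out)
	r.b += r.a - uint16(r.n)*uint16(out)
}

// sum returns the 32-bit checksum. It is small and linear, so
// colliding inputs are easy to construct -- "not cryptographically
// secure" in the sense discussed above.
func (r *rollsum) sum() uint32 {
	return uint32(r.b)<<16 | uint32(r.a)
}

func main() {
	data := []byte("hello rolling world")
	const win = 8

	var r rollsum
	r.reset(data[:win])
	for i := win; i < len(data); i++ {
		r.roll(data[i-win], data[i])
		// The rolled sum matches a from-scratch computation of the window.
		var check rollsum
		check.reset(data[i-win+1 : i+1])
		fmt.Printf("offset %2d: %08x (fresh: %08x)\n", i-win+1, r.sum(), check.sum())
	}
}
```

rsync copes with the weak sum by double-checking every candidate match with a stronger per-block hash, an interactive two-party step that doesn’t map well onto Syncthing’s precomputed block index.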

According to what I can see, PST files use a block-based database structure that should be very well suited to our sync algorithm. Again, do you see an actual problem here, in practice?

What do you mean by

rsync uses rolling hashes, which are not cryptographically secure

When is a (rolling) hash function cryptographically secure?

When it’s designed to be.

I am not aware of a cryptographically secure rolling hash. Though you could implement rsync-like comparing, it would just be a large amount of work.

Well, ok. What I don’t get though, is why that is important for spoofing/DoS. Aren’t two connected syncthing devices supposed to trust each other?

It’s (I guess) more to do with the likelihood of collisions, given we trust a 32-byte (SHA-256) hash for each 128 KiB worth of data, for potentially terabytes of it in total.
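To put rough numbers on that (a back-of-the-envelope estimate, not from the thread): by the birthday bound, the chance of any two of n blocks accidentally colliding under a b-bit hash is about n^2 / 2^(b+1). One terabyte is roughly 2^23 blocks of 128 KiB, so with a 256-bit hash that is 2^46 / 2^257 = 2^-211, effectively never. A 32-bit rolling checksum over the same data gives about 2^13 expected colliding pairs, so collisions are certain, which is why rsync pairs its rolling checksum with a second, stronger per-block hash.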

The point is more that it prevents working from a known index and distributing requests among peers. Syncing a file using a rolling checksum is an operation performed by two participants who both read through the file at the same time and report their findings. Also, it’s not necessary, as the current setup works perfectly fine for all the use cases mentioned so far.