Deduplicated remote or encrypted Folder (another crazy idea)

So I had another crazy idea. We have a specific case where we are sharing large quantities of files across multiple sites globally and are considering setting up some sites to act effectively as relays by having them share some folders that they otherwise wouldn’t need access to. (This is not a true “relay”, but for this purpose, it works by making parts available on the other site.)

These shares contain a lot of duplicated data within the folders and would therefore benefit quite substantially from deduplication. I’m curious what you guys think, and whether it has been discussed before: could encrypted remote sites (which don’t need complete, readable files) be “deduplicated”, in the sense that they only need to hold encrypted copies of all the blocks rather than the complete files? The files are not useful until decrypted anyway, at which point they could be reassembled.

Anyway, it was just a thought. I fully expect this idea to get shot down hard… I think the change is probably too substantial to the framework of the encrypted folder and would probably break the offline decryption process completely (or make that impossible.)

Just interested in a bit of discussion if you guys have thought about it before.

The security design of the untrusted device encryption prevents this. Not doing this matters for security: if identical plaintext blocks produced identical encrypted blocks across different files, an observer could correlate content, possibly leaking information about file contents. It’s also a security nightmare in general to have identical blocks always encrypt to the same ciphertext, as that requires fixed keys and nonces, etc.
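To make the nonce point concrete, here is a toy sketch. It is not Syncthing's actual encryption scheme: `toy_encrypt` is a hypothetical stand-in stream cipher built from SHAKE-256, used only to show that deduplicating ciphertext needs a fixed key and nonce, and that a fixed nonce is exactly what lets an observer see which blocks are equal.

```python
import hashlib
import os


def toy_encrypt(key: bytes, nonce: bytes, block: bytes) -> bytes:
    """Toy stream cipher for illustration only (NOT Syncthing's scheme).

    XORs the block with a keystream derived from SHAKE-256(key || nonce).
    """
    keystream = hashlib.shake_256(key + nonce).digest(len(block))
    return bytes(p ^ k for p, k in zip(block, keystream))


key = os.urandom(32)
block = b"identical plaintext block " * 8

# Fixed nonce: equal plaintext blocks give equal ciphertext blocks.
# A remote could deduplicate them, but it also learns they are equal.
fixed_nonce = bytes(12)
det_a = toy_encrypt(key, fixed_nonce, block)
det_b = toy_encrypt(key, fixed_nonce, block)

# Fresh random nonce per block: ciphertexts differ, so there is
# nothing left for the remote to deduplicate or correlate.
rand_a = toy_encrypt(key, os.urandom(12), block)
rand_b = toy_encrypt(key, os.urandom(12), block)

print("fixed nonce, ciphertexts equal: ", det_a == det_b)
print("random nonce, ciphertexts equal:", rand_a == rand_b)
```

With the fixed nonce the two ciphertexts match; with random nonces they differ except with negligible probability. That is the trade-off in a nutshell: whatever makes deduplication possible on the untrusted side also leaks which blocks are the same.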

In general, compression and encryption don’t play nice together (CRIME and BREACH immediately come to mind), so I would advise against any such endeavours.

If you eliminate security designs from the equation (i.e. let’s go back to an unencrypted solution), what you describe is certainly possible, but probably not within the scope of Syncthing.
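For the unencrypted case, a rough sketch of what block-level deduplication could look like is a content-addressed block store. Everything here is hypothetical and for illustration only: `BlockStore`, `split_blocks`, and the 128 KiB `BLOCK_SIZE` are assumed names and values, not anything Syncthing exposes.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed block size, for illustration only


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Stores each unique block once, keyed by its SHA-256 hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put_file(self, data: bytes) -> list[str]:
        """Store a file; return the ordered list of block hashes (its recipe)."""
        recipe = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # keep only the first copy
            recipe.append(digest)
        return recipe

    def get_file(self, recipe: list[str]) -> bytes:
        """Reassemble a file from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)


store = BlockStore()
file_a = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
file_b = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # shares its first block with file_a

recipe_a = store.put_file(file_a)
recipe_b = store.put_file(file_b)

print("blocks referenced:", len(recipe_a) + len(recipe_b))
print("blocks stored:    ", len(store.blocks))
```

Here four blocks are referenced but only three are stored, and either file can be rebuilt from its recipe with `store.get_file(recipe_a)`. The catch for a relay is that it would hold a block pool plus recipes instead of ordinary files on disk.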

The functionality you’re looking for already exists at the filesystem level. BTRFS has an extension that finds and deduplicates blocks within files. Full-file deduplication (reflinking) is available on almost any filesystem. I imagine that ZFS and various other modern filesystems offer similar functionality. Basically everything with Copy-on-Write technology can do this.
