Encryption for remote Syncthing device

Rclone has a crypt remotes feature, and Rclone is also written in Go. Could that help implement crypt repositories for Syncthing?


Hey folks, I see this feature has been highly requested for the last 5 years, and there is even a $2K bounty for it right now. I am also dreaming about optionally encrypted remotes, since this would allow me to replace Google Drive with Syncthing.

My guess is that some effort has been made in this direction, but certain architectural problems appeared. Am I right about this, or is there just no clear way to provide real security for such a solution?

Right now I am thinking about:

  1. investigate the Syncthing code for the possibility of creating a PR with this feature myself, but I don’t know Go, so this could take a lot of time;
  2. start building my own solution from scratch with security in mind, but that would be an even longer path :slight_smile:

The whole project is pretty awesome, especially the fact that its source is open.

P.S. Thanks @PhracturedBlue for sharing this Docker-based solution, I’m going to give it a shot.


Resilio has a reasonable solution to this problem: basically, you choose whether to trust a server. A trusted server has the encryption key and stores files decrypted on disk. A non-trusted server keeps them encrypted. Network sync is always done using the encrypted file. Of course it isn’t open source, and it uses the BT protocol…

I also want to make it crystal clear that my work-around is NOT a solution. Someone with root access to the server can absolutely get access to your files. It just makes it a bit harder to do.

Probably a good place to start is reading through the draft here: https://github.com/syncthing/syncthing/pull/4331


I was thinking about encrypting blocks for deduplication.

One thing that came to mind was to use the hash of the unencrypted block. Encrypting this hash (e.g. with a salt + symmetric key) to form the block ID should make sure it doesn’t leak the hash of the plaintext data. This way, two blocks with the same plaintext result in the same ID and can be deduplicated. A commonly used binary format would be IV+ciphertext. When the IV is random, two clients can end up with different binary representations (when two clients independently introduce the same file at the same time; once they sync and store the blocks, it stays the same). A malicious attacker could then request the same block from multiple clients and would receive multiple ciphertexts (still fairly limited by the number of clients in a usual setup) for the same plaintext. Would this be an acceptable risk?
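To make the idea concrete, here is a minimal sketch of the block-ID derivation described above, with all names (`FOLDER_KEY`, `block_id`) hypothetical: the ID is a keyed function (HMAC-SHA256 standing in for "encrypting the hash") of the plaintext block's hash, so it doesn't expose the plaintext hash itself, yet identical plaintext blocks still map to the same ID and can be deduplicated.

```python
import hashlib
import hmac

FOLDER_KEY = b"\x01" * 32  # placeholder shared secret for the folder

def block_id(plaintext: bytes) -> str:
    # Hash the plaintext block, then derive the ID with a keyed function,
    # so the plaintext hash itself never leaves the trusted device.
    plain_hash = hashlib.sha256(plaintext).digest()
    return hmac.new(FOLDER_KEY, plain_hash, hashlib.sha256).hexdigest()

a = block_id(b"same content")
b = block_id(b"same content")
c = block_id(b"other content")
print(a == b)  # True: identical plaintext -> identical ID, dedup possible
print(a == c)  # False: different plaintext -> unrelated ID
```

Without the key, an untrusted node holding these IDs cannot run a dictionary attack against guessed plaintext blocks.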

Adding this to the approaches introduced by @generalmanager in the locked issue on GitHub:

But it could be solved with deterministic encryption (the same input always creates the same output for one key). If the same plaintext always produces the same ciphertext, the untrusted nodes can compare the hashes of ciphertext blocks, so they don’t store files multiple times if those were added on different trusted machines while offline. And trusted hosts can compare a list of hashes of encrypted blocks to their own list of hashes of encrypted blocks, which means they don’t waste traffic on files they already have. (Note: I used the term deterministic encryption a bit misleadingly here. AES, for example, is deterministic, but is made non-deterministic by using different IVs/nonces.)

Which algorithm would you use for this?

  • AES without an IV/nonce isn’t very secure, is it?
  • Suggestion 3. below?

3. Nearly everything is the same as in 2., but instead of a completely random nonce we use the (first 96 bits of the) hash of the unencrypted block (plus a shared secret, to protect against file confirmation and similar attacks) as the nonce. This way the ciphertext is always the same for identical plaintext blocks, but it leads us into the barren lands of not-well-researched crypto and doesn’t sound like a good idea: initialization vector - Is it safe to use file's hash as IV? - Cryptography Stack Exchange
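For clarity, a sketch of just the nonce derivation in suggestion 3 (names like `SHARED_SECRET` and `deterministic_nonce` are mine, not from any existing implementation): the nonce is the first 96 bits of a keyed hash of the plaintext block, so the same plaintext under the same key always yields the same nonce, and hence deterministic ciphertext when fed to AES-GCM or ChaCha20-Poly1305.

```python
import hashlib
import hmac

SHARED_SECRET = b"\x02" * 32  # placeholder key material

def deterministic_nonce(plaintext: bytes) -> bytes:
    # Keyed hash of the plaintext, truncated to 96 bits -- the common
    # nonce size for AES-GCM and ChaCha20-Poly1305.
    digest = hmac.new(SHARED_SECRET, plaintext, hashlib.sha256).digest()
    return digest[:12]

n1 = deterministic_nonce(b"same block")
n2 = deterministic_nonce(b"same block")
print(n1 == n2)  # True: same plaintext + same secret -> same nonce
print(len(n1))   # 12 bytes = 96 bits
```

Keying the hash with the shared secret is what protects against file-confirmation attacks; with an unkeyed hash anyone could compute the nonce for a guessed file. The loss of semantic security discussed below remains either way.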

Making sure identical blocks from different files encrypt the same only saves a little in transfer costs. On the other hand it increases complexity and reduces safety by leaking information to an attacker. I don’t think it’s worth that tradeoff.

I think it’s even an attack vector to some extent, because if you are able to put plaintext into a folder and have it encrypted, and the IV/nonce is somehow deterministic (for the purpose of block reuse it has to be), you could recover the key over time.

So things like reusing content from an old file, reusing content from other files, rename detection, etc. are all not possible in encrypted folders.

@calmh I only care about deduplication to solve the efficient move/rename feature. With deduplication in place it is solved implicitly.

When storing the full path (or a unique parent folder ID) in addition to the file name, the metadata for two unknown files with identical content is different within the repository. The file data for two identical files is stored only once anyway.
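A toy illustration of that point, with hypothetical field names: because the full path is part of the metadata, two files with identical content produce different metadata entries, while their block list (and thus the stored data) is shared.

```python
import hashlib

def meta_entry(path: str, content: bytes) -> dict:
    # Metadata = path plus the list of block IDs the file is made of.
    return {"path": path, "blocks": [hashlib.sha256(content).hexdigest()]}

a = meta_entry("docs/a.txt", b"same bytes")
b = meta_entry("pics/b.txt", b"same bytes")
print(a == b)                      # False: metadata entries differ
print(a["blocks"] == b["blocks"])  # True: the data is stored only once
```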


@AudriusButkevicius Your doubts about deterministic IVs are reflected in the quoted link for 3:

You obviously lose semantic security when you use deterministic encryption. This means an attacker can tell whether two files are identical.

So when an attacker knows that a known file exists in the repository, he can generate the deterministic IV/hash/ciphertext with a guessed password and perform dictionary/brute-force attacks on the secret. When using a salt, the effective secret contains a lot of entropy.

Tradeoffs always have to be made; that’s why a threat model is so important! So I was wondering which of the two outlined solutions (multiple different ciphertexts for the same block ID; deterministic encryption), if either, is acceptable with your threat model in mind.


Another approach to rename/move detection (implicit when content is equal) would be to not detect it at all. When using FUSE, edit/move actions are explicit. Synchronization would then exchange some sort of journal since the last successful sync. In a distributed context, it might be difficult to identify and refer to a necessary common base state (similar to git).

There will be no efficient move/rename with encrypted files. Files have individual encryption keys. This is a feature as it prevents tracking data between files and correlating copies/moves that might have happened.

@calmh What is the problem with my proposed solution? It prevents tracking and correlation, if I am not mistaken.

It’s not at all clear what your suggestion means in the context of Syncthing and its protocol.

Rename detection and deduplication can still happen for peers that have the decryption key, but I don’t think it can happen on encrypted peers.

Yeah it’s unclear to me what you mean. Deduplication of blocks happens on the receiving end based on the block hash. In the encrypted case the block “hash” is in fact just an opaque token (regular block hash encrypted with the file key). Being able to dedup on the receiving side means having identical block hashes / tokens for identical blocks in different files, which lets the attacker draw conclusions about your data that they shouldn’t be able to.
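To illustrate the point about opaque tokens (the keyed derivation below is a stand-in for "regular block hash encrypted with the file key", and all names are hypothetical): because each file has its own key, the same plaintext block in two different files yields two unrelated tokens, so an encrypted peer cannot correlate them.

```python
import hashlib
import hmac

def token(file_key: bytes, block_hash: bytes) -> str:
    # Stand-in for encrypting the block hash with the per-file key;
    # the untrusted side only ever sees this opaque value.
    return hmac.new(file_key, block_hash, hashlib.sha256).hexdigest()

h = hashlib.sha256(b"identical block").digest()
t1 = token(b"key-for-file-A", h)
t2 = token(b"key-for-file-B", h)
print(t1 == t2)  # False: same block, different files -> unrelated tokens
```

This is exactly why cross-file dedup on the encrypted peer and per-file privacy are mutually exclusive: making the tokens match across files is what would leak the correlation.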

Which part exactly is not clear?

That part is clear. It’s also the part I’ve said I don’t want, as it allows the other side to correlate blocks between files. If I sync my patient registry, it’s unexpected if the other party can draw the conclusion that five out of twenty patients have the same diagnosis. Or similarly if I’m syncing user home dirs and you can see that four other users have the same file with a regime-critical poster that you just added yourself.

The quote above sounded like you had some clever solution to this, but it isn’t clear what it was.

Who is the other party? I expect an attacker not to be able to draw that conclusion, while the other machine I am syncing with has the key anyway.

I don’t know how the Syncthing protocol works in detail, but I would suggest something like this (metadata contains block IDs):

  1. Create metadata for local files.
  2. Get metadata from the client to sync with.
  3. Compare the metadata.
  4. Download blocks which don’t exist locally.

So an outside attacker doesn’t see which blocks are re-used, and therefore doesn’t know which files are deduplicated.
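The four steps above can be sketched as follows (a toy model, not the Syncthing protocol: each side is just a mapping of block ID to block data, and only the ID sets are compared):

```python
# Steps 1-2: each side has its metadata, here just block ID -> block data.
local = {"id1": b"block one", "id2": b"block two"}
remote = {"id2": b"block two", "id3": b"block three"}

# Step 3: compare the metadata to find what is missing locally.
missing = set(remote) - set(local)

# Step 4: download only the blocks which don't exist locally.
for bid in sorted(missing):
    local[bid] = remote[bid]

print(sorted(local))  # ['id1', 'id2', 'id3']
```

Note that this sketch assumes both sides may read the metadata, which is precisely what the replies below argue an untrusted node must not be allowed to do.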

There are of course different attacker scenarios. At minimum you have to assume they control the encrypted device, or there would be no reason to encrypt the data we send to it. That means they see any metadata we send it. Or, put another way, if Syncthing can see that two files are the same in order to apply the rename optimization, then so can the attacker who controls that Syncthing instance. If we want to make it impossible for an attacker to see that two different files contain the same data, then by definition it must be impossible for Syncthing to do rename/copy optimizations.

Untrusted nodes must not be able to read the metadata on how to assemble files from blocks. Otherwise, file-size attacks can be performed (and in turn an attacker can find out that someone has stored the regime-critical poster).

Furthermore, when untrusted nodes can perform actions, they can also delete files from all other clients. And if they had the ability to resolve conflicts, they could force clients to replace files with garbage (which has the same effect as a deletion). Therefore, they must be dumb (only storing and serving data). When they are dumb, there is no need for them to access the metadata.

I suggest you read the design docs on the crypto proposal to see what we guard against and what we don’t. File sizes are not one of the protected things, although there is some variation due to padding.
