Encrypted nodes (please test)

Lennix · August 1, 2015, 1:56am

Copied from bountysource (https://www.bountysource.com/issues/1474343-support-for-file-encryption-e-g-non-trusted-servers). Check out the source code here: https://github.com/Lennix/syncthing

Hey guys,

I’ve been working on this about a week now and it’s working quite well. Let me tell you how to set it up:

#Setup

On your client you add a new folder as usual, but check the “Encrypt” box on the lower left. A randomly generated 32-byte (256-bit) passphrase will appear and you should write it down or save it somewhere safe. You can change that passphrase to use your own (please only 32 bytes) or enter your old passphrase to recover a folder. On the receiver (I call it the eNode) you also add a new folder, but now you check the “Encrypted” button.

After adding the folders, the synchronisation should start and you should see your encrypted files on the eNode. If it doesn’t start directly, you can restart syncthing to let it reload the config; that seems to be the problem most of the time.

I encourage you to test this. Please backup your old database and config and add new folders instead of changing the config of your old ones.

And now some technical stuff:

#Encryption details:

The encryption used is AES-256 (CFB mode) with a PBKDF2-derived key, using the passphrase as the password and the block hash as salt. Files are encrypted directly on the sending client and no encryption information is shared with the eNode.
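A minimal sketch of the scheme described above, not the code from the fork. The iteration count, the all-zero IV, and every function name here are assumptions made for illustration; PBKDF2 is written out by hand only so that the example needs nothing beyond the standard library.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// iterations is a hypothetical PBKDF2 work factor; the post does not state one.
const iterations = 4096

// pbkdf2SHA256 is a hand-written PBKDF2-HMAC-SHA256 (RFC 8018).
func pbkdf2SHA256(password, salt []byte, iter, keyLen int) []byte {
	prf := hmac.New(sha256.New, password)
	var out []byte
	for block := uint32(1); len(out) < keyLen; block++ {
		var idx [4]byte
		binary.BigEndian.PutUint32(idx[:], block)
		prf.Reset()
		prf.Write(salt)
		prf.Write(idx[:])
		u := prf.Sum(nil)
		t := append([]byte(nil), u...)
		for i := 1; i < iter; i++ {
			prf.Reset()
			prf.Write(u)
			u = prf.Sum(nil)
			for j := range t {
				t[j] ^= u[j]
			}
		}
		out = append(out, t...)
	}
	return out[:keyLen]
}

// blockHash returns the SHA-256 of a cleartext block, used here as the salt.
func blockHash(data []byte) []byte {
	sum := sha256.Sum256(data)
	return sum[:]
}

// cfb derives a per-block key from passphrase and cleartext hash, then runs
// AES-256-CFB over data. The IV is all zeros (an assumption), relying on the
// key being different for every distinct block.
func cfb(passphrase string, plainHash, data []byte, decrypt bool) ([]byte, error) {
	key := pbkdf2SHA256([]byte(passphrase), plainHash, iterations, 32)
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	iv := make([]byte, aes.BlockSize)
	var stream cipher.Stream
	if decrypt {
		stream = cipher.NewCFBDecrypter(block, iv)
	} else {
		stream = cipher.NewCFBEncrypter(block, iv)
	}
	out := make([]byte, len(data))
	stream.XORKeyStream(out, data)
	return out, nil
}

func encryptBlock(passphrase string, plainHash, data []byte) ([]byte, error) {
	return cfb(passphrase, plainHash, data, false)
}

func decryptBlock(passphrase string, plainHash, data []byte) ([]byte, error) {
	return cfb(passphrase, plainHash, data, true)
}

func main() {
	data := []byte("a short final block")
	ct, err := encryptBlock("0123456789abcdef0123456789abcdef", blockHash(data), data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), len(ct)) // CFB keeps the length unchanged
}
```

Because CFB behaves like a stream cipher, the ciphertext has the same length as the cleartext, so a short last block needs no padding.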

#Changes to syncthing / technical details:

The client sends the folder index containing the (SHA256) hashes of the files to the eNode. The eNode then requests the files from the client, which sends them after encryption. At the moment no hashes of the encrypted files are shared, so we’re relying on the underlying protocol for checksumming. On an index update we can request the blocks changed since we have the hashes of the cleartext files. We can also copy already received blocks from other files. The eNode itself does not scan the file system since it will generate different hashes.
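The block reuse described above can be sketched as follows. This is an illustration under assumptions, not Syncthing's actual index code: `hashBlocks` and `blocksToRequest` are hypothetical helpers, and the tiny block size exists only to keep the demo readable.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashBlocks splits data into fixed-size blocks and returns the SHA-256 of
// each cleartext block. The last block may be shorter than size.
func hashBlocks(data []byte, size int) [][32]byte {
	var hashes [][32]byte
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		hashes = append(hashes, sha256.Sum256(data[start:end]))
	}
	return hashes
}

// blocksToRequest returns the indexes of wanted blocks whose cleartext hash
// the eNode has not seen in any file yet. Everything else can be copied
// locally from an already received block.
func blocksToRequest(known map[[32]byte]bool, want [][32]byte) []int {
	var need []int
	for i, h := range want {
		if !known[h] {
			need = append(need, i)
		}
	}
	return need
}

func main() {
	known := map[[32]byte]bool{}
	for _, h := range hashBlocks([]byte("aaaabbbbcccc"), 4) {
		known[h] = true
	}
	updated := hashBlocks([]byte("aaaaXXXXcccc"), 4)
	fmt.Println(blocksToRequest(known, updated))
}
```

Reuse across files only works because the same cleartext block always yields the same ciphertext, which follows from deriving the key from the passphrase and the block hash alone.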

#Limitations / ToDo:

Currently folder and file names are still cleartext on the eNode.
Sending files from multiple clients is experimental right now. But it’s paramount that you connect the encrypting client ONLY to the eNode, not to another encrypting client. (You could do that by adding the same folder another time without the “Encrypt” option set.)
The key salt used is based on the block.hash. That has 2 drawbacks: 1. it’s not truly random 2. if both databases get destroyed the files are not recoverable.

#Some history:

I first started with RSA-OAEP encryption using the public/private keys generated by syncthing. Because of that, encrypted blocks were bigger than cleartext blocks, which gave me a big headache. After all that was resolved I had to realize that RSA was way too slow for this kind of application. On an i7 4770k with 100% CPU utilization I had about 50-100kB/s decrypting the files. After that I had to change my plans and stick to a symmetric cipher, and since AES-256 is well known and implemented I chose that. Using a salted key I could also drop the IV from the encryption, resulting in same-sized blocks. That way I could re-enable a lot of features syncthing already had.

And here we are. Please look at my code critically. I haven’t written tests yet. Maybe you have feedback on my implementation or are looking for another feature.

If you have questions you can also hit me up on twitter: @pknede

Lennix · August 1, 2015, 2:00am

You can get the 64-bit Windows (7) binary here: https://github.com/Lennix/syncthing/releases/download/v0.11.16-encryption/syncthing.exe

If you don’t trust me or have another OS, you have to compile it yourself. Please follow the guidelines in the docs: (http://docs.syncthing.net/dev/building.html)

AudriusButkevicius · August 1, 2015, 9:35am

Hi,

So it’s all nice, but it’s missing a few important components.

Verification, keeping encrypted indexes etc, verifying the received block, etc. I can easily DoS this advertising that I have all the data, and then giving you zero’ed blocks, because you don’t verify them on receipt.
There is no scanning, so no changes are picked up.
Verification that someone who has sent the index actually has the passphrase to be able to perform modifications to the data set. Though just verifying whether the guy has a passphrase is not enough as it could be A (enc) <-> B <-> C (enc), so the fact that B sent an index update to A could mean that it’s being relayed to A from C and hence A should accept it, so this somehow needs to be accounted for.
Maybe you could get away without the need to specify that the folder is encrypted; that perhaps should be part of ClusterConfig and automatically derived by a node that doesn’t have the key.

Also throwing some links in if you haven’t seen them before:

calmh · August 1, 2015, 10:46am

Great! I love that someone actually started hacking on this. This seems like a good start! I’ll echo some of Audrius’ concerns and add some of my own:

I think we really do need to encrypt the file names. We might get away with not hiding the file sizes, possibly.
The encrypted node does need a way to verify the data, ideally by being given the hash of the encrypted data instead/also.

Perhaps some of the changes in the above linked proposal could be integrated with your work?

AudriusButkevicius · August 1, 2015, 11:17am

Also, I am not sure if it’s a problem, but just mentioning it in case you think it might be. It seems that you assume that all blocks are of size protocol.BlockSize, though that does not apply to the last block, so the ciphertext might not be of the length you expect it to be, and might not be a multiple of the plaintext, due to ciphertexts being padded to match some block size.

Regardless, this is an awesome step forward.

calmh · August 1, 2015, 11:26am

I looked at this. Does this have an advantage compared to just running PBKDF2 once over the passphrase to get a key and using the block hash as the IV directly?

Lennix · August 1, 2015, 12:07pm

I agree with you on verification. I don’t get the point about DoS, since communication of blocks starts after authentication. Where’s the benefit in DoS’ing an authenticated node?
This is only necessary to detect 3rd party changes or bit rot on the encrypted data and I agree.
I suppose this is nice to have.
That sounds like a good idea. I also thought of using the protocol.Options that’s available in almost every message.

(From the next post)

AES in CFB mode does not require padding.

Lennix · August 1, 2015, 12:23pm

Yes that’s on my ToDo, but from what I see, folders are handled as files within syncthing. That means (and I think I already read that concern somewhere) that encrypting Folder A might result in Cipher A but encrypting Folder A/File B does not result in Cipher A/Cipher B. So some additional handling needs to be done to accommodate that.
Also on my list. I wanted to start with the least destructive changes, since I don’t know what your policies on protocol changes are. It’s easy to add the hash of the encrypted data, but it requires changes to the DB structure and protocol.

(From the next post)

PBKDF2 requires a salt, so you need to supply something anyway. Using the hash is a workaround at the moment. The best solution would be to generate a random salt and save it somewhere. A practical solution is something derived from the file or block, something the client knows regardless. Maybe the cleartext filename. (Which would prevent the eNode from deduplication)

According to the golang docs an IV is not necessary if the key is different for each cipher, which it is when using a salted passphrase. I first worked with a random IV prefixed to the block. But that would change the block size. Given some changes to the protocol, it’s easy to handle that. We could also use AES in CBC mode then, hiding the original block and file size.

AudriusButkevicius · August 1, 2015, 12:41pm

Agree that DoS by an authenticated node makes no sense, but you can imagine a case where someone has a very high rescan interval, and the fact that the file is now modified is never picked up, hence when people ask for block X with hash Y, they get block X with hash Z, which corrupts the file, even if it’s not intentional. We currently rely on this as a mechanism to deal with files which are changing as we sync, hence it’s more or less necessary to have it working with stability.
Well, scanning is required in general to pick up changes to files, so that they can be synced across.
Yeah you can do that.

AudriusButkevicius · August 1, 2015, 12:48pm

=1 So folders are just files with a flag bit set. Folders do not contain files as such, so it’s flat and non-hierarchical. Both are stored separately, so Folder A/File A will have cipher A, where Folder A can have its own cipher. What you should end up doing is base64’ing or base32’ing the file/folder names and having a flat folder structure, such that:

Folder A/
Folder A/File A

becomes something like

ASDAXAQSDSDQ
XHASEOQDHOQWEIODKQOS

Then when bootstrapping/decrypting, you’d probably want to order stuff by length, so that ASDAXAQSDSDQ is decrypted before XHASEOQDHOQWEIODKQOS, so that when we decrypt the file, it already has a directory to go into.
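The ordering idea can be sketched like this. The encryption step is deliberately left out: `encodeName` only base32-encodes the cleartext path as a stand-in for "encrypt, then encode", and both helper names are hypothetical.

```go
package main

import (
	"encoding/base32"
	"fmt"
	"sort"
)

// enc is unpadded base32, which yields names that are safe on any file system.
var enc = base32.StdEncoding.WithPadding(base32.NoPadding)

// encodeName stands in for "encrypt the path, then base32 it". No encryption
// happens here; only the encoding and ordering are illustrated.
func encodeName(path string) string {
	return enc.EncodeToString([]byte(path))
}

// restoreOrder sorts the flat names by encoded length, shortest first, and
// decodes them, so a directory comes out before the files inside it.
func restoreOrder(names []string) ([]string, error) {
	sorted := append([]string(nil), names...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return len(sorted[i]) < len(sorted[j])
	})
	out := make([]string, 0, len(sorted))
	for _, n := range sorted {
		raw, err := enc.DecodeString(n)
		if err != nil {
			return nil, err
		}
		out = append(out, string(raw))
	}
	return out, nil
}

func main() {
	flat := []string{encodeName("Folder A/File A"), encodeName("Folder A")}
	paths, err := restoreOrder(flat)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths)
}
```

Sorting by length only guarantees parent-before-child when the encoded length grows strictly with the path length. With a padded block cipher, a directory and a file inside it can end up with equal lengths, so a real implementation would need a tie-breaker or would create missing directories on demand.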

=2 Feel free to modify anything you need, as this will require a major release anyway, so any backwards incompatible changes are cool if that makes the code cleaner.

(next post) =1 If you read the links, it talks about hmacs and salts, and how it could be dealt with.

uok · August 1, 2015, 1:10pm

@Lennix Thanks for contributing and working on this exciting feature!

calmh · August 1, 2015, 2:56pm

I’m mostly thinking that PBKDF2 is an intentionally computationally expensive operation and we don’t gain anything security wise by repeating it on each request.

calmh · August 1, 2015, 2:59pm

Yeah, there’s no need to represent directories on disk on the encrypted device. In the index is sufficient.

Lennix · August 1, 2015, 3:04pm

1+2. Implementing scanning on the eNode is quite difficult since we are only able to handle the encrypted hashes. The only benefit I see is to determine changes on the eNode and ask clients for the blocks to revert them (since you don’t want to send changed files from the eNode). The normal synchronisation is done with the cleartext hashes, even synchronizing 2 clients at the same time. (at least in my testing)

The block hashes are checked by the client after decryption, to ensure file integrity.

(From the next post):

The problem here is that the eNode would not know in which folder the file belongs. I guess the best way would be to split the path and encrypt each part.

I guess the hmacs would be replaced by SHA256 hashes of the encrypted blocks or?

Lennix · August 1, 2015, 3:06pm

That would also be an acceptable solution I guess. Is the right order during first sync and rebuild ensured?

Meaning that syncthing first creates the folders and then puts the files in?

AudriusButkevicius · August 1, 2015, 4:16pm

Right, if I understand correctly currently encfolder connects to a rwfolder? In which case my argument makes no sense, because I just misunderstood.

I guess it would be nice to have both ciphertext hashes and plaintext hashes; this way we could verify that nobody has modified the encrypted files, or if someone has, repull them from some source.
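A sketch of an index entry that carries both hashes, as suggested here. `blockInfo` and its methods are hypothetical and are not Syncthing's protocol types.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// blockInfo is a hypothetical index entry with one hash per representation.
type blockInfo struct {
	PlainHash  [32]byte // checked by a client after decryption
	CipherHash [32]byte // checked by the eNode, which never sees cleartext
}

// newBlockInfo is what the encrypting client would compute before sending.
func newBlockInfo(plain, ciphertext []byte) blockInfo {
	return blockInfo{
		PlainHash:  sha256.Sum256(plain),
		CipherHash: sha256.Sum256(ciphertext),
	}
}

// verifyCiphertext lets the eNode detect modified or rotten encrypted data
// without holding the key.
func (b blockInfo) verifyCiphertext(data []byte) bool {
	return sha256.Sum256(data) == b.CipherHash
}

// verifyPlaintext lets a client confirm a block after decrypting it.
func (b blockInfo) verifyPlaintext(data []byte) bool {
	return sha256.Sum256(data) == b.PlainHash
}

func main() {
	plain := []byte("cleartext block")
	ciphertext := []byte("pretend this is the encrypted block")
	info := newBlockInfo(plain, ciphertext)

	damaged := append([]byte(nil), ciphertext...)
	damaged[0] ^= 0x01 // simulate bit rot on the eNode

	fmt.Println(info.verifyCiphertext(ciphertext), info.verifyCiphertext(damaged))
}
```

A failed ciphertext check on the eNode would be the trigger to pull the block again from a device that still has a good copy.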

calmh · August 2, 2015, 6:18am

Yeah, currently Syncthing assumes that a directory will be created by its corresponding directory entry in the index before it gets to process files in that directory. But you could simply skip handling any directory entries at all and when creating a file ensure that the necessary directory exists beforehand.

Lennix · August 12, 2015, 2:24am

So, finally being back from a week-long business trip, I was able to implement filename encryption.

It now encrypts the filenames using a block cipher (AES CBC) with the passphrase as key and the sha256ed filename as IV. The IV is sent with the ciphertext as recommended. Since we are using a random IV, the filenames and folders are random and given the block cipher you can’t guess the original length.
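A minimal sketch of filename encryption along these lines, not the code from the fork. Truncating the SHA-256 of the name to 16 bytes for the IV, PKCS#7 padding, and base32 as the on-disk encoding are all assumptions of this sketch; `encryptName` and `decryptName` are hypothetical names.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/sha256"
	"encoding/base32"
	"errors"
	"fmt"
)

var nameEnc = base32.StdEncoding.WithPadding(base32.NoPadding)

// encryptName encrypts a name with AES-CBC. The IV is the first 16 bytes of
// SHA-256(name) and is prepended to the ciphertext.
func encryptName(key []byte, name string) (string, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256([]byte(name))
	iv := sum[:aes.BlockSize]

	// PKCS#7 padding up to the next multiple of the AES block size.
	padLen := aes.BlockSize - len(name)%aes.BlockSize
	padded := append([]byte(name), bytes.Repeat([]byte{byte(padLen)}, padLen)...)

	out := make([]byte, aes.BlockSize+len(padded))
	copy(out, iv)
	cipher.NewCBCEncrypter(block, iv).CryptBlocks(out[aes.BlockSize:], padded)
	return nameEnc.EncodeToString(out), nil
}

// decryptName reverses encryptName: decode, split off the IV, decrypt, unpad.
func decryptName(key []byte, encoded string) (string, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return "", err
	}
	raw, err := nameEnc.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	if len(raw) < 2*aes.BlockSize || len(raw)%aes.BlockSize != 0 {
		return "", errors.New("encrypted name has an invalid length")
	}
	iv, ct := raw[:aes.BlockSize], raw[aes.BlockSize:]
	pt := make([]byte, len(ct))
	cipher.NewCBCDecrypter(block, iv).CryptBlocks(pt, ct)

	padLen := int(pt[len(pt)-1])
	if padLen == 0 || padLen > aes.BlockSize {
		return "", errors.New("encrypted name has invalid padding")
	}
	return string(pt[:len(pt)-padLen]), nil
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // 32-byte passphrase used as key
	encName, err := encryptName(key, "Folder A/File A")
	if err != nil {
		panic(err)
	}
	name, err := decryptName(key, encName)
	if err != nil {
		panic(err)
	}
	fmt.Println(encName, name)
}
```

Padding hides the exact length of a name within a 16-byte step. Because the IV is derived from the name itself, the same name under the same key always produces the same encrypted name.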

The filenames are encrypted on Model.sendIndexTo and Model.RequestGlobal. That way the eNode only knows about the encrypted filenames (and directories). They are decrypted on Model.Request to be able to serve the correct file and on Model.Index(Update) to update the index with cleartext filenames.

I tested both uploading the files and recovering them after a disaster; both worked flawlessly.

Deduplication (using existing blocks from other files) still works.

I have a raspberry pi lying around which I will try to use to test the performance of encryption on low-performance ARM devices. Should have time to try it this weekend.

bademux · August 12, 2015, 3:38pm

Exactly the way I’m planning to use encryption: simple-to-set-up low-power device(s) for backing up my data. Thanks.

calmh · August 12, 2015, 4:53pm

File name encryption seems mostly sane; possibly we need to add some hierarchy to it but that’s trivial. However, I can check if you have a given file or not by comparing the filename hashes. This is more difficult as I don’t know the directory prefix, but may still be an issue?

I’m adding some more high level notes here from the pull request, as there are more eyes and discussion here:

It’d be ideal if the encrypted side didn’t need to know it was encrypted. In fact, what happens if we don’t tell it and there’s a mismatch between the two sides in their assumption about the encryptedness?
There’s too much code duplication here between the rw and enc folder types. You don’t necessarily need to fix that, but before merging this I think we need to do some refactoring to break out common parts into something more sensible, so be prepared for that to happen. It’s been on the todo list for a while but this makes it more urgent.
Do symlinks survive the passage? Are they encrypted?
PBKDF per crypto operation is unnecessarily expensive; some other way here would be nice - use an IV and stash it somewhere?
This doesn’t encrypt the file content hashes, which means I can check if you have a given file by comparing the hashes with an unencrypted copy of that file.
This doesn’t hide file modification times and so on. Do we need this? I don’t see a reason we need to expose it, and the less data we give an attacker the better?
Do we want data verification on the encrypted devices? Spontaneously, I do, but I can’t say it’s super essential…