Latest encrypted devices proposal, what & how

I prefer the “Trusted” and “Untrusted” terminology. Don’t use “plain” and “encrypted”, since that makes it seem to newbies that the “plain” data is not encrypted in transit.

I’m not sure I follow the implementation approach for untrusted nodes with a password. So the trusted node would add an untrusted device ID to an existing folder and then add an encrypted password, a.k.a. the untrusted encryption key?

Please make the untrusted encryption password VIEWABLE from the trusted side. In case one forgets it, you don’t want to have to reset it and update all the untrusted nodes. Resilio allows for viewing all the keys from a trusted side.


I’m trying to better understand how the integrity of the data is maintained on untrusted devices, especially if untrusted devices are used to seed trusted devices. I’m thinking of a worst-case scenario where the only copy I’m left with is the one on an untrusted device. Based on this copy (and knowing the key) I’d like to be able to verify that I have a complete, unmodified version of the original (unencrypted) data.

As far as I understand, the untrusted device would have an encrypted copy of the database. That database only makes sense on the trusted device, though, since file/block hashes only match when the data is unencrypted. Is that correct?

IMHO the question is how we can verify an untrusted device’s data integrity. If the untrusted device were able to verify integrity itself, this would save us from having to download all the encrypted files to check their unencrypted versions against (local) trusted hashes. I could imagine that an additional set of hashes per encryption key would be the most elegant solution. Would that have too much of an impact performance-wise?

There is a folder decryption tool in the pull request, which can locally decrypt an encrypted folder when given the original folder ID and password. This could gain an option to not actually write the data, just verify it.
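For illustration, a verify-only mode could decrypt each block in memory and compare it against the expected plaintext hash instead of writing it out. A minimal sketch in Go; the nonce-prefixed XChaCha20-Poly1305 block layout and all names here are assumptions for the sketch, not the actual tool’s code:

```go
package verify

import (
	"bytes"
	"crypto/sha256"
	"errors"

	"golang.org/x/crypto/chacha20poly1305"
)

// verifyBlock decrypts a single stored block and compares the plaintext
// against the expected hash, without ever writing plaintext to disk.
// Assumes the block is stored as nonce || ciphertext (a guess at the
// layout, for illustration only).
func verifyBlock(key, stored, wantHash []byte) error {
	aead, err := chacha20poly1305.NewX(key) // 32-byte folder key
	if err != nil {
		return err
	}
	if len(stored) < aead.NonceSize() {
		return errors.New("block too short")
	}
	nonce, ct := stored[:aead.NonceSize()], stored[aead.NonceSize():]
	plain, err := aead.Open(nil, nonce, ct, nil)
	if err != nil {
		return err // ciphertext corrupted or tampered with
	}
	if sum := sha256.Sum256(plain); !bytes.Equal(sum[:], wantHash) {
		return errors.New("plaintext hash mismatch")
	}
	return nil
}
```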

Additional hashing would have some significant impact, essentially a multiplier on the original full hash time, and it would need to happen again any time a folder is shared with a new key. It would also mean we need to use deterministic encryption for the blocks. Another option would be some sort of protocol change so that the hashing can happen at send time. That might be somewhat tricky to shoehorn in.


True, hashing does involve additional computational effort + disk space but IMHO this is negligible compared to the additional encryption overhead and of course the effort it takes to fully sync the data set over the network.

What we’d gain, on the other side, is the ability to verify data integrity on untrusted devices (as outlined above), which includes the ability to scrub data regularly to detect any disk issues. I’ll outline my use case (which I think might apply to many): I’m running an ST instance which was originally intended for my personal use only, but over time it’s become a backup repository for friends and family (yes, there are additional components in place to make this a proper backup solution). Those guys need a ‘fire and forget’ backup solution. They will never care to run any sort of tool to verify data integrity (because if they did, I wouldn’t have gotten involved in the first place…). On the other hand, I don’t want their (clear text) data or key. Therefore, I’d ideally be able to verify that my side of the house is fine (data integrity) based on their encrypted data.

Put differently, IMHO a copy without (easily) verifiable integrity isn’t worth much. E.g., restic’s approach (where one has to download and decrypt the whole data set to verify its integrity) isn’t really an option for me when it comes to WAN-attached storage. Since remote endpoints and average Internet bandwidth are ST’s daily business, I think we should keep that concern in mind to make the untrusted concept suit even more use cases.


If we can do it on the fly so we amortize the cost it would probably be fine. I guess worst case we could just add the block checksum as a trailer to the block itself, allowing checking that without doing decryption. This buys you about the same safety as using ZFS or similar for the storage.
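A sketch of that trailer idea in Go, assuming SHA-256 over the encrypted block (the format and names are made up for illustration):

```go
package trailer

import (
	"bytes"
	"crypto/sha256"
	"errors"
)

// appendTrailer appends the SHA-256 of the encrypted block, so the
// untrusted side can later scrub its data without holding any key.
func appendTrailer(encBlock []byte) []byte {
	sum := sha256.Sum256(encBlock)
	return append(encBlock, sum[:]...)
}

// checkTrailer verifies a stored block against its trailer; roughly the
// same bit-rot protection a checksumming filesystem like ZFS provides.
func checkTrailer(stored []byte) error {
	if len(stored) < sha256.Size {
		return errors.New("block too short")
	}
	data, want := stored[:len(stored)-sha256.Size], stored[len(stored)-sha256.Size:]
	sum := sha256.Sum256(data)
	if !bytes.Equal(sum[:], want) {
		return errors.New("checksum mismatch, likely disk corruption")
	}
	return nil
}
```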


Is this related to the new encryption feature, though? Syncthing doesn’t check data integrity now either (we do checks when we sync data, but as mentioned, that happens with encryption too, just only on one side). What I am saying is: if you want to periodically check data integrity, you’ll have to do that with a tool other than Syncthing, regardless of whether the data is plain or encrypted. And since you mention you use another tool for the backup on your own “untrusted” backup server, I’d expect that tool to be able to do integrity checks (as backup tools usually can), or you could just use a checksumming FS (ZFS, Btrfs, …).


This is a followup to the discussion started by this comment on the PR. I am posting here because I am pretty sure it’s just me that needs an answer, not the PR that needs changing :slight_smile:

Why do we encrypt the block hashes at all? Equal blocks can be detected with or without encryption, and the used hash (AES) should already ensure that you cannot infer anything about the data from the hashes, shouldn’t it? Or what else am I missing?

Not exactly sure what you mean - AES is an encryption algorithm, not a hashing algorithm.

If the hashes were unencrypted, the hash of the plaintext data would be visible. Arbitrary data could then be guessed and verified for correctness using the hash - some sort of plaintext oracle. This is susceptible to rainbow tables, pre-hashed dictionaries and related attacks. Encrypting the hash at least makes it harder to verify whether “the data equals x”. For larger blocks this probably doesn’t matter much - the blocks are simply too large for successful guesswork - but for shorter or easily predictable blocks this can make a significant difference.
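To make the oracle concrete: with plaintext hashes visible, an attacker can confirm guesses entirely offline. A toy Go sketch (names made up for illustration):

```go
package oracle

import (
	"bytes"
	"crypto/sha256"
)

// guessPlaintext confirms candidate plaintexts against a leaked block
// hash, dictionary-style, without ever touching the ciphertext. This is
// exactly what encrypting the hashes is meant to prevent.
func guessPlaintext(leakedHash []byte, candidates [][]byte) []byte {
	for _, c := range candidates {
		if sum := sha256.Sum256(c); bytes.Equal(sum[:], leakedHash) {
			return c // guess verified
		}
	}
	return nil // none of the guesses matched
}
```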


Yeah, that. We don’t want to leak the real hashes. If your question is more “why bother providing any hashes at all”: the encrypted hashes enable the usual block-level diffing, so only changed blocks need to be transferred.
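Block-level diffing only needs hash equality, which works on encrypted hashes just as well as on plain ones. A toy sketch, not Syncthing’s actual code:

```go
package diff

// neededBlocks compares the (encrypted) block hashes we already have
// with those announced for a new file version and returns the indexes
// of the blocks that actually need to be transferred.
func neededBlocks(have, want []string) []int {
	known := make(map[string]bool, len(have))
	for _, h := range have {
		known[h] = true
	}
	var need []int
	for i, h := range want {
		if !known[h] {
			need = append(need, i)
		}
	}
	return need
}
```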

It’s at least theoretically possible to do offline data verification on unencrypted folders as both the data and the hash database are there; we just don’t provide a tool for it. We could enable the same for encrypted folders. Encrypted folders also have the disadvantage that it’s not otherwise possible to just open a file and see if it seems healthy.


:see_no_evil:

SHA it should have been. All I was thinking about was that the hash cannot be used to guess at the data. That this doesn’t matter you kindly explained (and should have been obvious). I guess asking a trivial question (“trivial” to prevent any “there are no stupid questions” remarks) about a topic I just read up on is a signature move for me :smiley:

Nice of you to offer a way out of my blunder, but that part was perfectly clear :slight_smile:

For the small chance anyone with a similar level of understanding happens by this: The following answer was quite helpful to me regarding deterministic encryption/SIV: https://crypto.stackexchange.com/a/37097/75466
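For the same audience, a toy Go sketch of the synthetic-IV idea from that answer: derive the nonce from the plaintext itself, so equal blocks encrypt identically. This HMAC-plus-AES-GCM construction just stands in for a real, vetted AES-SIV implementation; purely illustrative:

```go
package siv

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/sha256"
)

// sealDeterministic encrypts deterministically: the nonce is a PRF of
// the plaintext (the "synthetic IV"), so the same block under the same
// key always yields the same ciphertext, enabling deduplication.
func sealDeterministic(key [32]byte, plaintext []byte) ([]byte, error) {
	mac := hmac.New(sha256.New, key[:16]) // first key half derives the IV
	mac.Write(plaintext)
	nonce := mac.Sum(nil)[:12] // synthetic IV, standard GCM nonce size

	block, err := aes.NewCipher(key[16:]) // second key half encrypts
	if err != nil {
		return nil, err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	// Prepend the nonce so the receiver can decrypt.
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}
```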

[quote=“imsodin, in github”]
…Then we can disable scanning (and FS watching)…
[/quote]

Disable scanning (and FS watching)??? How shall we (the untrusted device) tell the trusted one(s) that we need a file which is changed/damaged/corrupted here? Sorry if I’m being a bull in a china shop; I’m not a coder, but I’m very interested in this thread.

A damaged file won’t be picked up by FS watching anyway, and as for “regular” scanning, let me quote Jakob in the PR:

On the encrypted side, the folder type should be receive only and don’t do any scans… Changes to the stuff on the encrypted side will predictable break things.

Scanning is about local changes, and by definition an encrypted/untrusted node mustn’t make changes. If we were to implement an option to check data integrity (trusted or not), that would have to be separate from normal scans and thus could in principle also be done by an encrypted device (as it stands, the encrypted device doesn’t know its hashes though; see the discussion following this earlier comment: Latest encrypted devices proposal, what & how).


Hi Simon. Do you mean that except for FS watching (which I can easily figure decides on its own what to report), there is nothing in current ST that triggers a re-hash of already-present files and compares them to the DB? Even for rw & sendOnly folders? Or is that specific to receiveOnly?

Okay, let’s wait and see.

Any idea if versioning methods will change on untrusted devices? I ask because my backup setup (currently a single receiveOnly-foldered off-site device that gets synced from a Duplicati local encrypted backup of the folder) heavily relies on versioning in case of a simultaneous disaster (massive fire/ransomware) across all “currently trusted” devices, the Duplicati machine being one of them. You can guess my interest: I could drop Duplicati and reclaim about half the storage currently used for the combined plaintext+encrypted data.

EDIT : I see Jakob is replying… so I know ST isn’t a backup software :wink:

Changes are picked up by periodic scanning as well. But changes can’t happen on an untrusted device, and we don’t actively look for corruption. (How would we differentiate it from changes?) Encrypted folders are a special case, and corruption could theoretically be detected and handled. But it’s not something we do today in any other context and probably not something we will implement in phase one.

I understand this. So we get an OutOfSync message, which is enough to revert the change on the receiveOnly device.

Pure curiosity @calmh: what was the motivation to switch to ChaCha instead of AES for the non-deterministic part?

Performance and slight worry of collisions on the shorter random nonces in AES-GCM. I’ve gotten some feedback on the overall design from a certain very knowledgeable cryptogopher that I’m acting on, and I’m writing up some better specs on how the whole thing works for another round of review. Hopefully this thing will have some sort of theoretical seal of approval on launch day, so we only need to worry about actual implementation bugs and not me accidentally holding the algorithm upside down.
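For the curious, the nonce-size difference is visible right in the sealing code. A sketch (not Syncthing’s implementation) of random-nonce sealing with XChaCha20-Poly1305, whose 24-byte nonces make random collisions a non-issue, unlike AES-GCM’s standard 12 bytes:

```go
package seal

import (
	"crypto/rand"

	"golang.org/x/crypto/chacha20poly1305"
)

// sealRandom encrypts one message with a freshly drawn random nonce.
// With 24-byte nonces the birthday bound is far out of reach, so random
// generation per message is safe for any realistic message count.
func sealRandom(key, plaintext []byte) ([]byte, error) {
	aead, err := chacha20poly1305.NewX(key) // key must be 32 bytes
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, chacha20poly1305.NonceSizeX) // 24 bytes
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Prepend the nonce to the ciphertext so decryption can find it.
	return aead.Seal(nonce, nonce, plaintext, nil), nil
}
```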


There are some spec notes here: https://docs.syncthing.net/branch/untrusted/html/specs/untrusted.html


I know a little bit about cryptography and have taken a look at the spec. It does look decent, but I have some questions about the integrity between blocks and version rollbacks.

  • Is the order of the encrypted blocks authenticated? It shouldn’t be possible to reorder the encrypted blocks, even when the contents cannot be changed.

  • Is there anything that prevents an untrusted device from replacing a file with a previous version (or even deleting it) while claiming it is a new update? I think every file needs some kind of version number that increases and is authenticated, so the untrusted device cannot do a rollback. The trusted devices should also check that it does in fact increase.

  • How does syncing between untrusted devices work in case of a conflict? Trusted devices create a new file for the conflict, but untrusted devices cannot do that.

Thanks for looking and thinking!

The block order could be tampered with on the untrusted device to the extent that it can modify the fake file metadata and block order on disk. However, this wouldn’t matter to the trusted device, or rather it would be detected and rejected. A trusted device doesn’t look at the fake metadata, but decrypts and uses the attached encrypted metadata (or uses its own original copy of that metadata). When requesting a block that has been changed on disk on the untrusted side it would then get data which might decrypt properly but will fail the hash check after decryption, much like bad data on a normal trusted device.

Rolling back is prevented by the same metadata wrapping mechanism – it doesn’t matter how new or improved the untrusted device claims the file is, the original attached metadata is what’s used, and that metadata will contain the truth. The untrusted device will just look out of date.
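As a toy illustration of that (with hypothetical types, not Syncthing’s actual metadata handling): the trusted side believes only the version carried in the authenticated metadata, so a replayed older version is simply ignored:

```go
package rollback

import "errors"

// fileMeta stands in for the decrypted, authenticated metadata that
// travels with each file; Version is whatever monotonic counter the
// trusted side stamped into it.
type fileMeta struct {
	Name    string
	Version uint64
}

// acceptUpdate ignores whatever the untrusted device claims and trusts
// only the authenticated metadata: a replayed (older) version is never
// applied over a newer local one; it just looks out of date.
func acceptUpdate(local, incoming fileMeta) (fileMeta, error) {
	if incoming.Version <= local.Version {
		return local, errors.New("stale version ignored")
	}
	return incoming, nil
}
```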

This also affects the conflict handling. The untrusted device doesn’t see or understand the real history of a file given that it can’t decrypt the real metadata. It just passes on whatever is the most recent metadata from another trusted device, much like a dumb wire. Conflict resolution will then happen on some trusted device.