Are syncthing node IDs vulnerable to a length extension attack?

You might want to consider whether the fingerprint is susceptible to a length-extension attack. http://en.wikipedia.org/wiki/Length_extension_attack

A fairly simple solution is to use HMAC-SHA-256 rather than just the hash. http://en.wikipedia.org/wiki/Hash-based_message_authentication_code
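
To make the suggestion concrete, here is a minimal sketch contrasting a plain SHA-256 fingerprint with an HMAC-SHA-256 one, using only the Python standard library. The function names and the `key` parameter are assumptions for illustration. Note that HMAC needs a secret key known to both sides, which a public fingerprint normally does not have.

```python
import hashlib
import hmac


def plain_fingerprint(data: bytes) -> str:
    """Unkeyed SHA-256 digest of the data, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def keyed_fingerprint(key: bytes, data: bytes) -> str:
    """HMAC-SHA-256 of the data under a secret key (hypothetical key)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()
```

Anyone can recompute the plain fingerprint, while the keyed one can only be recomputed by someone holding `key`.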

I doubt it. Or if so, only to the same extent that any X.509 certificate is.

What do you mean by “doubt it”? This is not clear from your answer to a suggestion of possible vulnerability considerations.

Can you give a hint why the referenced two articles do not make sense in this context, or why using suggested solution would solve a non-existent problem?

I mean that I doubt a length extension attack applies to node IDs. It seems to be related to using hashes of secret+cleartext as an authentication method, and that is not what happens here. Comparing to how “normal” SSL certificates work:

  • A public key is generated and has some metadata like site name
  • A CA calculates a hash of the public key (usually using SHA-1) and signs it using RSA. This is the certificate.

Syncthing does the same, but uses a stronger hash algorithm and uses the hash directly for verification instead of using RSA encryption on top. Hence the “only to the same extent as any other X.509 certificate”.
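
As a rough illustration of "using the hash directly for verification", the sketch below derives an ID from certificate bytes and checks it against a set of configured IDs. It assumes the ID is the SHA-256 of the raw certificate bytes, base32-encoded; the function names and the exact encoding are assumptions, not taken from the syncthing source.

```python
import base64
import hashlib
import hmac


def node_id_from_cert(cert_der: bytes) -> str:
    """Derive an ID from certificate bytes (assumed: SHA-256, then base32)."""
    digest = hashlib.sha256(cert_der).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=")


def peer_is_trusted(presented_cert_der: bytes, configured_ids: set[str]) -> bool:
    """Accept the peer only if the hash of its certificate is a configured ID."""
    presented = node_id_from_cert(presented_cert_der)
    return any(hmac.compare_digest(presented, known) for known in configured_ids)
```

There is no secret prefix in the hashed input, which is why the secret+cleartext setting of a length extension attack does not arise here.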

Ah, thank you very much, now I got the idea. Yes, convinced about your claim.

cheers - Chris

Direct use of the hash is precisely the concern. The reason the MD5 sum is passed through RSA is to verify the authenticated user is the one signing it. Because producing this signature requires the private key (but verifying it only requires the public key), we can be reasonably confident that the signer is who he claims to be. If we only require that the value be mapped to a known public hash, then it is possible that vulnerabilities in the hash function itself could be exploited to create a collision (e.g., a length extension attack in MD5 or SHA1). Because of this, the hash alone cannot be trusted to verify identity.

This will be a concern when there is any doubt at all about SHA-256 being less than optimally collision resistant, yes?

Also, if I can create a hash collision I can reuse the RSA signature from the CA, so the risk is the same, again?

Yes, those particular concerns are about the vulnerability of SHA-256 itself. However, public key authentication would grant the protocol additional avenues of verification. For example, authentication could work by requiring the trusted node to store public keys of other nodes and to issue counter values or timestamps at authentication time to the nodes trying to authenticate. The node in question could then sign this unique message, verifying that it has the corresponding private key. This would disallow reuse of old authentication values as the node doing the verification would never reissue the same message. This is an important protection in keeping out untrusted nodes, but is something that the current hashing mechanism cannot handle (as far as I understand).
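
The challenge-response idea described above could be sketched roughly as follows. The `Challenger` class and the `verify` callback are hypothetical. A real implementation would plug in an actual public-key signature check, which the Python standard library does not provide, so it is left as a parameter here.

```python
import secrets
from typing import Callable


class Challenger:
    """Issues one-time challenges and rejects any reuse (hypothetical sketch)."""

    def __init__(self, verify: Callable[[bytes, bytes, bytes], bool]):
        # verify(public_key, message, signature) is an assumed callback that
        # performs the real signature check.
        self._verify = verify
        self._outstanding: set[bytes] = set()

    def issue(self) -> bytes:
        """Create a fresh random challenge for a node that wants to authenticate."""
        nonce = secrets.token_bytes(32)
        self._outstanding.add(nonce)
        return nonce

    def check(self, public_key: bytes, nonce: bytes, signature: bytes) -> bool:
        """Accept only a valid signature over a challenge that is still unused."""
        if nonce not in self._outstanding:
            return False  # never issued, or already consumed: replay rejected
        self._outstanding.discard(nonce)
        return self._verify(public_key, nonce, signature)
```

Because each challenge is consumed on first use, a recorded response cannot be replayed later.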

Sorry, I don’t understand. We already do a key exchange authenticated by the public/private key pair as part of the TLS negotiation. What kind of attack do you envision your scheme protecting against, exactly?

Note that for all practical purposes we are storing the remote node’s public key, in the form of the node ID.

Ah, yes, I just realized that. In this case the attack would consist of forging the public key by creating a node ID collision. Why not store the effective public key on the node granting access? This would both prevent such an attack and eliminate the (albeit highly unlikely) chance of a public key hash collision.

But if you can do that (create a public key with a given hash on demand) you can forge any TLS certificate. The world’s economic infrastructure collapses, privacy is a thing of the past and syncthing will be the least of our worries. It’s simply not an attack worth trying to protect against.

This is not to say that syncthing is unattackable, just that it’s much, much more likely that there’s a bug in the implementation that would allow an attacker to bypass the node ID check entirely, or modify the config to allow a rogue node ID, than that there’s a problem in the theoretical foundation.

It’s probably infeasible in practice, but this approach is certainly not equivalent to breaking TLS itself. The attack might effectively be a collision attack but might not involve a pubkey -> privkey inversion at any stage. Rather, if there’s some sort of bias in the mapping from privkey -> hash(pubkey), then the attack can be leveraged directly against the effective private key “preimage.”

Of course, with a fixed key size this is not susceptible to a length-extension attack. It’s just important to realize that by accepting the hash itself as an ID, rather than the public key directly, we’re giving up some of the security guarantees provided by TLS itself and instead relying on a hopefully-robust hash. If this risk is balanced out by the convenience of having to enter a shorter key, then perhaps it’s worth it.

Again, a TLS certificate is the hash of a public key plus some metadata, no? In a certificate you even get to pick the metadata yourself before computing the hash…

Assuming, for the sake of the argument if nothing else, that I’m wrong and I’ve missed something about how this all ties together…

How would you feel about using a short node ID (say 64 bits, like a long GPG key ID) and saving the entire peer certificate on initial connect? This ought to be a gain in user friendliness (shorter id) and long term security (no chance of spoofing the public key) at the price of a one time chance to take over a connection? The latter seems to be a similar risk taken by GPG and the ssh “this is a new host, here’s the fingerprint” question.

Edit: the sufficiently paranoid could of course arrange to copy the certificate over a secure channel prior to the first connect.
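
A minimal sketch of that trust-on-first-use flow, assuming a 64-bit short ID taken from the first 8 bytes of the certificate's SHA-256 hash. The `PinStore` class, its return values, and the way the short ID is derived are all assumptions for illustration.

```python
import hashlib
import hmac


class PinStore:
    """Remembers the full certificate seen on first connect (hypothetical)."""

    def __init__(self):
        self._pinned: dict[str, bytes] = {}

    @staticmethod
    def short_id(cert_der: bytes) -> str:
        """64-bit ID: first 8 bytes of SHA-256, as 16 hex characters (assumed)."""
        return hashlib.sha256(cert_der).digest()[:8].hex()

    def check(self, expected_short_id: str, cert_der: bytes) -> str:
        """Return 'pinned' on first contact, 'ok' on a match, else 'mismatch'."""
        pinned = self._pinned.get(expected_short_id)
        if pinned is not None:
            # After the first connect only the exact stored certificate is accepted.
            return "ok" if hmac.compare_digest(pinned, cert_der) else "mismatch"
        if self.short_id(cert_der) != expected_short_id:
            return "mismatch"
        self._pinned[expected_short_id] = cert_der
        return "pinned"
```

The short ID only matters for the first connection. Afterwards a colliding certificate is rejected because the full certificate bytes are compared.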

If I may, I lack as robust a background in crypto as you gentlemen, but I would be much more comfortable with that GPG/SSH-keys model.

For that matter, I’d love to be able to customize the key pair generation, either within syncthing or using a tool like ssh-keys and importing the keys. It would be nice for users to be able to tailor things like the bitlength to their threat perception. People have used and trusted rsync over SSH for years. syncthing offers many convenient features above and beyond that, but using that time-honored authentication model seems like a sound idea to me.

Yeah, that’s as good as any reason to prefer a given model.

You can tailor the keys already today, just not using syncthing. The cert.pem and key.pem are “regular” TLS certificate and key files, just like you would use for an HTTP server (self signed, although syncthing doesn’t care about the actual signature). Today it’s a 3072 bit RSA key, but you can generate whatever you prefer using for example openssl and just drop in the files.

I wrote up a documentation article explaining node IDs as they are today.

http://forum.syncthing.net/t/how-node-ids-work/365

I’d prefer to keep this discussion where it is, but that forms background so we at least for sure all talk about the same thing.

Yes, presenting the full certificate upon first connection would make me feel much more comfortable as long as there were a way to deny a bad certificate at the same time.

How about this to prevent collisions and more closely mimic the x509 authentication scheme: assign each node a unique (string) ID (maybe a user-defined string or a generated UUID); set this unique ID as the CN in the generated certificate; then, add the (SHA-256) hash of the entire certificate to a mapping from UUID to certificate hash. This way, a collision would require both the UUID and hash to match rather than just an overall hash. Because we can guarantee that these new node IDs are unique (e.g., the user can manually override node ID if there exists a matching one in his cluster), we don’t have an issue of by-chance collisions.
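
The proposed mapping could look roughly like the sketch below. It assumes the certificate's CN has already been extracted by the TLS layer and is passed in as `common_name`. The function names and the shape of the `known` mapping are hypothetical.

```python
import hashlib
import hmac
import uuid


def new_node_name() -> str:
    """Generate a unique node ID to use as the certificate CN (random UUID)."""
    return str(uuid.uuid4())


def cert_hash(cert_der: bytes) -> str:
    """SHA-256 of the entire certificate, as a hex string."""
    return hashlib.sha256(cert_der).hexdigest()


def authenticate(known: dict[str, str], common_name: str, cert_der: bytes) -> bool:
    """Accept only if the CN is known and the certificate hash matches its entry."""
    expected = known.get(common_name)
    if expected is None:
        return False
    return hmac.compare_digest(expected, cert_hash(cert_der))
```

A certificate whose hash matches some other node's entry is still rejected unless it also carries that node's CN.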

I’ve already started looking into implementing this, but it looks like it will require quite a few changes (e.g., configuration file format and fields, node ID definition, and everywhere these are used). I want to hear what other people think about this before diving in.

Note that we could merge the node ID with the node name (as seen in the UI when adding a new node), but this would give the user less flexibility: node names would have to be unique at the time of certificate generation and would need to match across all machines. For this reason, I think it would be preferable to have a separate node ID for each node, but to allow this node ID to be set manually (e.g., such that it matches a human-readable node name).