Exposed bits & security practices for users

Just started reading up on this great(!) project. I’m wondering about all the little details of how syncthing works online, and which of those myriad chunks of implementation are… naturally private by lack of any exposure, in the open but explicitly encrypted, and which would be acknowledged as not needing / having been designed to be protected.

I understand from the docs (“Internet wide discovery is performed with the assistance of a global server”) that there’s some sort of private, central node tracker. What is it? Is the tracker itself or my connection to it an attack vector? What info are my nodes sending it and hearing back from it?

I don’t trust my local network. How does discovery work, or more to the point—after a hit, what is subsequently sent? What can my local MitM hear?

Nodes themselves must all be administered as if completely private and completely trusted, yes?
• The web GUI is localhost-only, but open.
• If I want a “can only read” node (a “destination” node, if you will), I could set my “source” nodes to read-only, but then they will not sync. (Can I overlap different clusters’ repositories?)
• If I have other services running on a node computer, or I turn a machine that is used by others in meatspace into a node, what could they read, and how could those readable bits be abused locally or, later, taken elsewhere?

[some of these answers are obvious, but humor me.] Which of the following are wide open vs structurally inaccessible vs encrypted: file contents on the drive, file contents in transit, file listings and update chatter in transit, node addresses, local list of my other nodes, certs and sigs, private keys?

Sorry for the long (and naïve) post. Thanks.


This is a good question. Let’s see if I can answer it coherently.

Files being synced, on disk

The main use case for syncthing is keeping a bunch of files (documents, photos, whatever) in sync between a bunch of machines, with the files being potentially modified on any of them. The files are synchronized as-is, with permission bits, but not encrypted or otherwise protected on disk. Complex ACLs are not synchronized; this is tricky in the general case, especially across platforms.

There is a proposal for having encrypted files on disk, for nodes that are just storage and not “front end.” It would be a nice feature but requires some engineering…

I’d like to support non-source nodes as well, i.e. nodes that are read only in the “other direction” from what read only currently means. They could act as a backup destination but not be allowed to change any files.

Syncthing does take care not to damage your files. The synchronization mechanism works by calculating the SHA-256 hash of the file, in 128 KiB blocks. Every node in the cluster knows the set of block hashes that make up the latest version of the file. If the file on disk doesn’t match that, a copy is made using the blocks from the old version that are still valid (if any), missing blocks are requested from peer nodes and written to the copy, and finally all the blocks are hashed again. If everything matches the expected hashes the original file is replaced with the copy. If the hashes don’t match up the (bad) copy is removed and we try again in a short while. The most common reason for a synced file not matching is if the source file was changed during the sync operation, but any other kind of error or corruption will get caught as well. This means as a user, when you look at some file, it is either entirely an old version or entirely the latest version, never something in between.
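To make the block mechanism concrete, here is a minimal Python sketch of the idea, not syncthing’s actual code. The function names (block_hashes, blocks_to_fetch, verify) are made up for illustration; only the 128 KiB block size and the use of SHA-256 come from the description above.

```python
import hashlib
from typing import List

BLOCK_SIZE = 128 * 1024  # 128 KiB, as described above


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return the SHA-256 hex digest of each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_fetch(local: bytes, expected: List[str],
                    block_size: int = BLOCK_SIZE) -> List[int]:
    """Indexes of blocks that are missing locally or do not match the expected hash."""
    have = block_hashes(local, block_size)
    return [
        i for i, want in enumerate(expected)
        if i >= len(have) or have[i] != want
    ]


def verify(candidate: bytes, expected: List[str],
           block_size: int = BLOCK_SIZE) -> bool:
    """True only if every block of the assembled copy matches the expected hashes."""
    return block_hashes(candidate, block_size) == expected
```

In this sketch, a node would reuse every block not listed by blocks_to_fetch, request the rest from peers, and only replace the original file once verify returns True for the assembled copy.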

Configuration directory

The configuration lives in ~/.syncthing (or the appropriate equivalent on Windows). It contains the config.xml which defines the cluster members etc and which isn’t very sensitive other than exposing which directories you synchronize. It also contains the cert.pem (public key) and key.pem (private key) files. The private key file is sensitive and should be protected - it is what identifies the node as a certain node ID. Syncthing makes sure to set the relevant permission bits to keep it secure, but if your machine is compromised the two pem files would let an attacker impersonate your node ID and join a cluster on your behalf. Likewise, if an attacker can write to the config they can make your computer connect to and synchronize with a node controlled by the attacker.
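If you want to double check the permission bits yourself on a POSIX system, something like the following Python sketch would do. The helper name is_private is hypothetical, and the check (no access at all for group or others) is my assumption of what “relevant permission bits” should mean, not a statement of what syncthing enforces.

```python
import os
import stat


def is_private(path: str) -> bool:
    """True if the file grants no permissions to group or others (POSIX only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

For example, is_private(os.path.expanduser("~/.syncthing/key.pem")) should come back True on a healthy setup.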

Node IDs

The node IDs are longish strings of letters and numbers (like ME6QVQK2B4BFYWIANFJCSN76Q2GMH3NZISD6LAYME6CSDSCPE47) that uniquely identify a node. They are not sensitive by themselves, as in knowing that one of my nodes has the ID above won’t help you impersonate that node. Technically, they are the SHA-256 hash of the node’s certificate. Identifying a node like this is slightly more secure than a regular HTTPS certificate, which works by letting some authority sign the hash of the site’s public key, usually using a less secure (older) hash function, thus certifying that the metadata in the public key (site name etc) is correct. Syncthing simply bypasses the step of caring about the metadata and locks down exactly which public key it expects to see for a given node.

When two nodes connect, they send their public keys as part of the TLS handshake. Each calculates the hash of the other’s public key and compares it to the expected node ID. Thus a node doesn’t “claim” to be a given node ID; it simply is the node with that ID, based on the keys on disk.
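As a rough Python illustration of the comparison described above (again a sketch, not the real implementation): the function names are hypothetical, and both the base32 text encoding and hashing the raw certificate bytes are assumptions on my part. The only thing taken from the text is that the ID is the SHA-256 hash of the certificate.

```python
import base64
import hashlib
import hmac


def node_id_from_cert(cert_bytes: bytes) -> str:
    """Derive an ID string from raw certificate bytes: base32(SHA-256(cert)).

    The base32 encoding (padding stripped) is an assumption for illustration.
    """
    digest = hashlib.sha256(cert_bytes).digest()
    return base64.b32encode(digest).decode("ascii").rstrip("=")


def peer_is_expected(cert_bytes: bytes, expected_id: str) -> bool:
    """Compare the ID derived from the presented certificate to the configured one."""
    return hmac.compare_digest(node_id_from_cert(cert_bytes), expected_id)
```

The point of the design is visible here: nothing the peer says about itself is trusted. The ID is recomputed from the certificate it actually presented in the handshake, so a different key pair always yields a different ID.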

This may all sound somewhat complicated but it’s actually kind of the most basic way to use TLS to secure a connection and identify the counterpart. I’ve taken care to not innovate in this area (crypto is hard to get right) and keep to standards.

Peer connections

After the initial handshake as above, we have an encrypted and authenticated channel between two nodes. The protocol spec defines which cipher suites are acceptable, basically restricting implementations to the latest and greatest. In practice this means we are assured of strong encryption and perfect forward secrecy (i.e. if someone records the traffic they can’t decrypt it later, even if they get access to the keys).

Both nodes start by sending a list of repositories they want to exchange with the peer node and the list of nodes that are configured for each such repository. The setup must match or the connection is closed. This can be somewhat annoying when setting things up, but ensuring identical cluster configuration will prevent issues in the future when there are read only modes that need to be enforced, etc.
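The matching rule can be pictured roughly like this in Python. This is only a sketch: the data layout (repository name mapped to a set of node IDs), the function name mismatched_repos, and the choice of strict equality are all assumptions for illustration, and the real protocol messages look different.

```python
from typing import Dict, List, Set

# Hypothetical layout: repository name -> set of node IDs configured for it.
ClusterConfig = Dict[str, Set[str]]


def mismatched_repos(ours: ClusterConfig, theirs: ClusterConfig) -> List[str]:
    """Repositories whose node lists differ between the two sides, or that
    only one side announces. An empty list means the setups match."""
    return sorted(
        repo for repo in ours.keys() | theirs.keys()
        if ours.get(repo) != theirs.get(repo)
    )
```

Under these assumptions, a connection would be kept only when mismatched_repos returns an empty list; anything else tells you which repository to go and fix.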

Over this connection both nodes send “indexes” (lists of files and directories, their metadata such as modification time and permission bits, and lists of block hashes for the files) and request blocks from one another as needed. All of this travels inside the encrypted, authenticated channel described above.

Admin interface

The admin HTTP interface (GUI) defaults to localhost only, and no password. Security-wise, this matches the expected situation of a single-user machine. If needed you can set a password (it then uses HTTP Basic Authentication; HTTPS is planned) or disable the GUI entirely.

Discovery

The protocol used for local and global discovery is pretty simple. Unless disabled, each node periodically sends a packet to the local network that contains a magic number (identifying it as a syncthing discovery message), the source node ID and the port number on which it listens. If global discovery is enabled, the same is sent to a well known address on the internet (announce.syncthing.net, although this can be changed in the config if you want global discovery but want to run the service yourself).

When a node needs to connect to another and has no statically configured address, a table that contains nodes we’ve seen discovery packets for is consulted. If we have an entry, we try to connect to that address. If we don’t, we send a packet to the global announce server asking for the address of the destination node ID. If we get a reply, we try connecting to that address.
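The lookup order in the paragraph above, written out as a small Python sketch. The names (resolve, static, seen, query_global) are hypothetical, and query_global stands in for “send a packet to the global announce server and wait for a reply”; the actual wire format is not shown here.

```python
from typing import Callable, Dict, Optional


def resolve(
    node_id: str,
    static: Dict[str, str],
    seen: Dict[str, str],
    query_global: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return an address to try for node_id, or None if nothing is known.

    Order: statically configured address, then the table built from received
    discovery packets, then a query to the global announce server.
    """
    if node_id in static:
        return static[node_id]
    if node_id in seen:
        return seen[node_id]
    return query_global(node_id)
```

Whatever address comes back is only a hint about where to connect. The node on the other end still has to present the right certificate before anything is exchanged.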

There is a vulnerability here - the discovery packets (local and global) aren’t authenticated so the node ID in them is simply taken at face value. You could forge such packets to potentially get a connection attempt to your computer instead of the genuine node. The connection will be immediately dropped since the TLS keys won’t match the expected node ID. You could also forge packets with an incorrect address so that connection attempts go into a black hole instead of the correct node. In either case the only damage is a node that doesn’t connect to the cluster properly, and it can be fixed by using a statically configured address (or taking a cluebat to the attacker on your local network).

You can also do passive surveillance to see which computers on the local network run syncthing and what their node IDs are. As I see it, this isn’t sensitive information. If you think it is, you can disable local discovery.

The global announce server knows a bunch of mappings between node ID and IP address. It also gets a rough idea of the number of syncthing users based on how many nodes announce themselves to it. Again, this isn’t really sensitive information; an outside attacker can’t request a list of nodes anyway (only ask for a specific node ID). Still, if you prefer to not leak this information you can always disable global announce or set up your own global announce server (the code is in the repository).


It’s good to see that care was taken in laying the foundation of how syncthing works, and also that no crypto things were unnecessarily reengineered. I think syncthing meets the needs of that main use case very well. File encryption is a side-topic, really, and for now could obviously still be done before letting syncthing get at the data. I’m also happy to read about how changes to sub-file blocks are handled, because while I’d think most synced files will be small-ish in most cases, in my perhaps-oddball situation even the huge files I’m syncing generally have changes made in-place without re-writing or shifting bits back and forth surrounding changes.

p.s. WOW—quite a response. Thanks for that. I am left wondering if your prose writing on technical subjects always manages to read like documentation! Wonderful clarity and intelligibility. I know that I [and perhaps “we”] appreciate it.
