I’ve been pestering poor Jacob with some privacy issues regarding discosrv and the stats server some time ago:
There I explained in detail what can be done with different kinds of information and what can’t be done.
Apparently you (Eddy2909) don’t really understand the problems that distribution and encryption bring with them or how the current setup works. That’s why I’d like to tell you that encrypting or otherwise protecting IPs and IDs on the server just isn’t worth it. Those who can hack one with this information don’t have to get it from Syncthing.
The only thing that can be done with the ID-IP mapping is a targeted attack on the device/network at that IP.
But the only entities interested in such information (state actors like NSA&GCHQ and paid/motivated hackers) don’t need syncthing to get this information.
State actors simply subvert the network and detect one’s unencrypted HTTP sessions via identifiers like usernames, email addresses and so on. They then inject their malware, which the browser executes, and one gets owned. This also works for HTTPS as long as we use the broken X509 certificate system where each of the hundreds of certificate authorities (among them Chinese and Iranian entities as well as companies which can be subverted legally in the US) can issue certificates for any domain.
Hackers will instead use one’s e-mail address, Facebook, Jabber or other instant messaging IDs to deliver their malware.
From 0.12 on even passive sniffing from a favorable network perspective isn’t enough to get this data - the discovery server has to be actively hacked (because we don’t use shitty X509).
BTW: the problem of the very small namespace applies to phone numbers and IPv4 addresses, IPv6 ones are better.
And none of the solutions scale incredibly well. It would work better for Syncthing than TextSecure because our users have fewer contacts on average, but the work is still not worth it for the minimal benefit. Phone numbers are way more sensitive than IPs.
If we ever use a kind of DHT to protect against attacks on a centralized infrastructure, we should use an encrypted DHT, but that’s about it.
Okay here’s a pretty wild idea how to keep the device ID secret from the discovery server.
When a device wants to announce to the discovery server, it creates a PRNG seeded with its device ID. It then takes the nth output of the PRNG, where n is the number of days from a given start date (e.g. the Unix epoch). This way, every device gets a new “secondary address” every day that is only known to those who know the actual device address, and the discovery server cannot draw any correlations for more than one day (or other time period).
The main problem is that an attacker could just search the input space (the set of all device IDs) for one that gives the corresponding secondary address. But this should be solvable by doing something like value = slow_hash(previous).
This is completely out of the blue, so there might be other things I missed (please tell). Also, I haven’t done any of the math to know how feasible this would be with current hardware etc.
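A minimal sketch of the daily secondary address idea above, to make the discussion concrete. Everything named here is an assumption for illustration: `secondary_address`, the `SALT` constant, the use of PBKDF2 as the `slow_hash`, and the iteration count are not anything Syncthing implements.

```python
import hashlib
from datetime import date

SALT = b"secondary-address-sketch"  # hypothetical constant, not a Syncthing value
ITERATIONS = 1_000                  # far too low for real use; keeps the sketch fast


def slow_hash(previous: bytes) -> bytes:
    """Stand-in for the slow_hash in the post: PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", previous, SALT, ITERATIONS)


def secondary_address(device_id: str, today: date, start: date) -> str:
    """Return the secondary address for `today`, chaining value = slow_hash(previous).

    n is the number of days since `start`; the chain runs n + 1 times so that
    even day 0 never exposes the raw device ID.
    """
    n = (today - start).days
    if n < 0:
        raise ValueError("today is before the start date")
    value = device_id.encode("utf-8")
    for _ in range(n + 1):
        value = slow_hash(value)
    return value.hex()


if __name__ == "__main__":
    start = date(2015, 1, 1)  # assumed start date; the post suggests e.g. the Unix epoch
    print(secondary_address("EXAMPLE-DEVICE-ID", date(2015, 1, 11), start))
```

One thing the sketch makes visible: with a pure chain, anyone who sees the value for day n can compute day n+1 with one more `slow_hash` call, so a real design would probably need to mix the device ID (or some other secret) into every step.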
Can I ask you guys to take a step back and explain the problem you’re solving / the vulnerability you’re protecting against / what you’re afraid of? This is not clear to me, so I don’t understand the proposed solutions.
The problem I perceived is that the announce server necessarily knows which devices are connected (which device requests the IP address of another device). That would be especially problematic if we move to a DHT, where everyone hosting an announce server could gather that info.
Why is it a problem that the disco server knows that devices A and B communicate? Isn’t this sort of inherent to the system, much like a dns server operator (Google, OpenDNS, your ISP) knows what addresses you look up, the BitTorrent DHT knows what torrents you get and from whom, etc?
I’m being intentionally slightly obtuse here because I want the problem, whatever it is, to be very clearly defined.
Your suggestion above sounds technically sound to me (although I’m not a crypto expert of any kind), but I also don’t see how it affects the above problem in a meaningful way. You get a new device id each day, but within a given day the same mapping out of who talks to whom is possible, and it’s fairly trivial to map yesterday’s IDs to today’s by assuming most IPs haven’t changed.
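To illustrate the last point, here is a rough sketch of how a server could link yesterday's IDs to today's just by matching IPs. The snapshot format (a dict from announced ID to IP) and the function name `link_by_ip` are made up for the example; the addresses are documentation ranges.

```python
from typing import Dict, List


def link_by_ip(yesterday: Dict[str, str], today: Dict[str, str]) -> Dict[str, str]:
    """Map yesterday's ID to today's ID when both announced from the same IP.

    IPs shared by several IDs on the same day are skipped as ambiguous.
    """
    def unique_by_ip(snapshot: Dict[str, str]) -> Dict[str, str]:
        by_ip: Dict[str, List[str]] = {}
        for device, ip in snapshot.items():
            by_ip.setdefault(ip, []).append(device)
        return {ip: devs[0] for ip, devs in by_ip.items() if len(devs) == 1}

    old = unique_by_ip(yesterday)
    new = unique_by_ip(today)
    return {old[ip]: new[ip] for ip in old if ip in new}


if __name__ == "__main__":
    yesterday = {"id-a-day1": "203.0.113.5", "id-b-day1": "198.51.100.7"}
    today = {"id-a-day2": "203.0.113.5", "id-b-day2": "192.0.2.9"}
    # Only the device whose IP did not change can be linked across days.
    print(link_by_ip(yesterday, today))
```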
And of course, the tin foil hat people (and I use that term with love) can always set up their own discovery server, or not use global discovery if they prefer DNS or something else.
I don’t see any concern here at all. Syncthing should be able to work without any discosrv if you have static IPs and an open port; the only use is for those who run dynamic IPs. And if one is concerned about the security issue, one can run one’s own discosrv. So a DHT should be possible and available, but not necessary and not recommended unless Syncthing doesn’t have a public discosrv or it is under heavy load.
So in most cases there isn’t any concern at all. Why talk about centralized vs. decentralized when a custom discosrv is available and manually configurable?
And for the love of god, you can do dyndns for the IP on a tiny router, so technically a discosrv can also be skipped.
centralized disco server is down because of ddos, hack, soft- or hardware problems
a lot of syncthing devices all over the world would be blind for an unknown time (until you get it fixed)
centralized disco server could get hacked or infiltrated (by governments)
nobody should see my ip in disco | neither governments nor disco admin | admins surely will see my ip addresses connecting to disco but the connection between them is missing:
nobody should see my “network” and who is talking to whom. This is the same discussion we had in Germany with the “Vorratsdatenspeicherung” (the government wanted to collect ALL metadata) and the discussion with Facebook (knowing my private network and giving out these info)
what about performance in future?
what about costs in future?
what about the location of a centralized disco | “secure by law in a safe harbour”?
I wish syncthing to be as safe as possible, and it should be usable anonymously (as anonymously as it can get); otherwise we could use email addresses instead of IDs
The only “problem” I see is that a discovery server can see which device is looking for which other devices. This means that, given one device id, it can create a map of all the ips of all of the devices that that device talks to. This could potentially leak information that might be assumed to be private: given the device id of my home device, a discovery server could work out that another device in that network has an ip belonging to my employer, and so figure out who I work for.
I say “problem” because it’s likely not an issue in the vast majority of cases, and is easily worked around (by hosting a custom discovery server, or not using a discovery server at all…)
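As a rough illustration of the lookup map described above, assuming the server keeps a log of (requester ID, requested ID) pairs plus the latest announced IP per device. Both data structures and the name `peer_ips` are hypothetical and not how discosrv actually stores anything.

```python
from typing import Dict, Iterable, Set, Tuple


def peer_ips(
    lookups: Iterable[Tuple[str, str]],
    announcements: Dict[str, str],
    device_id: str,
) -> Set[str]:
    """Return the announced IPs of every device that `device_id` asked about."""
    return {
        announcements[target]
        for requester, target in lookups
        if requester == device_id and target in announcements
    }


if __name__ == "__main__":
    # Hypothetical log: HOME looked up WORK and PHONE; OTHER looked up X.
    lookups = [("HOME", "WORK"), ("HOME", "PHONE"), ("OTHER", "X")]
    announcements = {
        "WORK": "192.0.2.10",     # could resolve to an employer's address block
        "PHONE": "198.51.100.4",
        "X": "203.0.113.1",
    }
    print(sorted(peer_ips(lookups, announcements, "HOME")))
```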
I’m not particularly interested in this threat model. Assume that they have your internet connection, mobile and work network tapped since forever. Assume that they can look into TLS, and assume that they’ve installed a backdoor on your computer. It’s not worth defending against in Syncthing, in my opinion.
Performance and costs are something to worry about when we get there. The current (and upcoming) protocol is easy to load balance. Not sure what you mean by the safe harbour thing, sorry.
This one I can get behind. A prerequisite is that I know that a certain device ID belongs to “you”, but from there, yes. But I’m not sure what to do about it… You have the same issue with Google, OpenDNS, your ISP and so on as they will see DNS requests for vpn.youremployer.com and so on from your home address.
Also, anonymity was never one of the stronger goals of Syncthing, to be honest.
If device IDs can safely be made public (as stated earlier in this thread) then starting is easy. Otherwise you can start with an IP, e.g. the IP of my website, which you can get from my GitHub page. Admittedly not many people are in that position…
I can’t think of a way to avoid this though, other than giving everyone the entire IP:ID map (thus hiding who’s asking about who) but that’s arguably worse…
I think the message is to only use discovery servers you trust: a page full of random people’s discovery servers would therefore be a potentially bad idea.
How is this a thing for Syncthing? The principle behind Syncthing is that it’s absolutely impossible for a third party, discovery server operator or not, to have any insight into the contents you are syncing. (As opposed to, for example, BitTorrent.) If someone is chasing you for a copyright infraction, that’s because you posted some material somewhere you shouldn’t, or you already have uninvited guests on your devices… Access to the discovery server brings nothing new to the table in that case?
Yeah. But that is “safely” from the perspective that it gives you no access to my data. The device ID does act as a device locator, so the fact that you can use it to locate me (that is, get my IP) is expected and necessary. I think this is a property of peer-to-peer we can’t easily get around?
Well, with the relay system in place you can get by with only the relay and the discovery server knowing your IP. That might be an advantage for some.
I have a slightly different spin on this issue because I have 2 distinct use cases for Syncthing:
Used for business only. All data encrypted before insertion into shared folders. 7 Devices, some are mobile (no static IP). We use a private discosrv, and it’s unlikely we would share this server as proposed by the OP.
Used to share photos and media with family and friends, and their friends, etc. No one in the network is worried about privacy or data leakage. To ensure easy connectivity, everyone uses the public discovery server.
Although this arrangement works, I have 2 slight concerns with all of this:
(1) That the public discosrv represents a potential single point of failure that would destroy my Public Mode if it was not available. Would be nice to have more public servers out there, perhaps geo-located. What’s missing is a reliable method to cover the incremental operating costs.
(2) It’s somewhat inconvenient to switch modes from Public to Private. I have to go into the WebGUI, delete the discosrv IP field and repopulate it. And when I do this, I lose my sync connection to one of the 2 modes.
Maybe one day we could have the ability to use more than 1 discovery server simultaneously, segmented by device? My connection to Alice and Bob uses the public discosrv, while my connection to Carol uses a private server, all in real time.
You can already use N servers, and there is no issue of going to the public disco asking about a private device ID, as that has no meaning, and you cannot work anything out from that if the private device is only available on a private disco.
Well, operationally the current discovery load is low enough that it’s no big deal to just spin up one or two more out there. If that would make people happy, I can arrange for that. That provides redundancy in case of failures, though no (well, less, even) protection against the various info leakage concerns…