Discovery resiliency / DHT

Nutomic · February 29, 2016, 10:12pm

Do you have something in mind for this already? I’d still like to see a proper DHT, but I understand that there might be a problem of trust and privacy. Instead, how about having each project member host a discovery/relaypool server, and decentralize it that way?

Eddy2909 · March 1, 2016, 9:23am

I think we could try to get a hosting sponsor for syncthing… I would try to acquire https://www.profitbricks.de/

They provide highly scalable and dynamic infrastructure.

Maybe they could provide us with an official relay and the central disco server.

Some months ago I saw a p2p-cdn provided video, which was streamed from the server (20%) and from other peers (80%) … could this be the base for a distributed infrastructure?

Links:

https://www.peer5.com/

https://peercdn.com/

canton7 · March 1, 2016, 9:52am

The more relays the better, but we need to think carefully about the discovery servers. A malicious or mis-managed discovery server could do a lot of damage.

No. What Syncthing does is very different to a CDN.

A CDN hosts static content which changes extremely rarely (if at all), which is available to everyone. Syncthing has no static content.

Eddy2909 · March 1, 2016, 10:09am

I thought of this p2p-cdn technique for discos, not for st. What if the disco info were hosted and provided by the peers themselves (even if it changes)?

canton7 · March 1, 2016, 10:12am

This is heading towards the DHT approach being discussed, and mentioned by @Nutomic above.

This is different to a CDN in that the content changes very regularly (as devices come online and go offline) - we won’t be able to use an existing CDN infrastructure, and indeed we shouldn’t as we’re not delivering static content.

rumpelsepp · March 1, 2016, 10:14am

How does the tox messenger solve this problem? Does anyone know?

Nutomic · March 1, 2016, 4:22pm

The protocol is here, it’s based on the BitTorrent DHT protocol. From a quick glance, it duplicates DHT entries for better reliability, but doesn’t do anything for privacy (so an attacker could request the IP addresses of all nodes on the network).
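To make "duplicates DHT entries" concrete, here is a minimal sketch of Kademlia-style replication, the scheme the BitTorrent DHT builds on: a record is stored on the `k` nodes whose IDs are closest to the key by XOR distance. The function names, the in-memory `tables` dict and the choice of `k=3` are illustrative assumptions, not taken from the Tox or BitTorrent code.

```python
import hashlib


def make_id(name: str) -> int:
    """Hash a name into a 160-bit integer ID (hypothetical helper)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def closest_nodes(key: int, node_ids, k: int = 3):
    """Return the k node IDs with the smallest XOR distance to key."""
    return sorted(node_ids, key=lambda node: node ^ key)[:k]


def store(tables: dict, key: int, value: bytes, k: int = 3) -> None:
    """Write the record to each of the k closest nodes.

    tables maps node ID -> that node's local key/value store.
    """
    for node in closest_nodes(key, tables, k):
        tables[node][key] = value


def find(tables: dict, key: int, k: int = 3):
    """Ask the k closest nodes that are still online; return the first hit."""
    for node in closest_nodes(key, tables, k):
        if key in tables[node]:
            return tables[node][key]
    return None
```

With three copies, a lookup still succeeds after one of the storing nodes goes offline. Note that nothing here hides the stored value, which is the privacy gap mentioned above.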

canton7 · March 1, 2016, 4:38pm

I wonder if we could do something funky whereby the IP address associated with a device ID is encrypted using secret sauce, which is somehow shared between devices using a side channel.

For example, Alice wants to connect to Bob. Alice gives Bob her Device ID and secret sauce. Bob can use the Device ID to query the DHT, and the secret sauce to decrypt the IP address he gets back from the DHT. He uses this to connect to Alice. Bob’s Device automatically gives Alice’s Device its secret sauce on first connection.

A problem with this is that the secret sauce, once shared, cannot be unshared. Sharing devices also becomes harder, and migration from the existing system could be a PITA.

An alternative might be: Device A connects to Devices B and C. Device A encrypts its IP address with Device B’s Device ID as the key, and broadcasts this to the DHT. It does the same with Device C’s Device ID. When Device B wants to connect, it retrieves all records for Device A’s Device ID from the DHT, and decrypts each using its own Device ID as the key until it successfully manages to decrypt one.

However if an attacker can figure out which Device IDs connect to which other Device IDs, the security falls apart.

You then get into the realms of Device A having different secret sauces for devices B and C: effectively a token which represents the relationship between Devices A and B.
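A rough sketch of the "alternative" above, where Device A publishes one record per peer, encrypted with that peer's Device ID, and a peer tries each record until one decrypts. The cipher is a toy built from the standard library (a SHAKE-256 keystream plus an HMAC tag so that a failed decryption can be detected); it is an assumption for illustration only and not something to deploy. All function names and the dict standing in for the DHT are hypothetical.

```python
import hashlib
import hmac
import os


def lookup_key(device_id: str) -> str:
    """DHT key under which a device's records are stored."""
    return hashlib.sha256(b"lookup:" + device_id.encode()).hexdigest()


def cipher_key(device_id: str) -> bytes:
    """Encryption key derived from the recipient's Device ID."""
    return hashlib.sha256(b"cipher:" + device_id.encode()).digest()


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Toy authenticated encryption: nonce + ciphertext + HMAC tag."""
    nonce = os.urandom(16)
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    ct = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unseal(key: bytes, blob: bytes):
    """Return the plaintext, or None if the record was not sealed with key."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    stream = hashlib.shake_256(key + nonce).digest(len(ct))
    return bytes(a ^ b for a, b in zip(ct, stream))


def announce(dht: dict, own_id: str, peer_ids, address: str) -> None:
    """Publish one sealed copy of our address per known peer."""
    dht[lookup_key(own_id)] = [
        seal(cipher_key(peer), address.encode()) for peer in peer_ids
    ]


def resolve(dht: dict, target_id: str, own_id: str):
    """Fetch all records for target_id and try each with our own ID."""
    for blob in dht.get(lookup_key(target_id), []):
        plain = unseal(cipher_key(own_id), blob)
        if plain is not None:
            return plain.decode()
    return None
```

Because the key is derived only from the recipient's Device ID, anyone who learns that Device B talks to Device A can run `resolve` as if they were B. That is the weakness noted above, and the reason for moving to a per-relationship token.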

devanubis · March 2, 2016, 11:11pm

Am I right in thinking that local discovery is announce-only between nodes which already know one another, and that nodes can’t currently query their peers to find out if they already know a new node?

It wouldn’t solve initial introduction between two disparate nodes with no peers in common, but existing peers are more trustworthy than third parties in either a DHT or a network of discovery servers.

AudriusButkevicius · March 2, 2016, 11:48pm

local discovery is multicast within the local network, peers currently do not share other peers they are aware of, and probably that wouldn’t help anyway, as you have a bootstrapping problem.

devanubis · March 3, 2016, 1:02am

Yes you still have the issue of bootstrapping into a network of peers, but that’s what the discovery network is for. I’m just saying that it would be preferable to ask any existing peers whether they already know the new device first before asking the discovery network (be that the current global servers, a DHT or whatever).

The current way of bootstrapping an initial connection is very convenient: only the initiating device has to know the other device ID in order to add them, request their IP from the discovery network and then send them a request.

As far as I understand it, nodes only use global discovery when:

Two nodes want to connect for the first time (bootstrapping)
Two nodes both change IP at around the same time and can no longer announce to each other (reconnection)

The problem with a DHT or similar list of device IDs and IPs is that anyone can look up a device’s IP, but also that a rogue discovery server or DHT node could listen in on every request for device IDs, logging who’s adding who.

If a node could ask its more trusted peers if they already know a new node before going out to the public discovery network then that’s a small privacy win. If we have no peers, or the node is unknown to them, then falling back on the discovery network is no worse than now.
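The "ask trusted peers first, then fall back" order can be sketched like this. It is purely hypothetical: peer exchange is not something Syncthing implements according to this thread, and `find_address` along with the callables passed to it are made-up names.

```python
from typing import Callable, Iterable, Optional, Tuple

# A lookup takes a device ID and returns "ip:port", or None if unknown.
Lookup = Callable[[str], Optional[str]]


def find_address(
    device_id: str,
    trusted_peers: Iterable[Lookup],
    discovery: Lookup,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (address, source), where source is "peer", "discovery" or None."""
    # First ask peers we already trust; the public network never sees
    # the query if one of them knows the device.
    for ask_peer in trusted_peers:
        address = ask_peer(device_id)
        if address is not None:
            return address, "peer"

    # No peers, or none of them know the device: same behaviour as today.
    address = discovery(device_id)
    if address is not None:
        return address, "discovery"
    return None, None
```

For example, with plain dicts standing in for a peer and the discovery server, `find_address("BOB", [peer_table.get], discovery_table.get)` only touches `discovery_table` when `peer_table` has no entry for `"BOB"`.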

I’ve also been thinking about canton7’s suggestions.

This still allows anyone to look up all entries for a device. Even if they’re encrypted, it reveals that the device exists, and has probably interacted with N other devices. And without a shared “secret sauce” between Alice and Bob, a snooper who knew both their device IDs would be able to tell whether they had connected.

To avoid revealing the number of entries a device has (and even the device’s existence) each entry in the discovery network could be a unique pair between two devices, optionally salted with a single-use pass phrase (“secret sauce”) to avoid revealing whether Alice and Bob have ever connected.

If Alice and Bob want to connect:

Alice and Bob both exchange device IDs (and optionally a shared pass phrase)
Alice generates a secret by joining (in order):
1. Her device ID
2. Bob’s device ID
3. (Optionally) the shared pass phrase if they want extra privacy
Alice encrypts her IP address, using her secret
Alice posts her encrypted IP to the discovery network, using a hash of her secret as the ID
Bob does the same procedure as Alice
Note that Bob’s secret will be different to Alice’s secret as the order of their device IDs is different, and so the entries have different hashes. Both Alice and Bob know how to make each other’s secrets.
Alice and Bob then generate the other’s secret, hash it, and query the discovery network for that entry. They can then decrypt the IPs and connect.
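The steps above can be sketched as follows. This is only an illustration under several assumptions: the IDs and pass phrase are joined with a newline separator, the entry ID and the encryption key are derived from the secret with different prefixes (so whoever stores the entry cannot decrypt it), and the "encryption" is a toy SHAKE-256 keystream from the standard library rather than a vetted cipher. The `network` dict stands in for the discovery network and all names are hypothetical.

```python
import hashlib
import os


def pair_secret(own_id: str, other_id: str, passphrase: str = "") -> bytes:
    """Join own ID, other ID and optional pass phrase, in that order."""
    return "\n".join((own_id, other_id, passphrase)).encode()


def entry_id(secret: bytes) -> str:
    """Hash of the secret, used as the entry's ID on the network."""
    return hashlib.sha256(b"id:" + secret).hexdigest()


def xor_stream(secret: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher; applying it twice restores the input."""
    stream = hashlib.shake_256(b"key:" + secret + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def publish(network: dict, own_id: str, other_id: str,
            address: str, passphrase: str = "") -> None:
    """Post our encrypted address under the hash of our pair secret."""
    secret = pair_secret(own_id, other_id, passphrase)
    nonce = os.urandom(16)
    network[entry_id(secret)] = nonce + xor_stream(secret, nonce, address.encode())


def lookup(network: dict, own_id: str, other_id: str, passphrase: str = ""):
    """Rebuild the OTHER side's secret (their ID first) and fetch their entry."""
    secret = pair_secret(other_id, own_id, passphrase)
    blob = network.get(entry_id(secret))
    if blob is None:
        return None
    nonce, ciphertext = blob[:16], blob[16:]
    return xor_stream(secret, nonce, ciphertext).decode()
```

Alice calls `publish(network, "ALICE", "BOB", her_address, phrase)` and Bob calls `lookup(network, "BOB", "ALICE", phrase)`. Each pair gets its own entry ID, so the network cannot count how many entries belong to one device.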

Without the pass phrase, this still allows a snooper who knows both Alice and Bob’s device IDs to find out if they have connected, but at least snoopers can’t find out whether a device exists or how many other devices it has tried to connect to.

I suppose that the pass phrase could be remembered by both nodes too, so that if they ever both change IP and can’t announce to one another, and have no peers in common (or all peers change IP and all lose contact) then they can fall back on publishing their new IPs to the discovery network again, each only identifiable by the other.

calmh · March 3, 2016, 1:53am

This is simple enough to implement and makes some sense. On one hand the value is a bit limited as, in contrast to for example BitTorrent, you’ll only ever be connected to a handful of devices and they are in turn connected to probably the same devices… But on the other hand, those are probably the only devices you are interested in.

The initial protocol handshake already contains the list of cluster devices and their addresses, if configured statically. We could add a message to notify about connection events, letting other devices know who we are connected to and how. This could be used to improve knowledge about the cluster connectedness in general, for the block pulling mechanism to use and for the user to see in the GUI.

AudriusButkevicius · March 3, 2016, 8:44am

That’s not true. We don’t save any IPs, which means we use discovery every time we connect, not just the first time we connect.

The bootstrapping problem I mentioned is to do with finding the IP of the first peer, which has to happen via discovery. But the same problem applies to every peer, as every peer will have to use discovery at least once (or probably more often, due to some peers being offline).

devanubis · March 3, 2016, 9:32am

Oops, I think I’ve misunderstood both local discovery and re-connecting between existing peers.

I thought that in addition to the periodic multicast announcement, each node would occasionally send a handshake to each of its peers at their last known address to check if they’re still there. If that’s not the case, and nodes only know one another’s IPs whilst actively connected, then I can see why you feel peer-exchange would be less useful.

devanubis · March 3, 2016, 9:40am

Sorry for the double post but the Discourse responsive/mobile setup wasn’t letting me add a second quote…

I’m not sure if announcing all known peers is the best option either. I might have one shared Folder with Jack and one with Jill but I don’t want Jack to know about Jill unless Jack asks me for Jill’s device ID (indicating that Jack at least already knows Jill exists). I’m not wildly concerned about it, but I’d bet that the Ind.ie people would be (it would effectively leak known contacts to all other contacts)

AudriusButkevicius · March 3, 2016, 9:48am

Multicast is only meaningful within the local network. If you are within the local network, you don’t need global discovery, nor do you need peer exchange (as everyone is on the same LAN, so saying “hey bob is over here” is enough). At the point you need to leave the local network and connect to someone externally, this stops working to some extent. You need to either hardcode the IP of some external peer on all of your devices, and hope that it is always available to perform the peer exchange (which is not implemented), or use global discovery.

If you use global discovery, you will probably try looking up a number of devices before you’ll find one that is online. Also, every device will have to do that, if they are not on the local network with another device.

Check the docs, I feel most of this is explained, and I am just repeating myself.

calmh · March 7, 2016, 9:16pm

We already do this, as the connection handshake contains the full list of cluster devices (per folder; so only for folders that are shared with the device in question). (I did mention this to Aral by the way, so they’re aware. But that project seems dead in the water anyway so I don’t think it matters.)

acolomb · March 20, 2016, 12:02pm

Just a thought regarding the DHT approach and its possible privacy pitfalls. The GNUnet project has come up with an alternative to DNS which they call the GNU Name System. It provides strong privacy for queries to the DHT, as well as replies. Some links on the topic:

Maybe this could provide some inspiration for Syncthing’s discovery mechanism?

Manu · March 20, 2016, 12:17pm

What’s bad with a Kademlia DHT and this approach:

key : SHA512(syncthing public key)
value : AES512(ip:port address) with the password syncthing public key

We could also encrypt the value with the syncthing private key like this:

key : SHA512(syncthing public key)
value : ENCRYPT(ip:port address) with the syncthing private key

What’s the privacy concern of this approach ?

AudriusButkevicius · March 20, 2016, 2:21pm

You can still derive who’s talking to who, by seeing which keys are being accessed by who, same way you can track who created the key.
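A small sketch of that point: even if every key is a hash and every value is opaque ciphertext, a DHT node (or discovery server) that records the source address of each request can pair readers with writers. The `LoggingNode` class and its methods are hypothetical, written only to illustrate the access-pattern leak.

```python
from collections import defaultdict


class LoggingNode:
    """A storage node that quietly records who wrote and who read each key."""

    def __init__(self):
        self.records = {}                 # key -> opaque value
        self.writers = {}                 # key -> IP that created the entry
        self.readers = defaultdict(set)   # key -> IPs that asked for it

    def put(self, requester_ip: str, key: str, value: bytes) -> None:
        self.records[key] = value
        self.writers[key] = requester_ip

    def get(self, requester_ip: str, key: str):
        self.readers[key].add(requester_ip)
        return self.records.get(key)

    def who_talks_to_whom(self):
        """Pairs of (reader IP, writer IP) inferred without decrypting anything."""
        return {
            (reader, self.writers[key])
            for key, readers in self.readers.items()
            if key in self.writers
            for reader in readers
        }
```

The node never needs to understand the key or the value. The request metadata alone links the device that published an entry to the devices that fetched it.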