Censor device IDs in the logs

The full device ID should never appear in the log, at least not by default. Users who aren't tech-savvy post those logs as they are and could easily become targets of anonymous surveillance. Here's how:

  1. A user has a phone with Syncthing on it. He syncs it with his home PC and his work PC, a fairly common setup for moving files to and from the phone.
  2. One day this user has an issue with Syncthing, so he posts the log here with all device IDs fully visible. The issue is hopefully resolved and the thread is forgotten.
  3. But some malicious “hacker” (in quotes because it doesn't really require any hacking skills) copies those IDs and starts querying the discovery server for the IPs that correspond to them.
  4. Not only are the IPs (and hence the country, or even the city) disclosed to anyone on the Internet, but the hacker soon notices that the IP of one device changes twice a day. It probably belongs to a mobile device that moves from one network to another, which is expected if the victim uses Wi-Fi at home and at work. Furthermore, if the work IP belongs to a relatively large corporation, it becomes trivial to find out where the victim works, and also when he leaves for work and gets back, i.e. when the house is empty. You know what I mean.

The worst thing is that each time you post your logs unredacted, you invite the entire world to track you, anonymously and for free. Given that Syncthing positions itself as a secure and private alternative to other such services, this is completely unacceptable. Search this forum for lines like “Established secure connection to” and grab those IDs (yes, some people already redact them, but what about those who don't? And if the log is long enough, it's easy to miss some), then make requests to https://discovery-v4.syncthing.net/v2/?device=XXXXXXX-XXXXXXX-… and you'll get all of the victim's external IPs. Querying once per hour raises no suspicion whatsoever and is often enough to see whether the addresses change.
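Until the logs change, anyone posting a log can redact the IDs themselves. Here is a minimal Go sketch of such a filter, assuming the IDs appear in their usual uppercase form of eight dash-separated groups of seven base32 characters (A-Z, 2-7). The function name and the `-REDACTED` marker are made up for illustration; this is not part of Syncthing.

```go
package main

import (
	"regexp"
)

// deviceIDPattern matches a full device ID in its usual display form:
// eight groups of seven base32 characters separated by dashes.
// Assumption: IDs appear uppercase and dashed in the log text.
var deviceIDPattern = regexp.MustCompile(`\b([A-Z2-7]{7})(?:-[A-Z2-7]{7}){7}\b`)

// redactDeviceIDs replaces every full device ID in text with its first
// group followed by "-REDACTED", so readers can still tell devices apart.
func redactDeviceIDs(text string) string {
	return deviceIDPattern.ReplaceAllString(text, "${1}-REDACTED")
}
```

Keeping the first group means the devices in a posted log stay distinguishable while the full IDs, which are what the discovery server needs, are gone.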

My proposition:

  1. Only show the first group of the device ID in the log, i.e. the first 7 characters. That's enough to tell a user's devices apart when locating the issue. The full ID is almost never needed: it's pseudorandom, so collisions should be extremely rare (device IDs use a 32-character base32 alphabet, so two given IDs share the first group with probability 32^-7 = 1/34359738368 ≈ 0.0000000000291).
  2. For the extremely rare case when it's really needed, provide a command-line switch with a long and descriptive name that enables logging full IDs. Something like --i-dont-care-about-privacy-and-surveillance-please-dump-full-ids-in-the-log
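The proposal needs very little code. A minimal Go sketch, assuming the ID is available as its usual dash-separated string; `shortDeviceID` and `logDeviceID` are hypothetical helpers, and the flag simply reuses the name suggested above. Because device IDs are base32, the first group carries 7 × 5 = 35 bits, so for a user with n devices the chance that any two share it is at most about n(n-1)/2 * 2^-35, roughly 1.3 × 10^-9 for ten devices.

```go
package main

import (
	"flag"
	"strings"
)

// logFullIDs is the opt-in switch. Its deliberately long name makes it hard
// to enable by accident; it defaults to false.
var logFullIDs = flag.Bool(
	"i-dont-care-about-privacy-and-surveillance-please-dump-full-ids-in-the-log",
	false, "write complete device IDs to the log (privacy-sensitive)")

// shortDeviceID returns the first dash-separated group of a device ID.
// If the ID has no dashes, it falls back to the first 7 characters.
func shortDeviceID(id string) string {
	if i := strings.IndexByte(id, '-'); i >= 0 {
		return id[:i]
	}
	if len(id) > 7 {
		return id[:7]
	}
	return id
}

// logDeviceID returns what the log should show for a device: the short
// form by default, the full ID only when the user explicitly opted in.
func logDeviceID(id string, full bool) string {
	if full {
		return id
	}
	return shortDeviceID(id)
}
```

In `main`, call `flag.Parse()` and then log `logDeviceID(id, *logFullIDs)`. Go's `flag` package accepts both `-name` and `--name`, so the double-dash spelling above works as written.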

I have posted about this issue before, but the IDs are still there. Even the official FAQ tells users they don't need to keep their IDs secret, which is mind-blowing to me because it's a huge privacy flaw. Please remove them from the log. Maybe the IPs as well, leaving only the first and last octets, although those IPs often belong to a LAN. Still, if the victim connects to his home device from work, that will most likely happen over the Internet and reveal the external IP of that device. In any case, an IP alone doesn't enable 24/7 surveillance, unlike the device IDs, which are far more important to hide from the general public.


I think a short device ID would be fine. However, the value of real IPs in troubleshooting outweighs the risk.

Most of the time you don't need the whole IP to debug an issue; it's enough to see whether it matches another IP and whether it's a WAN or LAN address. There are many ways to solve this; here are a few:

  1. As I already said, show only the first and last octets plus the port. That should be enough to see whether addresses match. Allow logging full IPs via a command-line option (not a config setting or GUI checkbox, because it's easy to forget to turn those off after the issue is resolved).
  2. A short hash (the first 6-8 characters), always with a secret salt, because these days it's trivial to brute-force the ~4 billion IPv4 addresses and recover the actual address. The catch is that the salt must be the same on all of the user's devices, or else it would be impossible to match the addresses. Maybe as a command-line option as well? Like --mangle-ips “secret-string”, which the user can supply to all affected instances.
  3. Only hide the WAN IPs and leave the LAN ones in full form.

Keep in mind that the primary consumer of the log is the user themselves. Is there much prior art of network tools obfuscating the IPs they work with? It seems to me it would be more annoying than helpful in most cases.

Okay, I don't insist; these are just thoughts. A single IP is probably not worth much to an attacker. But plain-text IDs shouldn't be visible, and that FAQ entry should explain the risks of leaking the IDs better.

Side-channel attacks are becoming more and more common, and we shouldn't neglect them. Your crypto might be flawless, but if the system as a whole allows tracking the user in real time, that's awful. It's not just tracking whether the IP changes, but also whether the device is online or offline. For example, if the user connects through a VPN that terminates in another country, so the IP the discovery server (discosrv) returns isn't his real one, matching his uptime periods against the day/night cycle can still reveal his real time zone to some extent. And so on.


Yeah, I agree; logging the full device ID adds little value anyway. There are a couple of exceptions, for example connections from unknown devices, so some thought needs to be put into it. But in principle I see no problem with logging just the first character group almost always.


A long time ago I had a half-finished branch that introduces “human-readable” device identification in logs, like for folders, where it's “label (id)”: “name (shortid)”. It's definitely way out of date, but it may be less work to rebase than to start from scratch; I'll have a look eventually. If someone wants to start work on this soon, I can also push the old branch somewhere in case it's useful.
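For reference, the “name (shortid)” format could be produced by something like the following Go sketch. It is not the branch mentioned here; `deviceLabel` and the name map are hypothetical. It also shows one way to handle the unknown-device exception raised earlier in the thread: a device without a configured name is logged with its full ID, since the user needs that to recognise and add it.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceLabel formats a device for the log as "name (shortid)".
// names maps full device IDs to user-chosen device names (hypothetical).
// Unknown devices have no name yet, so their full ID is returned instead.
func deviceLabel(id string, names map[string]string) string {
	name, known := names[id]
	if !known {
		return id
	}
	short := id
	if i := strings.IndexByte(id, '-'); i >= 0 {
		short = id[:i] // first group of the ID
	}
	return fmt.Sprintf("%s (%s)", name, short)
}
```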
