Yes, but syncthing doesn’t know that. The whole process of discovery is built around the assumption that some transparent middlewares are present, because that’s correct for most users. That doesn’t mean that discovery requires a NAT, but that’s why it’s designed the way it is.
The option is called “NAT traversal” I believe and just configures whether to attempt NAT-PMP/UPnP port mappings. It has no influence on how discovery works.
That opens up an interesting question: If you can do direct (LAN?) connections, why not:
- use local discovery
- use static addresses
Self-hosted “local” global discovery seems convoluted to begin with in such a setup.
Because syncthing is not a magic application that somehow knows your network topology by heart. The presence of firewalls, multi-layer NATs and such is often transparent and sometimes even undetectable. Syncthing caters for the vast majority of users, which isn’t your setup. And while syncthing is highly configurable, if you want it to work for a specialized use case, then you must also configure it for that. You can always use static addresses if discovery doesn’t fit your needs.
I agree that the bad ports are undesirable, but I feel that you’re barking up the wrong tree here. We still don’t know why it doesn’t work the way it should, and you haven’t shared any logs that could shed light on the underlying issue.
Actually, this is interesting. “Dialing direct” means that syncthing didn’t even try to reuse a listening port, because it didn’t find anything listening on the outgoing interface. I think there’s a bug here in how global discovery uses the dialer, as it doesn’t even try to use the sync port for discovery announce. I can reproduce this locally.
I think the cause might be that the listener is of type tcp4 while it needs a more general tcp socket to dial https://rsync:8443 (there’s nothing to say that’s an IPv4 address). Setting the listen address to tcp://:22000 probably makes it use that port.
I can reproduce this locally even with the sync address set to tcp://:22000, though: it also says “Dialing direct” there. Discovery also ends up with a mess of irrelevant ports.
This is really bizarre. After startup, my devices (Linux + Windows) usually say “Dialing reuse”, but sometimes get refused with “connect: cannot assign requested address” (falling back to non-reuse), while a subsequent dial with “Dialing reuse” is successful?! Then, it sometimes switches to “Dialing direct”, but when I change the listen address in the GUI it switches back to “Dialing reuse”. No idea what’s going on there, it flip-flops between working and not working.
I can confirm that setting just tcp4://:22000 in the listen address always results in “Dialing direct”, as the dialer uses [::] (IPv6 wildcard) as its outgoing address (presumably only if the system has IPv6 enabled). If listening additionally on either tcp6://[::]:22000 or just on tcp://:22000, then discovery does use the port-reusing dialer, but not 100% reproducibly. Sometimes it does, sometimes not.
I’m just realizing that I may be confusing announce queries with lookup queries: Only the former use the port-reusing dialer, but both generate dialing chatter. That’s probably why it looks like it’s flip-flopping: it’s just two different kinds of discovery query. Still weird that the OS sometimes says “connect: cannot assign requested address”, when it’s perfectly content with the address 5 minutes later.
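The family mismatch described here can be illustrated with a toy model in Python. This is not Syncthing's actual code: `reusable_port` and the `FAMILIES` table are made up for illustration, and the model simply assumes that a listener's port is a candidate for reuse only if its scheme covers the address family of the dialer's outgoing address.

```python
# Toy model (hypothetical, not Syncthing's implementation):
# which address families each listen scheme is assumed to cover.
FAMILIES = {"tcp": {4, 6}, "tcp4": {4}, "tcp6": {6}}


def reusable_port(listen_urls, outgoing_family):
    """Return the port of the first listener whose scheme covers
    outgoing_family (4 or 6), or None if no listener matches."""
    for url in listen_urls:
        scheme, _, rest = url.partition("://")
        if outgoing_family in FAMILIES.get(scheme, set()):
            # The port is whatever follows the last colon,
            # which also works for "[::]:22000" and ":22000".
            return int(rest.rpartition(":")[2])
    return None


# Dialer's outgoing address is the IPv6 wildcard [::], so family 6.
print(reusable_port(["tcp4://:22000"], 6))
print(reusable_port(["tcp4://:22000", "tcp6://[::]:22000"], 6))
print(reusable_port(["tcp://:22000"], 6))
```

In this model the tcp4-only case finds no port to reuse, which would correspond to “Dialing direct”, while the other two find 22000. It does not explain the non-reproducible cases.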
TCP timers or something? Given port reuse, we need any previous connection for the same tuples to be gone, which may take a while. I could see repeated connection attempts to the same IP & port failing due to the previous closed connection still being around in TIME_WAIT. (Though, announcements should be about every 30 minutes. Your testing may have forced the pace somewhat higher.)
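For anyone wanting to poke at this outside Syncthing, here is a rough Python sketch of the “try reuse, fall back to direct” pattern. It is only an approximation of the behaviour discussed in this thread, not Syncthing's code: the function name `dial` and the mode strings are made up, and it is IPv4-only for brevity. “Cannot assign requested address” is `EADDRNOTAVAIL`, which surfaces as an `OSError` and triggers the fallback here.

```python
import socket


def dial(remote, local_port=None, timeout=5.0):
    """Connect to remote (host, port). If local_port is given, first try
    to dial from that port with address/port reuse enabled; on any
    OSError (e.g. EADDRNOTAVAIL) fall back to an OS-chosen port.
    Returns (socket, mode) with mode "reuse" or "direct"."""
    if local_port is not None:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            if hasattr(socket, "SO_REUSEPORT"):  # not available everywhere
                s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            s.settimeout(timeout)
            s.bind(("", local_port))  # fixed source port
            s.connect(remote)
            return s, "reuse"
        except OSError:
            s.close()  # fall through to a plain dial
    s = socket.create_connection(remote, timeout=timeout)
    return s, "direct"
```

Calling `dial(("192.0.2.10", 8443), local_port=22000)` repeatedly in quick succession against the same server would be one way to check whether the failure correlates with the previous connection's tuple still lingering; the address here is just a placeholder.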
I said routed, not local area network, so yes, eventually I settled on static addresses, giving up on stdiscosrv, strelaysrv and STUN. (The STUN server, run via a third-party project called eturnal, and the relays were not really needed in my case; I just tested them anyway to get a general overview and understanding of the syncthing architecture.)
However, even after reading the explanations, I’m still not convinced that announcing the zero port is the right thing to do in the particular case where NAT is explicitly disabled in the client configuration, i.e. where we declare that we are reachable directly by IP address (assuming, of course, that the firewall permits the incoming connections and we are able to bind to the listen port) without needing port mappings on any router on the path to the discovery server or other devices.
What’s disabled is UPnP and NAT-PMP (i.e., a couple of NAT traversal techniques). I have those disabled because they add nothing in my environment, because my NATing firewall doesn’t speak them. It’s not a switch to say “I promise there is no NAT in my network”.