All goes well for a while, the two instances are connected to each other, syncing perfectly, but after a while (I can’t associate it to an event) one of the instances CPU usage goes up to 100% and disconnects. The log stops here:
Apr 17 12:27:04 VM syncthing[81427]: [YSBJ4] DEBUG: addresses: [quic://46.139.64.79:1033 quic://46.139.64.79:22000 relay://163.172.44.215:22067/?id=AFN6YPK-MA5PKES-HT5T7AC-DTKET23-VAHTBZ7-CCDOAYY-7XZSR4X-MQWJ4A4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=XLU.SH relay://88.99.175.206:22067/?id=L5BVTMV-ICUMRZL-MHQEHNC-LDM4LEE-YIAQWA7-VYAICZ2-MWD4O2Q-CQ2APQB&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=4000000&statusAddr=:22070&providedBy=effekthasch tcp://46.139.64.79:22000]
Apr 17 12:27:04 VM syncthing[81427]: [YSBJ4] DEBUG: Reconnect loop for GPINCSM-ZFINXRS-3H2PKNC-UHSWD4M-TYLKFSY-ZLKO4JZ-2GSUEQ5-4QZYOAN [quic://46.139.64.79:1033 quic://46.139.64.79:22000 relay://163.172.44.215:22067/?id=AFN6YPK-MA5PKES-HT5T7AC-DTKET23-VAHTBZ7-CCDOAYY-7XZSR4X-MQWJ4A4&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=0&statusAddr=:22070&providedBy=XLU.SH relay://88.99.175.206:22067/?id=L5BVTMV-ICUMRZL-MHQEHNC-LDM4LEE-YIAQWA7-VYAICZ2-MWD4O2Q-CQ2APQB&pingInterval=1m0s&networkTimeout=2m0s&sessionLimitBps=0&globalLimitBps=4000000&statusAddr=:22070&providedBy=effekthasch tcp://46.139.64.79:22000]
Apr 17 12:27:04 VM syncthing[81427]: [YSBJ4] DEBUG: Not dialing using Relay Dialer as priority is less than current connection (200 >= 200)
Apr 17 12:27:04 VM syncthing[81427]: [YSBJ4] DEBUG: Not dialing using Relay Dialer as priority is less than current connection (200 >= 200)
Apr 17 12:27:04 VM syncthing[81427]: [YSBJ4] DEBUG: dialing GPINCSM-ZFINXRS-3H2PKNC-UHSWD4M-TYLKFSY-ZLKO4JZ-2GSUEQ5-4QZYOAN tcp://46.139.64.79:22000 prio 10
Apr 17 12:27:04 VM syncthing[81427]: [YSBJ4] DEBUG: Dialing no fallback result tcp 46.139.64.79:22000: <nil> socks connect tcp localhost:8010->46.139.64.79:22000: unknown error general SOCKS server failure
Apr 17 12:27:04 VM syncthing[81427]: [YSBJ4] DEBUG: dialing GPINCSM-ZFINXRS-3H2PKNC-UHSWD4M-TYLKFSY-ZLKO4JZ-2GSUEQ5-4QZYOAN tcp://46.139.64.79:22000 error: socks connect tcp localhost:8010->46.139.64.79:22000: unknown error general SOCKS server failure
Apr 17 12:27:04 VM syncthing[81427]: [YSBJ4] DEBUG: failed to connect to GPINCSM-ZFINXRS-3H2PKNC-UHSWD4M-TYLKFSY-ZLKO4JZ-2GSUEQ5-4QZYOAN 10
Apr 17 12:27:04 VM syncthing[81427]: [YSBJ4] DEBUG: dialing GPINCSM-ZFINXRS-3H2PKNC-UHSWD4M-TYLKFSY-ZLKO4JZ-2GSUEQ5-4QZYOAN quic://46.139.64.79:1033 prio 100
Apr 17 12:27:04 VM syncthing[81427]: [YSBJ4] DEBUG: dialing GPINCSM-ZFINXRS-3H2PKNC-UHSWD4M-TYLKFSY-ZLKO4JZ-2GSUEQ5-4QZYOAN quic://46.139.64.79:22000 prio 100
The GUI is still responsive and is shows the other instance as connected, which is wrong and no files are synced anymore.
I have to restart Syncthing just to start over but after a while the problem repeats. The problem happens on both sides (not at the same time) seemingly not depending on any of the others.
Yes, htop shows one CPU core driven at 100%. If Syncthing were scanning, it would show something in the logs I guess (especially at debug level), and it also would not disconnect. Also, the CPU load stays permanently 100% for hours until I stop it. Graceful SIGTERM does not do, I have to SIGKILL it. Moreover local SyncthingGTK can not connect to the process, is just shows an empty window. To my best knowledge, nothing has been changed in the configs since the previous version which worked flawlessly. I don’t think that the scanning triggers the load, since it can work several hours normally and I have left the Full rescan interwall at default 3600 for every folder.
Try running a CPU profile or capture and post the logs around and after the time it goes into this 100% cpu usage cycle (or ideally the whole log leading up to that point).
I went to http://localhost:8384/rest/debug/cpuprof and waited for about an hour, but no file download was offered (I checked and it works otherwise when Syncthing is working normally). Finally I gave up and sent a SIGQUIT to Syncthing. I saved the full log of that run but please can I send it privately, it contains lots of personal data I don’t want to expose publicly here.
Thanks for that idea.
I stopped Syncthing, deleted /home/user/.config/syncthing/index-v0.14.0.db on both ends and started again. Let’s see it in the long run.
The metric “it gets to 100% cpu” is not really helpful: If you remove the database it needs to hash all your data - 100% cpu is expected (good) behaviour for that. Please describe actual problems you’re having. And if you want to be sure that the cpu usage is because of scanning (though there’s no reason stated to doubt it), enable scanner debug logging. I am ok with having a look at logs sent privately to me, though there comes no guarantee of any kinds regarding support with that
Syncthing scanned and hashed my data and rebuilt the database. During this time it used the CPU but never at 100% let alone constantly.
Syncthing was done hashing, reconnected all peers in the meanwhile and all became quiet.
After roughly 10 minutes Syncting started to drive one core of the CPU at constant 100% according to Htop. Nothing is logged, there is no file to download at localhost:8384/rest/debug/cpuprof.
I tried to list all details about what happened. This same scenario repeats itself after a restart over and over again randomly at both sites. I have other Syncthing instances with the same version connected to those instances that do not provide that behaviour. Only those two act like this. One is behind a corporate firewall, connecting through a SOCKS proxy, the other is at my home behind a router. No manual port forwarding is used, the instances simply find each other through external relay servers.
I now stopped the instance beind the firewall, and left the home instance running. Several hours passed with no problems so far. I’ll wait for a day or two to see how it goes.
Looks to me like it’s processing network data of some kind (the runnable routines) and is otherwise idle. But this is also a build not made by us, using a Go version we haven’t switched to yet, so I’m not sure. I’d suggest trying one of our binaries from GitHub, just for fun.
AudriusButkevicius wrote:
One of the routines is quic, so try our build first and if that does not help, you could try setting sync protocol listen address to tcp://0.0.0.0:22000 instead of default.
I have replaced /usr/bin/syncthing with your binary (from syncthing-linux-amd64-v1.4.2) on both ends and left them running for a day. I can happily report that everything is normal, Syncthing syncs data and consumes hardly any CPU cycles.
Should we conclude then that the new version of Go (1.14.2) used in Archlinux was the culprit?