Starting a new topic, since this is sort of an outgrowth of my earlier UTF-8 topic.
I’ve done more experiments with my folder that has tons of small files. I removed the folder from Syncthing and then re-added it to only one machine. Once that machine finished scanning the folder (which didn’t take that long, actually), the CPU went down to normal load.
I then shared the folder with one machine. The CPU has stayed reasonable on the machine that currently contains the folders, but the machine that’s trying to sync the folders has been at 100% CPU since the sync started. Sync has been going for almost 24 hours now, and Syncthing is reporting the following statistics:
Global State: 67009 items, ~2.08 GiB
Local State: 39259 items, ~316 MiB
Out Of Sync: 27750 items, ~1.77 GiB
As you can see, in this scenario, syncing happens very slowly, and the machine that’s trying to get the files has its CPU pinned. Both machines are on the local network.
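To put a number on "very slowly", here is a rough back-of-the-envelope calculation from the statistics above. It assumes the receiving machine started empty and that "almost 24 hours" can be rounded to a full day; both are assumptions, so treat the result as an estimate only.

```python
# Figures copied from the Syncthing status quoted above.
GIB = 1024 ** 3
MIB = 1024 ** 2

global_items, global_bytes = 67009, 2.08 * GIB
local_items, local_bytes = 39259, 316 * MIB

# Assumption: "almost 24 hours" rounded up to one full day.
elapsed_seconds = 24 * 60 * 60

# What is still missing on the receiving machine.
remaining_items = global_items - local_items
remaining_bytes = global_bytes - local_bytes

# Average progress so far, assuming the receiver started with nothing.
items_per_second = local_items / elapsed_seconds
kib_per_second = local_bytes / 1024 / elapsed_seconds

# Average size of one item in the folder.
avg_item_kib = global_bytes / 1024 / global_items

print(f"remaining: {remaining_items} items, {remaining_bytes / GIB:.2f} GiB")
print(f"rate so far: {items_per_second:.2f} items/s, {kib_per_second:.1f} KiB/s")
print(f"average item size: {avg_item_kib:.1f} KiB")
```

Under those assumptions, that works out to less than one item and under 4 KiB per second on a local network, so raw bandwidth is clearly not the limit here.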
I’d like to understand what’s happening here. What causes the CPU on the machine that doesn’t yet have all the files to run so hot? If I were to add another machine to the folder share, would syncing happen faster? If I were to wait until these two machines were synced and then add another machine, would that machine sync faster?
See syncthing -help for profiling instructions.
These could be useful for us to understand what the hell is happening.
Started with the profiler, but I’m getting 404 not found. This machine is my server, which is running openSUSE, so the package is a bit behind (0.10.27). My other machines run Manjaro and are on 0.10.29.
I think it should just produce extra files in the config folder every 20 or so seconds.
Nothing seems to be getting created there either. It definitely picked up the environment variable, though:
[N5SUI] 12:25:18 DEBUG: Starting profiler on 0.0.0.0:9090
Okay, I’ve got a little bit more data. For some reason, all syncing stopped, about midway. I name all my machines from Star Trek, so bear with me here. I have a server and two laptops that want to sync this folder. The server is ds9, running openSUSE and Syncthing 0.10.27 (the openSUSE repo is a bit behind). The source folder is my personal laptop, enterprise-e. The other laptop is my work machine, enterprise. Both laptops are running Manjaro Linux and Syncthing 0.10.29. The Manjaro repo is a bit more up to date than the openSUSE one.
DS9 was trying to receive files from enterprise-e, the source machine. About midway through (Global State: 67009, Local State: 39548), syncing stopped, for two days, with no messages in the logs. Since I couldn’t get the debugger to work yesterday on ds9, I decided to add my work laptop (enterprise) into the mix today. When I added the folder on my work laptop (enterprise), I started getting these messages:
Apr 02 12:32:32 enterprise syncthing: [TDSQ7] WARNING: Puller: final: rename /home/rsezov/emu/amiga/AmiKit/Classes/DataTypes/.syncthing.icon.datatype /home/rsezov/emu/amiga/AmiKit/Classes/DataTypes/icon.datatype: no such file or directory
Apr 02 12:32:50 enterprise syncthing: [TDSQ7] INFO: Connection to N5SUIQ4-KR2PWTP-I2H6HV7-KN4CNAY-XJENNPI-MEUPO42-JYGZP5J-ALTZ7AQ closed: read tcp [fe80::2e0:4dff:fe9e:1e12%wlp8s0]:22000: connection reset by peer
Apr 02 12:33:05 enterprise syncthing: [TDSQ7] WARNING: Puller: final: rename /home/rsezov/emu/amiga/AmiKit/Classes/Gadgets/.syncthing.scroller.gadget.uaem /home/rsezov/emu/amiga/AmiKit/Classes/Gadgets/scroller.gadget.uaem: no such file or directory
Apr 02 12:33:07 enterprise syncthing: [TDSQ7] WARNING: Puller: final: rename /home/rsezov/emu/amiga/AmiKit/Classes/ToolbarImages/Default/.syncthing.7seg_offdot /home/rsezov/emu/amiga/AmiKit/Classes/ToolbarImages/Default/7seg_offdot: no such file or directory
Apr 02 12:33:40 enterprise syncthing: [TDSQ7] WARNING: Puller: final: rename /home/rsezov/emu/amiga/AmiKit/Classes/ToolbarImages/Default/.syncthing.addressbookglobal_g.uaem /home/rsezov/emu/amiga/AmiKit/Classes/ToolbarImages/Default/addressbookglobal_g.uaem: no such file or directory
These hidden files indeed do not exist. Anybody know what’s happening?
It seems the directory disappeared while we were syncing…? The temp file was there, but then the directory went away?
The only thing that could’ve happened is Syncthing restarting on either DS9 or enterprise-e, because I was also adding someone else to a separate shared folder.
Interestingly enough, my little experiment worked. Between enterprise and enterprise-e (both running 0.10.29), they seemed to figure out the temp files issue, and all three machines have started syncing. I then stopped Syncthing on enterprise, just to see if enterprise-e and ds9 would sync again, and they are. So something with the temporary files between the two stopped all syncing from happening with no logged messages for a while.
What I have noticed, though, in all instances, is the machine that has the files has low CPU usage (which goes up when it has to share with multiple machines). The machine(s) receiving the files have high CPU usage. Does that make sense?
The receiving machine verifies every block it receives (checking that it matches the expected hash); the sending machine just reads files, so it sees almost no load.
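As a rough illustration of that asymmetry, here is a minimal sketch of per-block verification on the receiving side. It is not Syncthing's actual code: the 128 KiB block size, the use of SHA-256, and the function names block_hashes, verify_block and receive_file are all assumptions made for the example.

```python
import hashlib

# Assumption: fixed-size 128 KiB blocks hashed with SHA-256.
BLOCK_SIZE = 128 * 1024


def block_hashes(data: bytes) -> list[bytes]:
    """Hash each block of a file, as would be done once when scanning."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def verify_block(block: bytes, expected: bytes) -> bool:
    """Receiver side: re-hash the received block and compare."""
    return hashlib.sha256(block).digest() == expected


def receive_file(blocks, expected_hashes) -> bytes:
    """Assemble a file, hashing every incoming block before accepting it."""
    out = bytearray()
    for index, (block, expected) in enumerate(zip(blocks, expected_hashes)):
        if not verify_block(block, expected):
            raise ValueError(f"block {index} failed verification")
        out += block
    return bytes(out)
```

In this sketch the sender only slices and ships bytes it has already hashed, while the receiver pays for a hash computation on every block it pulls.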
Regarding the 3 devices that stopped working: no clue, but if you have 100% reproducible steps, I guess we could fix it.
Okay, so basically, when you have lots of small files, comparing their cryptographic hashes takes as much processing power as it would for very big files. So what you see is very little bandwidth used and lots of processor on the receiving side.
Regarding the 3 devices, it was only two that stopped. When I added the third device, somehow those rename errors surfaced, Syncthing figured out what the problem was, and syncing began again among the original 2, as well as with the third. So I’m not sure if I have anything reproducible at this point.
All right; last night, the two original machines (enterprise-e and DS9) finished syncing. If I add enterprise to the cluster today, will it sync faster if all three machines are online, or if only the server (DS9) is online?
If all three devices are online, enterprise will get the data from enterprise-e and ds9 simultaneously.
Normally this would mean that the transfer is faster. But as you stated yesterday, when transferring many small files, the receiving device sees high CPU load. Because of that, you may not see a real difference, as the CPU of enterprise will be the limiting factor.
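A toy model of that reasoning, with entirely made-up numbers: the receiver can only go as fast as the slower of "everything the senders can deliver together" and "what its own CPU can process". The function name effective_rate and all of the rates below are hypothetical.

```python
def effective_rate(sender_rates, receiver_cpu_rate):
    """Sustained rate the receiver achieves, in the same unit as the inputs.

    sender_rates: what each sending device could deliver on its own.
    receiver_cpu_rate: what the receiver's CPU can verify and write.
    """
    return min(sum(sender_rates), receiver_cpu_rate)


# Hypothetical figures in MiB/s, chosen only to illustrate the bottleneck.
one_source = effective_rate([50.0], receiver_cpu_rate=4.0)
two_sources = effective_rate([50.0, 30.0], receiver_cpu_rate=4.0)

print(one_source, two_sources)
```

With a CPU-bound receiver both cases come out the same; a second source only helps once the senders or the network are the limit.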
Oddly enough, when I tested the transfer speed with a single big file, the sending device had full CPU load, while the receiving only had reasonable load.
Well, because of the different capabilities of the machines (DS9 is old and slow), enterprise does seem to be syncing faster and is only at 50% CPU, while poor DS9 is pinned at 100% again.