An opportunity arose at work to test out syncthing. Syncing 15.9 million directories holding 85 TB of data between two servers across the Internet. Would it be faster to attach a disk array to one and then ship it to the other? Yes. But what fun is that?!
The source server is 24 cores, 128 GB of RAM, with 24x 7200 rpm SATA drives in a software RAID 6. The target is 32 cores, 256 GB of RAM, with 20x enterprise SSDs in a software RAID 6. Both servers have AMD EPYC CPUs. Both are connected at 10 Gbps and can sustain 5 Gbps between each other over a direct IPv6/IPsec VPN connection. The servers are configured to connect directly rather than through relays. We’re stuck with the hardware we have, and had a limited time window to optimize the hardware and operating system.
So far, it’s working: 32 TB has been successfully transferred in the past month.
Syncthing, unison, and rsync all hit the same problem: enumerating the directories before syncing. Syncthing constantly consumes 4 to 8 CPU cores at 100%, split between scanning the top-level directory and actual data transfer. The rescan interval is set to 999999 seconds to avoid the scan penalty. The source server is constantly maxed out on read I/O, while the receiving server is very busy with write I/O. The receiving server has the added cost of writing to the md array as well as receiving the stream of data from Syncthing.
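For reference, the long rescan interval described above corresponds to the per-folder rescanIntervalS setting in Syncthing’s config.xml. This fragment is just a sketch of the shape (the folder id and path are invented):

```
<folder id="big-share" path="/srv/data" rescanIntervalS="999999">
    ...
</folder>
```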
Both servers are using ext4. We found the hardware is more of a limiting factor than the filesystem, and that tuning the Linux kernel’s I/O handling helped the most. The other likely limiting factor is the SATA bus on the sending server. So far we’ve found software RAID faster than hardware RAID.
I’m not sure how to make Syncthing use more cores, or whether that would even help. There have been suggestions that Go is just not fast enough, but unless we’re going to rewrite Syncthing in something else, that’s a moot point. At this point I think the hardware and the Internet are the biggest bottlenecks, not the software.
Anyway, this is the largest dataset we’ve tried to transfer between two servers. Syncthing is cranking away without complaint.
The company donated to syncthing last year as thanks for the awesome software.
Why is that “the problem”, i.e. what was the actual issue/slowdown?
If it was indeed that the first stage of scanning (enumerating files to scan) took very long, you could have disabled that by setting scanProgressIntervalS to -1.
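For anyone following along, that option lives per folder in config.xml. A fragment showing the shape (folder id and path invented):

```
<folder id="big-share" path="/srv/data">
    <scanProgressIntervalS>-1</scanProgressIntervalS>
</folder>
```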
Anyway now that it is syncing, that shouldn’t matter anymore.
What might have helped is setting databaseTuning to large in the beginning.
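databaseTuning is a global option in config.xml, with small/large/auto as documented values. A fragment:

```
<options>
    <databaseTuning>large</databaseTuning>
</options>
```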
Generally: On what medium is your db stored?
Syncthing, unison, and rsync all try to load the directory structure of 15.9 million directories into memory to figure out how to optimize the data transfer. In Syncthing, I think it’s walkfs. While this probably works great for average usage, for edge cases like this one the “walk” takes forever. If there were some way to do a partial walk, or to tell it to walk one subdir at a time, maybe that would be more efficient. I don’t know, just thinking about it.
The secondary issue is the sheer RAM consumed by keeping the walked directory structure in memory. We have lots of RAM in this case, but the OOM killer keeps killing one of the Syncthing threads because it gets too big. As the system runs low on RAM, the OOM killer gets to work. We’ve implemented a workaround to stop the OOM killer from targeting Syncthing processes/threads.
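One common way to implement that kind of workaround (a generic sketch, not necessarily what was done here) is a systemd drop-in that lowers the service’s OOM score so the kernel picks other victims first:

```
# /etc/systemd/system/syncthing@.service.d/override.conf
[Service]
OOMScoreAdjust=-900
```

Followed by `systemctl daemon-reload` and a service restart.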
For comparison, rsync segfaults when doing the initial directory walk.
And really, I’m entirely impressed that Syncthing “just works” for the most part with minimal configuration changes from the defaults. It’s pretty amazing.
Uh, I just realized I went immediately down the sceptic road in my previous post:
I am actually quite happy to read your account of syncing a huge amount of data and making it work. The sceptic, or rather investigator, comes out because I’d like to see the pain points addressed too.
We’re dealing in edge cases here. I’m pretty sure the “average” Syncthing user doesn’t have files and directories at this scale to sync. To push the edge case further, those 15.9 million directories hold hundreds of millions of files. The directory structure is basically 26 characters deep (/a/b/c/d/e/f/g/h…) in order to keep “only” tens of thousands of files per directory. I think you’ve explained what I’m seeing.
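A deep fan-out layout like that usually maps a name onto nested single-character directories. This Go sketch is a hypothetical illustration of the scheme, not the poster’s actual code:

```go
// Hypothetical sketch of an /a/b/c/... fan-out scheme: spread items
// across a deep tree so each directory holds a bounded number of
// entries. shardPath is invented for illustration.
package main

import (
	"fmt"
	"path"
	"strings"
)

// shardPath turns the first depth characters of key into nested
// single-character directories, e.g. "photos123" -> "p/h/o/photos123".
func shardPath(key string, depth int) string {
	key = strings.ToLower(key)
	parts := make([]string, 0, depth+1)
	for i := 0; i < depth && i < len(key); i++ {
		parts = append(parts, string(key[i]))
	}
	parts = append(parts, key)
	return path.Join(parts...)
}

func main() {
	fmt.Println(shardPath("photos123", 3)) // p/h/o/photos123
}
```

With 26 levels of single-character directories, the tree’s fan-out caps how many entries any one directory has to hold, at the cost of very long paths and many more directories to enumerate.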
However slow it may seem, it’s 1) working, 2) minimal config. It’s still pretty amazing in the end.
In fact, for personal use, I sync around 1 TB of video/photo media from desktop to NAS without issue. I didn’t do any special config, just joined the two machines together, shared a folder, and voila, it works.
Because I don’t know of a way to do this reliably, on all our supported platforms, while keeping in mind there may be any mix of file systems under the folder mount point.
The default is fine and works perfectly on ext4; it’s just that there’s a low but non-zero performance cost. That doesn’t matter for almost anybody, but when you’re syncing a gazillion files on slow drives it might.
Oh, and the database is very much not doing you any favours. The walk process is, essentially, a linear process of listing a directory, stat:ing each thing in that directory, and looking it up in the database. Each of those will incur a seek-read cycle on your array, in effect reducing it to one large 7200 RPM spinner. It’s going to be really, really slow. Moving at least the database to a local SSD would probably do wonders. Database latency is a killer.
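A hedged sketch of what moving the index database to a local SSD could look like; the paths, service name, and index directory name are assumptions and vary by install and Syncthing version:

```
# Sketch only: stop Syncthing, move the index database to an SSD,
# and leave a symlink behind. Adjust paths for your system.
systemctl stop syncthing@youruser

mv /home/youruser/.config/syncthing/index-v0.14.0.db /mnt/ssd/syncthing-index.db
ln -s /mnt/ssd/syncthing-index.db /home/youruser/.config/syncthing/index-v0.14.0.db

systemctl start syncthing@youruser
```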
(Some of the database will hopefully live in RAM cache, but who knows. It’s presumably large and accessed very randomly.)
I agree. We had a short timeline, and I got tired of supervisord restarting rsync every few hours, so I tried Syncthing. There’s a lot I could do with more time and budget. However, to poorly paraphrase: “we sync with the servers we have, not the servers we want”.