torture test IRL

An opportunity arose at work to test out syncthing: syncing 15.9 million directories holding 85 TB of data between two servers across the Internet. Would it be faster to attach a disk array to one server and then ship it to the other? Yes. But what fun is that?!

The source server has 24 cores, 128 GB of RAM, and 24x 7200 RPM SATA drives in a software RAID 6. The target has 32 cores, 256 GB of RAM, and 20x enterprise SSDs in a software RAID 6. Both servers run AMD EPYC CPUs. Both are connected at 10 Gbps and can sustain 5 Gbps between each other over a direct IPv6/IPsec VPN connection. The servers are configured to connect directly rather than through relays. We’re stuck with the hardware we have, and had a limited time window to optimize the hardware and operating system.

So far, it’s working: 32 TB has been successfully transferred in the past month.

Syncthing, unison, and rsync all run into the same problem: enumerating the directories before syncing. Between scanning the top-level directory and the actual data transfer, syncthing keeps 4-8 CPU cores at 100% all the time. The rescan interval is set to 999999 seconds to avoid the scan penalty. The source server is constantly maxed out on disk I/O, while the receiving server is very busy with write I/O; it has the added cost of writing to the md array on top of the incoming stream of data from syncthing.

Both servers use ext4. We found the hardware to be more of a limiting factor than the filesystem, and that tuning the Linux kernel for the I/O load worked best. The other likely limiting factor is the SATA bus on the sending server. So far we’ve found software RAID to be faster than hardware RAID.

I’m not sure how to make syncthing use more cores, or whether that would even help. There have been suggestions that Go is just not fast enough, but unless we’re going to rewrite syncthing in something else, that’s a moot point. At this point I think the hardware and the Internet are the biggest bottlenecks, not the software.

Anyway, this is the largest data set we’ve tried to transfer between two servers. Syncthing is cranking away without complaint.

The company donated to syncthing last year as thanks for the awesome software.


I suspect there are parameters you could tweak to get it to use more cores, but also parameters that you could set to disable some features that might lead to better performance.

I guess if your disks are at 100% utilisation then the core use is moot.


Why is that “the problem”, i.e. what was the actual issue/slowdown?
If it was indeed that the first stage of scanning (enumerating files to scan) took very long, you could have disabled that by setting scanProgressIntervalS to -1.
Anyway now that it is syncing, that shouldn’t matter anymore.

What might have helped is setting databaseTuning to large in the beginning.
Generally: On what medium is your db stored?
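In case it’s useful, here’s a minimal sketch of applying those two settings through the REST config API (assuming a reasonably recent Syncthing with the /rest/config endpoints; the GUI address, API key and folder ID below are placeholders for your own values):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// patchConfig sends a partial config update to a /rest/config endpoint.
// The GUI address and API key are assumptions; take them from your own setup.
func patchConfig(url, apiKey, body string) error {
	req, err := http.NewRequest(http.MethodPatch, url, bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	req.Header.Set("X-API-Key", apiKey)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

func main() {
	const base = "http://127.0.0.1:8384" // hypothetical GUI address
	const apiKey = "REPLACE_ME"          // from the GUI settings or config.xml
	const folderID = "big-folder"        // hypothetical folder ID

	// Disable scan progress estimation on the huge folder.
	if err := patchConfig(base+"/rest/config/folders/"+folderID, apiKey,
		`{"scanProgressIntervalS": -1}`); err != nil {
		fmt.Println(err)
	}
	// Ask for the large database tuning profile.
	if err := patchConfig(base+"/rest/config/options", apiKey,
		`{"databaseTuning": "large"}`); err != nil {
		fmt.Println(err)
	}
}
```

If I remember correctly, the database tuning change only takes effect after a restart.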

syncthing, unison, and rsync all try to load the directory structure of 15.9 million directories into memory to figure out how to optimize the data transfer. In syncthing, I think it’s walkfs. While this probably works great for average usage, for edge cases like this one the “walk” takes forever. If there were some way to do a partial walk, or to tell it to walk only one subdir at a time, maybe that would be more efficient. I don’t know, just thinking about it.
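To illustrate what I mean (not how syncthing actually works, just a sketch of the “one subdir at a time” idea), something like this, where each top-level subdirectory is walked as its own pass; /data is a placeholder:

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

func main() {
	root := "/data" // placeholder for the real share root

	// List only the top-level subdirectories first...
	entries, err := os.ReadDir(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// ...then walk each subtree as a separate pass, so the state for one
	// subtree can be processed (and released) before the next one starts.
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		sub := filepath.Join(root, e.Name())
		count := 0
		err := filepath.WalkDir(sub, func(path string, d fs.DirEntry, err error) error {
			if err != nil {
				return err
			}
			count++ // here you'd hash/compare the entry instead of just counting
			return nil
		})
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
		}
		fmt.Printf("%s: %d entries\n", sub, count)
	}
}
```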

The secondary issue is the sheer RAM consumed by keeping the walked directory structure ready in memory. We have lots of RAM in this case, but the OOM killer keeps killing one of the syncthing threads because it gets too big: as the system runs low on RAM, the OOM killer gets to work. We’ve implemented a workaround to stop the OOM killer from targeting syncthing processes/threads.
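For illustration, one common way to do this on Linux (not necessarily exactly what we did) is to lower the process’s oom_score_adj so the kernel prefers other victims. A minimal sketch, assuming root and a PID you look up yourself:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Lower a process's OOM score so the kernel's OOM killer is far less
// likely to pick it. Needs root; the PID is passed on the command line.
func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: oomadj <pid>")
		os.Exit(1)
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid pid:", err)
		os.Exit(1)
	}
	path := fmt.Sprintf("/proc/%d/oom_score_adj", pid)
	// -1000 effectively exempts the process from the OOM killer.
	if err := os.WriteFile(path, []byte("-1000\n"), 0644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```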

For comparison, rsync segfaults when doing the initial directory walk.

And really, I’m entirely impressed that syncthing “just works” for the most part with minimal configuration changes from the defaults. It’s pretty amazing.


Ext4 on Linux, right? You should set caseSensitiveFS=true on these folders to skip some filesystem operations that otherwise happen on scan.

We don’t actually do that, though. Especially not with the progress things mentioned by Simon disabled.

Uh, I just realized I went immediately down the sceptic road in my previous post:
I am actually quite happy to read your account of syncing a huge amount of data and making it work. The sceptic (or rather, the investigator) in me comes out because I’d also like to see the pain points addressed :slight_smile:

As Jakob already said above: Syncthing doesn’t do that.
The progress indicator I mentioned only stores files to be hashed.
The walkfs stores the names of all directory children along the currently walked path. I.e. when the walk is currently at a/b/c/d, the names of the (direct) children of a, b and c are held in RAM.
When syncing, we store all the items to be deleted and all the files that need to be synced in RAM. That’s “Excessive RAM usage when syncing large folders” (syncthing/syncthing#4976) and “Alternative, more resource efficient puller” (syncthing/syncthing#4107), possibly duplicates. There’s been an unfinished attempt at implementing disk-spilling for this.

I have a dumb question, then: why doesn’t syncthing detect the filesystem and set these parameters by default?

We’re dealing in edge cases here. I’m pretty sure the “average” syncthing user doesn’t have files and directories at this scale to sync. To push the edge case even further, those 15.9 million directories hold hundreds of millions of files. The directory structure is basically 26 characters deep (/a/b/c/d/e/f/g/h…) in order to keep “only” tens of thousands of files per directory. I think you’ve explained what I’m seeing.
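If it helps to picture the layout, here’s a hypothetical sketch of that kind of fan-out, where the leading characters of a name each become a directory level so no single directory grows too large (the real scheme differs in the details):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// shardedPath spreads files across a deep tree: each of the first depth
// characters of the name becomes its own directory level, which bounds
// how many entries any single directory accumulates.
func shardedPath(root, name string, depth int) string {
	parts := []string{root}
	for i := 0; i < depth && i < len(name); i++ {
		parts = append(parts, string(name[i]))
	}
	return filepath.Join(append(parts, name)...)
}

func main() {
	// Prints /data/a/b/c/d/e/f/g/h/abcdefgh0123.bin
	fmt.Println(shardedPath("/data", "abcdefgh0123.bin", 8))
}
```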

However slow it may seem, it’s 1) working, 2) minimal config. It’s still pretty amazing in the end.

Thanks!


In fact, for personal use, I sync around 1 TB of video/photo media from desktop to NAS without issue. I didn’t do any special config, just joined the two machines together, shared a folder, and voila, it works.

It’s on the same RAID array as the data. I’m aware the file I/O to/from the syncthing db is competing with the file I/O for sending/receiving between servers.

Because I don’t know of a way to do this reliably, on all our supported platforms, while keeping in mind there may be any mix of file systems under the folder mount point.

The default is fine and works perfectly on ext4; it’s just that there’s a low but non-zero performance cost. That doesn’t matter for almost anybody, but when you’re syncing a gazillion files on slow drives it might.

Oh, and the database being on the same array as the data is very much not doing you any favours. The walk process is, essentially, a linear process of listing a directory, stat:ing each thing in that directory, and looking it up in the database. Each of those will incur a seek-read cycle on your array, in effect reducing it to one large 7200 RPM spinner. It’s going to be really, really slow. Moving at least the database to a local SSD would probably do wonders. Database latency is a killer.

(Some of the database will hopefully live in RAM cache, but who knows. It’s presumably large and accessed very randomly.)
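To make that concrete, here’s a rough sketch of the per-entry pattern, with a stand-in lookup function where the real database read would be; each numbered step is a potential random read hitting the same spindles:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// lookup stands in for the real per-file database read; in practice this
// is where a slow, randomly-accessed database hurts the most.
func lookup(path string) (known bool) {
	return false // hypothetical: consult the index here
}

func main() {
	dir := "/data/some/dir" // placeholder

	entries, err := os.ReadDir(dir) // 1) list the directory   -> random read
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range entries {
		full := filepath.Join(dir, e.Name())
		if _, err := os.Lstat(full); err != nil { // 2) stat each entry -> random read
			continue
		}
		if !lookup(full) { // 3) database lookup -> yet another random read
			fmt.Println("needs scanning:", full)
		}
	}
}
```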


I agree. We had a short timeline and I got tired of supervisord restarting rsync every few hours, so I tried syncthing. There’s a lot I could do with more time and budget. However, to poorly paraphrase: “we sync with the servers we have, not the servers we want”. :wink:

This is what syncthing looks like on one of the servers according to htop: