optimizing large data sync


#1

Hi, I’ve been a long-time syncthing user and have donated in the past to keep dev going. I’m wondering how to optimize my setup. In my experience so far, syncthing works far better than rsync or other solutions for the number of files and volume of data with which I work daily.

I sync a few servers to one central server with 250TB of disk. It’s all been working well for a year. I’m using the default config for 1.0.1 and it seems to work well.

The central server (hereafter known as z) is mostly idle and otherwise doing fine. The servers are spread around the world, but all are on full gigabit connections, and I can routinely average 860 Mbps between them.

A basic diagram is a --sendonly--> z, b --sendonly--> z, c --sendonly--> z. Put another way, (a…y) --sendonly--> z.

Some data size samples:

a is syncing 22,649,455 files in 42,560,953 directories totaling 2.79 TB.

b is syncing 22,342,732 files in 42,481,303 directories totaling 28.4 TB.

c is syncing 60,058 files in 46,980 directories totaling 34 GB.

d…n servers are more like c.

a and c sync fine without issue and are generally at 99%-100% anytime I check them.

However, b is constantly lagging in the sync. The scanning period can take days to complete and sometimes only syncs a few TB and seems to go back to scanning. I’ve played around with Full Scan Interval a few times, but no setting seems to matter.
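For anyone comparing settings across servers, the GUI's Full Scan Interval is stored per folder in config.xml as the `rescanIntervalS` attribute, next to `fsWatcherEnabled`. Here is a minimal sketch that lists both for every folder. The sample fragment, folder ID, and path are hypothetical and only for illustration; a real config.xml carries many more fields.

```python
import xml.etree.ElementTree as ET


def folder_scan_settings(config_xml):
    """Return {folder id: (rescanIntervalS, fsWatcherEnabled)} from config.xml text."""
    root = ET.fromstring(config_xml)
    settings = {}
    for folder in root.findall("folder"):
        settings[folder.get("id")] = (
            folder.get("rescanIntervalS"),
            folder.get("fsWatcherEnabled"),
        )
    return settings


# Hypothetical fragment for illustration only, not taken from any real server.
SAMPLE = """
<configuration>
  <folder id="example-folder" path="/data" type="sendonly"
          rescanIntervalS="3600" fsWatcherEnabled="true"/>
</configuration>
"""

print(folder_scan_settings(SAMPLE))
```

To run it against a real setup, read the file under .config/syncthing/config.xml and pass its text to `folder_scan_settings`.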

I’ve read these forums for months and through the various config and advanced config options. Nothing really jumps out at me as an obvious improvement. This is especially true given a and c are working just fine. Servers a and b have nearly identical hardware.

Added note: most of the servers are Ubuntu 18xx. z and some servers are FreeBSD 11 or 12. In this case, b is Ubuntu 18.10 and z is FreeBSD 11. Both OSes are fully patched.

Any obvious pointers?

Thanks!


(Audrius Butkevicius) #2

As usual, check the logs.


#3

Yes, I’ve done that. There are no errors, and the syslog on b looks just like the syslog on a and c, etc.


(Audrius Butkevicius) #4

Presumably Z would be the one pulling down stuff from B, so I’d check the logs of Z and see that it actually sees that it needs to download files.


#5

Here’s what’s in the log for Z:

"[ID] 21:04:18 INFO: syncthing v1.0.1 “Erbium Earthworm” (go1.11.5 freebsd-amd64) teamcity@build.syncthing.net 2019-01-18 10:34:18 UTC

[ID] 21:04:18 INFO: My ID: ID

[ID] 21:04:19 INFO: Single thread SHA256 performance is 406 MB/s using crypto/sha256 (403 MB/s using minio/sha256-simd).

[ID] 21:04:19 INFO: Hashing performance is 282.25 MB/s

[ID] 21:04:19 INFO: Ready to synchronize “file” (server-a) (receiveonly)"


(Audrius Butkevicius) #6

Right, but I was leaning more towards checking the UI, to see whether it thinks the folder in question is out of sync locally or not.
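The same information is also available from the REST API via `/rest/db/status`, which reports the folder `state` plus `needFiles` and `needBytes`. A rough sketch, assuming the GUI/API listens on the default 127.0.0.1:8384; the folder ID and API key are placeholders, and the helper names are made up for this example:

```python
import json
import urllib.parse
import urllib.request


def summarize_status(status):
    """Turn a /rest/db/status response (a dict) into a one-line summary."""
    need_files = status.get("needFiles", 0)
    need_bytes = status.get("needBytes", 0)
    state = status.get("state", "unknown")
    if need_files == 0 and need_bytes == 0:
        return f"state={state}: nothing needed, folder looks up to date"
    return f"state={state}: needs {need_files} files ({need_bytes} bytes)"


def fetch_status(folder_id, api_key, base_url="http://127.0.0.1:8384"):
    """Query a Syncthing instance for one folder's status."""
    query = urllib.parse.urlencode({"folder": folder_id})
    req = urllib.request.Request(
        f"{base_url}/rest/db/status?{query}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Usage on z (placeholders, replace with the real folder ID and API key):
# print(summarize_status(fetch_status("FOLDER-ID", "API-KEY")))
```

If z reports a nonzero `needFiles` for the folder shared from b, it knows it is missing data. Be aware that this call can be expensive on folders this large.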


#7

After much more debugging, it seems there are a few things:

  1. The web interface shows “Scanning” of the directory being shared to z.
  2. b is the oldest syncthing installation I have, so it’s been through a number of upgrades via apt. At one point in the “.config/syncthing/index-xx” directory, there were ~280,000 files.
  3. b seems to keep losing the connection to z. No other server has this issue. At the OS level, I’m able to open TCP connections reliably over 8 hours, while syncthing reports hundreds of disconnects.
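The OS-level check in point 3 can be sketched roughly like this. It assumes z listens on Syncthing's default sync port 22000; the hostname in the usage comment is a placeholder and the function names are invented for this example:

```python
import socket
import time


def probe_once(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_loop(host, port, attempts, interval=60.0, probe=probe_once):
    """Probe repeatedly and return (successes, failures)."""
    ok = failed = 0
    for i in range(attempts):
        if probe(host, port):
            ok += 1
        else:
            failed += 1
            print(time.strftime("%H:%M:%S"), "connect failed")
        if i < attempts - 1:
            time.sleep(interval)
    return ok, failed


# Usage (placeholder hostname): one probe per minute for 8 hours.
# print(probe_loop("z.example.net", 22000, attempts=480))
```

A short-lived connect like this only shows that the port is reachable. It does not exercise a long-running session, so it may not reproduce whatever makes syncthing drop the connection.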

Since barely anything was actually transferred from b to z, I figured it would be more efficient to start over.

I’ve since unshared the directory, wiped .config/syncthing and re-setup the entire connection to z and reshared the folder.

After 8 hours, b is still “Scanning…” but the connection between b and z is solid. Everything is running 1.0.1 now.

Thanks for the feedback so far.


(Audrius Butkevicius) #8

The connection breaking should result in an error on one side or the other.


#9

A quick follow-up: server b is still “scanning”, but has maintained its connection to z without issue. It appears that tens of millions of directories simply take a very long time to scan. It’s been 10 days or so now. I’m just letting it run; eventually it should finish scanning and start transferring data to z.
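To get a feel for how much of that time is plain directory traversal, a bare walk of the tree can be timed on its own. This is only a sketch with made-up function names; it touches metadata only and does no hashing, so it gives a lower bound rather than a prediction of syncthing's scan time:

```python
import os
import time


def count_tree(root):
    """Count directories and files under root without reading file contents."""
    dirs = files = 0
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        dirs += 1
                        stack.append(entry.path)
                    else:
                        files += 1
        except OSError:
            # Unreadable directory: skip it rather than abort the walk.
            continue
    return dirs, files


def timed_count(root):
    """Return (dirs, files, seconds elapsed) for a full walk of root."""
    start = time.monotonic()
    dirs, files = count_tree(root)
    return dirs, files, time.monotonic() - start
```

Running `timed_count` on a subtree with a known number of directories gives a rough directories-per-second figure that can be scaled up to the full 42 million.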


(Vincent Ardern) #10

You’re working with far more data than I’ll ever have so I find this pretty interesting to read about. Please update us when you discover that it has finished scanning :)


(Bt90) #11

Just curious but what is your storage configuration? (HDD/SSD,RAID,filesystem)


#12

The storage on “b”, which is the problem server, is 10x HGST HUS726060ALE610 drives set up in software RAID 6 on Ubuntu Server with an ext4 filesystem. The system was never designed for this load but, for various reasons, has ended up this way. Judging from iostat and other disk tools, the disks aren’t the bottleneck; more likely it is ext4 and whatever insane coder decided tens of millions of directories was a sane setup. I inherited the system and make do with what I have.

In fact, one of the obstacles to retiring this server is the slow data transfer. Once the data is on another server, I can replace it with a modern SSD-based disk array. I don’t fault Syncthing here, as rsync wasn’t much better, so it’s just wait and wait and wait.