glusterfs vs. syncthing

I am running a syncthing instance on a Raspi 5 that is part of a gluster cluster, and I use the glusterfs volume as storage for syncthing. Without the syncthing service running, I am perfectly happy with the performance of glusterfs (100+ MB/sec), but when I start the syncthing service, performance drops almost to a stall (1 MB/sec and less) after some time, which means syncthing never finishes scanning.

I see almost no network, disk or CPU activity (5+GB free RAM).

A few seconds after I stop the syncthing service, the performance is back to 100+MB/sec.

I have no idea where to start looking for the cause of this performance drop; any ideas are welcome.

I’ve posted a similar question to the Gluster mailing list to see if glusterfs might be to blame.

The other two syncthing instances are a Win11 machine on the local LAN and a Ceph-based Linux instance reached via VPN; the Win11 and Linux/Ceph instances sync perfectly well.


How long did you wait to let Syncthing finish its initial scan? Roughly how many files and how much disk used for what you’re scanning?

Hi,

It has been running in this state for at least 6-8 weeks, maybe longer. I was experimenting a lot, so I can't tell exactly.

But if syncthing is doing nothing on net/disk/CPU, I could wait until infinity.

root@ceph-3:/mnt/tCephFs# du -sh ./syncthing/
18T     ./syncthing/

root@ceph-3:/mnt/tCephFs# find ./syncthing/ -type f | wc -l
737829

Kind regards Thomas


What does Syncthing claim it’s doing? (Screenshots are worth a thousand words. Expand the folder in question…) Generally, a scan is made on startup (metadata check), which can take a while for network-attached storage. I wouldn’t be surprised if your I/O performance is reduced for the duration; in fact, I’d be very surprised if it weren’t. That shouldn’t take 6-8 weeks though.

If the filesystem is case sensitive, checking that box will increase your scan performance significantly.
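(For reference, that checkbox corresponds to the caseSensitiveFS folder option in config.xml; a quick way to confirm it is set, with the config path being a placeholder to adjust for your install:)

# check that the advanced option is actually set for each folder;
# replace the config path with wherever your install keeps config.xml
grep -n caseSensitiveFS /path/to/syncthing/config.xml
# expected output per folder: <caseSensitiveFS>true</caseSensitiveFS>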


It claims to be scanning. The data is all static (it does not make sense to change anything before it reaches a steady state). But with 1 MB/sec filesystem performance, the rescan interval of one folder overtakes the still-running scan of another folder. This screenshot shows three folders scanning; if we wait a few days, we will see four, five or six …

Even if I change nothing, it never reaches an “all up to date” state.

GlusterFS is case-sensitive, and the case-sensitive flag is ticked on all folders.

Where’s your Syncthing database located? If it’s also on your glusterfs, that might explain the thrashing.


It’s scanning for metadata changes. It’s set to do so every four hours.
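(That interval is the per-folder rescanIntervalS setting; a quick way to see what it is currently set to, with the config path again just a placeholder:)

# show the configured rescan interval for every folder (14400 s = 4 h);
# adjust the config path to your installation
grep -o 'rescanIntervalS="[0-9]*"' /path/to/syncthing/config.xml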

You’re also running a version that doesn’t exist, of presumably questionable provenance. Maybe you should ask for support where you got it.


The database is in /var/syncthing/... on a ZFS.

The syncthing version is the default version that comes with Ubuntu (almost the same one I use with the CephFS). I’ve also thought about switching to a newer/official syncthing version. Maybe I’ll give that a try, but I don’t really like the idea because it makes updating the OS less convenient.
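(For reference, the official apt repository keeps upgrades inside apt; a rough sketch of the setup as documented at apt.syncthing.net, worth double-checking against the current instructions there:)

# add the official Syncthing apt repository (commands per apt.syncthing.net;
# key and list file paths may differ slightly between Ubuntu releases)
sudo mkdir -p /etc/apt/keyrings
sudo curl -L -o /etc/apt/keyrings/syncthing-archive-keyring.gpg https://syncthing.net/release-key.gpg
echo "deb [signed-by=/etc/apt/keyrings/syncthing-archive-keyring.gpg] https://apt.syncthing.net/ syncthing stable" | sudo tee /etc/apt/sources.list.d/syncthing.list
sudo apt-get update && sudo apt-get install syncthing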

Wherever it comes from, version 1.27.71 is not a thing that ever existed.


As calmh already pointed out: you are scanning the folder every 4 hours, which according to your description takes a very long time even though nothing has changed. A scan without changes isn’t actually accessing file data, so your typical/achievable throughput on glusterfs doesn’t matter. That also tracks with the low resource (CPU, memory) usage: most likely syncthing is just doing normally-cheap syscalls that are very slow here, because glusterfs is slow on metadata operations. That isn’t exactly unexpected for any network-based filesystem, even more so one designed as “cloud storage”.

I was hoping I could link to docs that recommend running syncthing on the same device as the storage rather than on a networked filesystem, but unfortunately there don’t seem to be any - this could make a good FAQ entry, as it comes up every now and then.
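(A crude way to put a number on that metadata cost; the folder path below is an assumption based on the mount used elsewhere in this thread:)

# walk the tree touching only metadata, no file contents - roughly what a
# change-less scan does; compare the time against the same tree on local disk
time find /mnt/wdVolume/syncthing -type f | wc -l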

For completeness, though I doubt it’s related to the issues here: that’s definitely better than glusterfs (or any network-based FS), but it might still be slow-ish depending on the ZFS setup and the backing storage. Generally, the faster the storage for the database, the better.


I switched to v1.29.6, Linux (64-bit ARM), but the problem stays mostly the same.

Without syncthing running:

root@tpi5wb3:/mnt/wdVolume# dd if=/dev/zero of=./dummy.dat bs=1M count=10240 status=progress
10700718080 bytes (11 GB, 10 GiB) copied, 105 s, 102 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 105,463 s, 102 MB/s

when I start syncthing:

root@tpi5wb3:/mnt/wdVolume# dd if=/dev/zero of=./dummy.dat bs=1M count=10240 status=progress
10736369664 bytes (11 GB, 10 GiB) copied, 767 s, 14,0 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 767,112 s, 14,0 MB/s

Well, what imsodin said. Starting syncthing means it’s doing a metadata scan. That means putting load on your network filesystem.

The more interesting question, IMO, is if the initial scan can complete when running a supported version.

I don’t have an RPi 5 to do any testing with; I recommend that you start by attempting to sync a much smaller (10% of what you have now?) number of files and see if the initial scan can complete. If it completes within a few days that would be a good start for you to understand what you can expect in terms of performance with this environment.
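(One low-effort way to pick such a subset, with the path again assumed: find a subtree of roughly the right size and share just that as a temporary test folder.)

# list subtree sizes on the gluster mount to find one holding ~10% of the data;
# that subtree can then be added as its own (temporary) Syncthing folder
du -sh /mnt/wdVolume/syncthing/*/ | sort -h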

Isn’t glusterfs kind of notoriously slow for metadata operations? It seems optimized for fewer, larger files rather than very many small files.

Maybe I’m wrong.

glusterfs in general is a pretty rubbish file system.

A B, the creator of Gluster, was once my boss. I don’t know if he’s still involved with it, though.

Ok, I will use strace to learn more about these metadata operations. There are also some performance tuning tips on the glusterfs website to improve metadata operations.
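(A sketch of both, for the record; the volume name is a placeholder, and the gluster options are the md-cache settings from the upstream small-file tuning docs, so check the names and values against your gluster version:)

# summarize which syscalls the running scan spends its time in; stop with
# Ctrl-C after a minute or two and look at the time/calls columns
strace -f -c -p "$(pidof syncthing | tr ' ' ',')"

# metadata caching knobs from the Gluster docs (names/values vary by version):
gluster volume set <volname> features.cache-invalidation on
gluster volume set <volname> performance.cache-invalidation on
gluster volume set <volname> performance.stat-prefetch on
gluster volume set <volname> performance.md-cache-timeout 600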

If you were a little more specific, perhaps someone could benefit from your knowledge.

The use case was sharing a thumbnail cache between multiple EC2 machines in the same AZ, serving an e-commerce site, before EFS was a thing.

The performance of the site tanked really badly because of how long I/O and file discovery took on glusterfs, way beyond what was acceptable.

The cross-site (resd eu->us) gluster sync was also a pile of dirt, failing with obscure errors and getting stuck quite often.

This is just first-hand experience, and I don’t want to touch it ever again.
