I have finally found a configuration that works for me. What I actually did:
(a) switched from ubuntu-syncthing to the officially supported syncthing version (v1.29.7, Linux (64-bit ARM))
(b) set Max Folder Concurrency to 16 (much higher than you would normally do with spinning disks)
(c) used the Tuning tips to optimize the metadata operations (my settings below; the equivalent gluster volume set commands are sketched after the output), and I think there is further optimization potential
Volume Name: wdVolume
Type: Distributed-Disperse
Volume ID: 5b47f69a-7731-4c7b-85bf-a5014e2a5209
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: tPi5Wb.my.name:/mnt/glusterBricks/extWd18a/data
Brick2: 192.168.129.9:/mnt/glusterBricks/extWd18b/data
Brick3: tPi5Wb3.my.name:/mnt/glusterBricks/extWd18c/data
Brick4: tPi5Wb.my.name:/mnt/glusterBricks/extWd5a/data
Brick5: 192.168.129.9:/mnt/glusterBricks/extWd5b/data
Brick6: tPi5Wb3.my.name:/mnt/glusterBricks/extWd5x/data
Options Reconfigured:
storage.linux-io_uring: on
server.event-threads: 16
performance.rda-cache-limit: 1Gb
performance.io-thread-count: 64
performance.quick-read: on
performance.io-cache: on
performance.read-ahead: on
disperse.stripe-cache: 10
disperse.other-eager-lock: off
disperse.eager-lock: off
cluster.lookup-optimize: on
performance.cache-max-file-size: 1MB
performance.qr-cache-timeout: 600
performance.xattr-cache-list: security.*,system.*,trusted.*,user.*
performance.cache-samba-metadata: on
performance.nl-cache-positive-entry: on
performance.nl-cache-timeout: 600
performance.nl-cache: on
performance.parallel-readdir: on
performance.readdir-ahead: on
network.inode-lru-limit: 200000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
disperse.shd-max-threads: 16
performance.cache-size: 1024MB
features.scrub: Active
features.bitrot: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
cluster.disperse-self-heal-daemon: enable
diagnostics.client-log-level: DEBUG
storage.build-pgfid: on
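For reference, this is roughly how the options were applied, one by one with gluster volume set. This is only a sketch of the subset I consider most relevant for the metadata-heavy scanning, using the wdVolume name from the output above; adjust it to your own volume.

# Run on one of the gluster nodes; values taken from the volume info above.
gluster volume set wdVolume features.cache-invalidation on
gluster volume set wdVolume features.cache-invalidation-timeout 600
gluster volume set wdVolume performance.cache-invalidation on
gluster volume set wdVolume performance.md-cache-timeout 600
gluster volume set wdVolume network.inode-lru-limit 200000
gluster volume set wdVolume performance.parallel-readdir on
gluster volume set wdVolume performance.readdir-ahead on
gluster volume set wdVolume performance.cache-size 1024MB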
I'm quite happy with this, thanks for your ideas and input.
For your use case, I'd even set it to unlimited (-1), or otherwise leave it at the default (0), which then corresponds to the number of CPUs of your server. I don't think more concurrency can hurt with a remote filesystem like this.
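For reference, a minimal sketch of changing this without the GUI; I'm assuming the advanced option is called maxFolderConcurrency, and the GUI address and API key below are placeholders for your own instance:

# Hypothetical values: 127.0.0.1:8384 and YOUR_API_KEY stand in for your setup.
curl -s -X PATCH \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"maxFolderConcurrency": -1}' \
  http://127.0.0.1:8384/rest/config/options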
The very first thing on that link is enabling metadata caching - if it doesn't do that by default, that seems very likely to be the main change that helped a ton. I don't see the metadata-cache string from that operation in your settings, though, but there are some other cache-related ones - probably that just manifests differently in the settings?
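If I read the docs right, that metadata caching step is applied as a group profile, which would explain why no literal metadata-cache string shows up - it just expands into individual options. A rough sketch, reusing the wdVolume name from above:

# Assumption: "metadata-cache" is a group profile, so volume info only shows the
# individual options it sets, not the group name itself.
gluster volume set wdVolume group metadata-cache
# Check a couple of the options it should have touched:
gluster volume get wdVolume performance.md-cache-timeout
gluster volume get wdVolume features.cache-invalidation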
Yes, this is probably the most useful setting in general for such a use case (16 was my choice to make sure there is enough concurrency).
I'm not sure if this would work, because I do not use much CPU during the critical scanning phases. Possibly this would lead to infinite scanning. Someone should test this idea.
Yes, I executed this command, but there is no special output for it in gluster volume info ...
Ceph has been solid for me. My only note is to heed the warning, if you didn't see it, that you shouldn't run the kernel-mode driver if you're mounting the filesystem on the Ceph node itself. Use the FUSE user-mode mount instead. If you're mounting the filesystem on a different machine, then you can use the kernel driver.
I don't know why this is, but I didn't notice it and did have some filesystem issues, which I was able to recover from, and I haven't had any problems since switching.
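For reference, the two mount flavours look roughly like this; the mount point, monitor address and secret file are placeholders for your own cluster:

# FUSE (user-mode) mount - the one recommended on the Ceph node itself:
sudo ceph-fuse /mnt/cephfs

# Kernel-mode mount - fine from a separate client machine (placeholder mon address and secret):
sudo mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret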