proper cfg needed: changing big logfiles synced from lowspec host

rdslw · March 23, 2024, 1:06pm

Hey community

I’m looking for configuration tweaks for a folder that, because the machine is low-spec, keeps hitting “hashing: file changed during hashing” errors: scanning takes a long time, and the underlying files change in the meantime.

I’m looking for ideas, with emphasis on Syncthing configuration.

Folder description:

synced from one low-spec machine, configured as Send Only
contains around 10 GB in 20 files, 4 of them being log files for 2024 (growing constantly):

A.log: 582 MB: almost 7MB per day
B.log: 238 MB: 3MB per day
C.log: 192 MB
D.log: ~80 MB

other 3 receivers are configured as ‘receive only’
files in the folder are typical text log files, which are written to the FS every 30 seconds to a few minutes, with one or more lines added each time
the machine also has other folders
full scan takes 4 minutes, and will reach around 10-12min in December (logfiles rotate yearly)

Current settings:

periodic scans: 20m,
watch for changes: disabled

In general syncthing manages to sync the files, with varying success, and receivers have data with 10…100 minutes delay (randomly).

Solutions I already know of, so I’m not asking for them:

beefier machine → not possible.
ignore it
split files by month, not by year → last resort
writing process delays writes by 30 seconds → can’t increase it any further

I’m looking specifically at solutions from syncthing cfg changes, as OS/hardware/writer cfg were already addressed.

mopani · March 23, 2024, 2:52pm

Do you have a filesystem (or can switch to a filesystem) that supports snapshots? Then have Syncthing sync from snapshots. That would “freeze” the state of the file so Syncthing can hash and share it. The difficulty I see is how to “roll” which snapshot Syncthing monitors… Maybe someone smarter has an idea.

rdslw · March 23, 2024, 3:13pm

I do in fact (btrfs). That would probably require pausing the folder and resuming it after rolling.

If I went into pause/unpause territory, then many simple scenarios would work (syncing from a different directory, which is overwritten during the pause using rsync, a snapshot, or a plain cp).

Thanks for the idea. Before applying it, I’m looking for some simple(?) config changes that would help Syncthing.
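The pause, roll, resume idea above could be scripted roughly as follows. This is only a sketch under stated assumptions: the folder ID, paths, and API address are hypothetical placeholders, it assumes Syncthing's `/rest/config/folders/<id>` endpoint accepts a `PATCH` with `{"paused": ...}`, and it assumes the shared path is a writable btrfs snapshot that already exists from a previous run.

```python
import json
import subprocess
import urllib.request

# Hypothetical placeholders; none of these values come from the thread.
API_URL = "http://127.0.0.1:8384"     # assumed local Syncthing API address
FOLDER_ID = "logs"                     # hypothetical folder ID
LIVE_SUBVOL = "/data/logs"             # hypothetical subvolume the writer appends to
SNAPSHOT_PATH = "/data/logs-synced"    # hypothetical path that Syncthing shares


def pause_request(api_key: str, paused: bool) -> urllib.request.Request:
    """Build the PATCH request that pauses or resumes the folder."""
    body = json.dumps({"paused": paused}).encode()
    return urllib.request.Request(
        f"{API_URL}/rest/config/folders/{FOLDER_ID}",
        data=body,
        method="PATCH",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def roll_commands() -> list[list[str]]:
    """Commands that drop the old snapshot and take a fresh one."""
    return [
        ["btrfs", "subvolume", "delete", SNAPSHOT_PATH],
        ["btrfs", "subvolume", "snapshot", LIVE_SUBVOL, SNAPSHOT_PATH],
    ]


def roll(api_key: str) -> None:
    """Pause the folder, replace the snapshot, then resume the folder."""
    urllib.request.urlopen(pause_request(api_key, True)).close()
    try:
        for cmd in roll_commands():
            subprocess.run(cmd, check=True)  # needs btrfs privileges
    finally:
        # Resume even if a btrfs command failed, so the folder is not left paused.
        urllib.request.urlopen(pause_request(api_key, False)).close()
```

Running `roll(...)` from a timer would give Syncthing a frozen copy to hash between rolls, at the cost of receivers lagging by up to one roll interval.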

At the moment I’m experimenting with:

<disableTempIndexes>
<weakHashThresholdPct>
<modTimeWindowS>
<hashers>

I wonder if those, or any other Syncthing settings, may help in my scenario.

gadget · March 23, 2024, 4:15pm

Would you mind sharing some details about the low spec machine?

You mentioned btrfs, so I’m assuming Linux (even though there is 3rd-party support for it on Windows). In particular, the following info would be helpful:

Linux distro
Storage medium (e.g. HDD, SSD, SD card,…)
Hardware specs (e.g. CPU, RAM,…)

rdslw · March 23, 2024, 4:35pm

This is Linux. It’s a custom-built, low-spec machine with an embedded two-core CPU and HDDs formatted with btrfs. Key information: scanning/transfer/hashing throughput, from Syncthing’s perspective, is a few MB/s.

In general I’m not looking for hardware/OS/filesystem/file-writer suggestions or changes. I do understand that they are the obvious targets, but they were addressed as a first step.

Here I’m curious if there are any possible syncthing changes/optimizations which may help.

tomasz86 · March 23, 2024, 4:46pm

Please check https://docs.syncthing.net/users/tuning.html if you haven’t already.

gadget · March 23, 2024, 5:03pm

Without knowing what the hardware setup is, it’s not possible to provide recommended optimizations for Syncthing because the choices entirely depend on it – a bit of a catch-22.

It could turn out that Syncthing isn’t the ideal solution for your particular use case. In which case, there might be other solutions that could be recommended, but that also depends on more details about the hardware setup.

AudriusButkevicius · March 23, 2024, 5:13pm

I really doubt this will work in general.

We scan the whole file every time it changes, and by the time we’d be done with a 500 MB file at a few MB/s, the file would have already changed.

We’d send out the new hashes for the file, but by the time the other end gets around to asking for the new blocks, the file could have changed again, starting the whole loop over without making any progress.

Sure, there are things you can tweak to make the scan a bit faster, but it’s still 500 MB of data at a few MB/s, so “tweaking” won’t get you very far, as the majority of the time will be spent on the actual scanning.
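To put rough numbers on that argument, here is a back-of-the-envelope sketch. The file sizes are the ones quoted in the first post; the 3 MB/s rate is an assumption standing in for “a few MB/s”, and all overhead is ignored.

```python
# Sizes come from the first post; the hashing rate is an assumption,
# since the thread only says "a few MB/s".
ASSUMED_RATE_MB_S = 3.0
WRITE_INTERVAL_S = 30  # shortest write interval mentioned in the thread

LOG_SIZES_MB = {"A.log": 582, "B.log": 238, "C.log": 192}


def hash_seconds(size_mb: float, rate_mb_s: float) -> float:
    """Time to read and hash a whole file once, ignoring all overhead."""
    return size_mb / rate_mb_s


def writes_during_hash(size_mb: float, rate_mb_s: float,
                       write_interval_s: float) -> float:
    """How many writer appends can land while one full hash pass runs."""
    return hash_seconds(size_mb, rate_mb_s) / write_interval_s


if __name__ == "__main__":
    for name, size_mb in LOG_SIZES_MB.items():
        secs = hash_seconds(size_mb, ASSUMED_RATE_MB_S)
        writes = writes_during_hash(size_mb, ASSUMED_RATE_MB_S, WRITE_INTERVAL_S)
        print(f"{name}: ~{secs:.0f} s per pass, ~{writes:.1f} writes in that time")
```

Under these assumptions even the smallest of the three files takes longer than one 30-second write interval to hash, so a pass can only complete cleanly when the writer happens to be in one of its longer gaps.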

rdslw · March 23, 2024, 7:25pm

Yeah, end of the year (bigger files) will be problematic.

But currently Syncthing is able to finish the scan multiple (>20) times during the day, and what I’m looking for are those few more (small) tweaks.

I changed weakHashThresholdPct to -1 on this folder, to use faster hashing, and am observing the results.

After that I’ll be experimenting with hashers (probably 1) and maxFolderConcurrency (1) to speed it up even more.

What I’m also thinking about is changing the general change-detection mechanism, as only 4 out of 20 files are being modified in a given year (the others no longer change, by their nature as past log files).

current: rescanIntervalS="1200" fsWatcherEnabled="false"
new: rescanIntervalS="604800" fsWatcherEnabled="true" fsWatcherDelayS="1200"
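For illustration, the proposed change could be applied to a folder entry like this. It is a minimal sketch working on an inline, made-up `config.xml` fragment (the folder ID, path, and version number are hypothetical); it assumes these three settings are attributes of the `<folder>` element, and in practice the same values can be set from the GUI’s advanced settings instead of editing the file by hand.

```python
import xml.etree.ElementTree as ET

# Made-up fragment for illustration; the id, path and version are hypothetical.
SAMPLE_CONFIG = """<configuration version="37">
  <folder id="logs" path="/data/logs" type="sendonly"
          rescanIntervalS="1200" fsWatcherEnabled="false" fsWatcherDelayS="10">
    <weakHashThresholdPct>-1</weakHashThresholdPct>
  </folder>
</configuration>"""

# Values proposed in the post above: weekly full rescan, watcher with a 20 min delay.
NEW_ATTRS = {
    "rescanIntervalS": "604800",
    "fsWatcherEnabled": "true",
    "fsWatcherDelayS": "1200",
}


def apply_folder_attrs(root: ET.Element, folder_id: str, attrs: dict) -> bool:
    """Set attributes on the folder with the given id; return False if not found."""
    for folder in root.iter("folder"):
        if folder.get("id") == folder_id:
            for key, value in attrs.items():
                folder.set(key, value)
            return True
    return False


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_CONFIG)
    if apply_folder_attrs(root, "logs", NEW_ATTRS):
        print(ET.tostring(root, encoding="unicode"))
```

Note that this only changes when a scan is triggered; each changed file is still hashed in full once a scan starts.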

What’s your opinion on such a change, @AudriusButkevicius?

AudriusButkevicius · March 23, 2024, 11:24pm

I think it will have zero effect. The best suggestion I can give is either get better hardware, or have smaller files, so there is less to scan when they change.
