Compressing small files together?

Is Syncthing able to “glue” (i.e. tar & zip) small files before transferring them?

I was using Syncthing to copy a large folder to a new machine. Roughly half of the files are large raw photos (say >20 MB each), and each photo has a companion XMP sidecar file (XML, a few KB each). I noticed that Syncthing was taking an insane amount of time to transfer the XMP files (over wired LAN). I solved the problem by running find ./blah/ -name "*.xmp" | tar -zvcf all_xmp.tgz -T -, then I let Syncthing transfer the archive (which took one or two seconds), manually unpacked it on the other side, and let Syncthing rescan the folder.
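For anyone who wants to try the same workaround, here is a minimal sketch of it as a script. The demo directories and file names are made up for illustration (two temp directories stand in for the two machines), and the -print0 / --null pairing is my own addition, not part of the command above. It assumes GNU tar or bsdtar.

```shell
#!/bin/sh
# Sketch of the manual "bundle the sidecars" workaround.
# All paths and file names below are hypothetical demo values.
set -eu

src=$(mktemp -d)    # stands in for the sending machine
dst=$(mktemp -d)    # stands in for the receiving machine

# Demo data: two tiny sidecar files and one file that must NOT be bundled.
mkdir -p "$src/blah/day 1"
printf '<x/>\n' > "$src/blah/a.xmp"
printf '<y/>\n' > "$src/blah/day 1/b.xmp"
printf 'raw\n'  > "$src/blah/a.raw"

# Sender: archive only the sidecars. NUL-separated names (-print0 with
# --null) keep file names that contain newlines intact.
(cd "$src" && find ./blah -name '*.xmp' -print0 | tar --null -czf all_xmp.tgz -T -)

# Receiver: unpack into the parent of the synced folder, then let
# Syncthing rescan so it can verify the result.
tar -xzf "$src/all_xmp.tgz" -C "$dst"

count=$(find "$dst/blah" -name '*.xmp' | wc -l | tr -d ' ')
if [ -e "$dst/blah/a.raw" ]; then raw_copied=yes; else raw_copied=no; fi
echo "unpacked $count sidecar files (raw file copied: $raw_copied)"

rm -rf "$src" "$dst"
```

In real use the archive would travel through Syncthing (or any other means) rather than being read straight from the other temp directory.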

The final rescan took a while, but judging by the transfer speed I had been seeing, this workaround probably saved several hours of transfer time.

Not really, but with such files you could set compression to “All Data” in the device settings. I personally have a similar problem, because I deal with thousands of small text files. Transferring 50,000 tiny text files can take hours, but when they are zipped into a single archive, it becomes a matter of seconds.

This has nothing to do with compression; it is to do with fsync.

Syncthing fsyncs files after syncing them to make sure they are persisted, before recording that in the database.

On some hardware/filesystems, an fsync on a single file is effectively equivalent to a sync() of the whole disk, which can take seconds as all buffers have to be flushed, so you end up with a 1s pause between every file.

That is the cost of not losing your data, effectively.
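To get a feel for this cost on your own machine, here is a rough timing sketch. It is not how Syncthing does it internally; it only compares writing many tiny files with and without a flush after each one. It assumes GNU coreutils 8.24 or later, where `sync FILE` issues an fsync on that file, and the file count and names are made up.

```shell
#!/bin/sh
# Rough timing sketch: many tiny files, with and without a per-file flush.
# Assumption: GNU coreutils >= 8.24, where `sync FILE` fsyncs that file.
set -eu

count=200              # hypothetical number of small files
dir=$(mktemp -d)       # scratch directory, removed at the end
mkdir "$dir/flushed" "$dir/buffered"

# write_files DIR FLUSH: write $count small files into DIR; when FLUSH is
# "yes", flush each file to disk right after writing it.
write_files() {
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '<xmp/>\n' > "$1/$i.xmp"
        if [ "$2" = yes ]; then
            sync "$1/$i.xmp"
        fi
        i=$((i + 1))
    done
}

start=$(date +%s)
write_files "$dir/flushed" yes
flushed_secs=$(( $(date +%s) - start ))

start=$(date +%s)
write_files "$dir/buffered" no
buffered_secs=$(( $(date +%s) - start ))

flushed_n=$(ls "$dir/flushed" | wc -l | tr -d ' ')
buffered_n=$(ls "$dir/buffered" | wc -l | tr -d ' ')
echo "per-file flush: ${flushed_secs}s for $flushed_n files"
echo "no explicit flush: ${buffered_secs}s for $buffered_n files"

rm -rf "$dir"
```

The numbers depend heavily on the disk and filesystem. If the temp directory lives on tmpfs, the flush is close to free and both variants will look the same, so point TMPDIR at a real disk to see the effect.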

Thanks for the answer.

However, let me point out that my workaround should not be underestimated: after decompressing the archive I let Syncthing synchronize the folder again, i.e. I didn’t just assume that the two sides were in sync. In fact, these files could have changed in the meantime.

So I’d think this is a process worth automating. I would expect that, before starting the actual file transfer, all Syncthing endpoints know precisely which items need to be transferred and how large they are. The sender could then TGZ (or whatever) all the small files and send a single archive; the receiver would unpack the archive, fsync once, and then restart the sync process.

fsync takes a file handle as a parameter, so you’d still have to call it for every file, and it would end up taking the same amount of time.