large number of files performance

sakoula · August 30, 2019, 6:15am

Hi There!

So I am a Dropbox user and a Resilio Sync user (tuned to work over ssh tunnels without relays). I have known of Syncthing for a while now but have not tried it yet. My aim is to have 3 computers with a common folder syncing between them. Usually I am working on only one of them. I have a couple of questions:

1. How ‘heavy’ is Syncthing? One of the things that makes me reluctant to use it is that I have read that it ‘uses’ a lot of resources. What is your experience?
2. Also, how responsive is it to big changes in the shared directory? I am used to keeping my working copies of git repositories (sometimes big ones) in the shared folder, which translates to many changes across many files. Is Syncthing going to handle this without any problem? (Dropbox, for example, performs very well here.)

Thanks for your help!!!

calmh · August 30, 2019, 6:44am

Your questions are rather subjective. I suggest you try it and judge for yourself.

sakoula · August 30, 2019, 6:56am

Hi! Thanks for the reply!

I know that my questions are subjective.

However, I am looking for other people's experiences with Syncthing. I am definitely going to try it. It is my project for this coming weekend!

Catfriend1 · August 30, 2019, 7:51am

Hi,

Well, git repos: Syncthing is not meant for syncing databases or a bunch of files from a VCS, so my advice would be to disable watching for changes and then set a long interval for scanning the folder (1 hour, 4 hours…). Anyway, if a change to the git tree involves a lot of files and the scan happens in between, you might temporarily get an inconsistent view on the sync partner's side.
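A minimal sketch of the two settings suggested above, in Python. It assumes the folder options are named `fsWatcherEnabled` and `rescanIntervalS`, as in Syncthing's folder configuration; the helper name `low_churn_folder_settings` is made up for illustration, and the code only builds the values without talking to Syncthing.

```python
import json


def low_churn_folder_settings(rescan_hours: float) -> dict:
    """Build folder options that turn off the watcher and rescan rarely.

    Assumption: the keys mirror Syncthing's folder options
    `fsWatcherEnabled` and `rescanIntervalS` (interval in seconds).
    """
    if rescan_hours <= 0:
        raise ValueError("rescan_hours must be positive")
    return {
        "fsWatcherEnabled": False,                     # disable "watch for changes"
        "rescanIntervalS": int(rescan_hours * 3600),   # hours -> seconds
    }


if __name__ == "__main__":
    # Example: rescan every 4 hours, one of the intervals mentioned above.
    print(json.dumps(low_churn_folder_settings(4), indent=2))
```

The same two values can also be set by hand in the folder's advanced settings in the GUI; the sketch just makes the hours-to-seconds conversion explicit.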

imsodin · August 30, 2019, 9:27am

As @calmh wrote, this is subjective, so there is no better way than to try it out, but as you expressly asked for opinions, I can give those (note the maintainer badge next to my username though, I don’t claim impartiality):

1. Very light. Obviously the initial scan will be heavy (it needs to hash everything with a strong cryptographic hash), and then comes the initial sync, which, if the data is already synced, still requires the different clients to work out that they are indeed in sync. Last time I did this it was still slowish, so there you might doubt my “light” label. Give it a moment and then comes the shining part: Syncthing just runs in the background without me ever noticing it, and I sync everything between a home server, laptop and phone (well, on the phone obviously not everything, due to storage restrictions). I recently noticed I had accidentally synced a huge database/cache directory. I removed it because I don’t need it synced, but still, it didn’t clog up anything else.
2. I synced git repos early on, because I thought I’d only ever access them on one device. Then I once did something on the other device and some time later found inconsistent git states due to conflicts. Since then I happily ignore .git in Syncthing and use git push/pull to sync repos, as git is designed to do. Syncing databases and stuff like .git may work, but it is very delicate and likely to break (think of taking backups of databases on disk: you’d take a snapshot first, and Syncthing cannot do such a thing).
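A minimal sketch of how ignoring `.git` could be set up, in Python. It assumes Syncthing reads ignore patterns from a `.stignore` file in the folder root and that a bare `.git` pattern (no leading slash) matches at any depth; both are worth checking against the ignore-pattern documentation. The helper name `ensure_ignored` is made up for illustration.

```python
import tempfile
from pathlib import Path


def ensure_ignored(folder_root: Path, pattern: str = ".git") -> bool:
    """Add `pattern` to <folder_root>/.stignore unless it is already listed.

    Returns True if the file was changed, False if the pattern was present.
    """
    ignore_file = folder_root / ".stignore"
    lines = ignore_file.read_text().splitlines() if ignore_file.exists() else []
    if pattern in (line.strip() for line in lines):
        return False
    lines.append(pattern)
    ignore_file.write_text("\n".join(lines) + "\n")
    return True


if __name__ == "__main__":
    # Demo in a throwaway directory standing in for a synced folder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(ensure_ignored(root))   # first call writes the pattern
        print(ensure_ignored(root))   # second call finds it already there
        print((root / ".stignore").read_text(), end="")
```

Note that `.stignore` itself is not synced between devices, so each device needs its own copy of the pattern.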

sakoula · August 30, 2019, 10:04am

Hi!!!

Thanks again for the great feedback! One thing I was thinking of doing is keeping my non-bare repositories on the shared folder and then push/pull from there. Is this something that will work?

In any case, I need to try it.

imsodin · August 30, 2019, 10:32am

Good

In case of TL;DR: just read Audrius's response below.

I assume this is to be able to work on an unclean git directory on any device and, when finished, push from any device to a central repo, correct? It might work, but as mentioned before it’s not recommended as it’s fragile. I’d rather embrace feature/work branches. So if you switch computers and have unfinished changes you don’t want to commit to your base branch, create a new branch (e.g. with a -wip suffix added to the original branch name) and push that. Then pull on the other device, do some more work and, once you are finished, squash merge to the base branch. This has even saved my *** at times by letting me revert to an earlier commit that, when committing, I would never have considered a potential recovery point. And you are certain you have a consistent state.
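The branch workflow above can be sketched as a plain list of git commands. This is a minimal illustration in Python that only builds and prints the commands, it does not run them; the function name `wip_handoff_commands`, the remote name `origin`, the commit messages and the example branch name `feature` are all assumptions.

```python
import shlex


def wip_handoff_commands(base: str, remote: str = "origin") -> dict:
    """Return the git commands for handing unfinished work to another device.

    The work branch is named after the base branch with a "-wip" suffix.
    """
    wip = f"{base}-wip"
    return {
        # On the device you are leaving: park the unfinished changes.
        "leaving_device": [
            ["git", "checkout", "-b", wip],
            ["git", "add", "-A"],
            ["git", "commit", "-m", "WIP: unfinished work"],
            ["git", "push", "-u", remote, wip],
        ],
        # On the device you continue on: fetch and check out the work branch.
        "arriving_device": [
            ["git", "fetch", remote],
            ["git", "checkout", wip],
        ],
        # Once the work is done: fold everything into one commit on the base.
        "when_finished": [
            ["git", "checkout", base],
            ["git", "merge", "--squash", wip],
            ["git", "commit", "-m", f"Squash-merge {wip}"],
        ],
    }


if __name__ == "__main__":
    for stage, commands in wip_handoff_commands("feature").items():
        print(f"# {stage}")
        for argv in commands:
            print(shlex.join(argv))
```

Because `git merge --squash` only stages the combined changes, the final `git commit` is what actually records them on the base branch.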

AudriusButkevicius · August 30, 2019, 10:55am

Use git itself to sync between different machines. Using Syncthing to sync git is the wrong way around, and I recall people coming to complain on the forum when their checkouts got corrupted.

sakoula · August 30, 2019, 11:30am

Thanks again!!! One last question. Currently I use Resilio Sync between 3 computers syncing over ssh tunnels and it seems to perform fairly well (for my usage). I saw that the documentation has a section on ssh tunnels. Is there something I should be careful about when using Syncthing with tunnels?

Thanks!

Nummer378 · August 30, 2019, 12:06pm

First, yes, please read the documentation on tunneling over SSH. Second, consider whether you really require SSH tunnels. ~~While they may work just fine (with Syncthing), tunnels that do TCP-over-TCP are something I personally avoid as much as possible (“TCP meltdown”)~~ (just saw that modern SSH applications can mitigate this by only transferring application data). Syncthing has other means to achieve direct connections, so it may be possible to avoid using such a tunnel, unless you have specific requirements (strict firewall etc.) that make (SSH) tunnels necessary.
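For orientation, here is a minimal sketch in Python of what one tunnelled peer could look like. It assumes the remote Syncthing listens on the default sync port 22000 and that the local device entry is pointed at the forwarded port; the host `user@example.org`, the local port 22001 and the function name `ssh_tunnel` are made up, and the documentation remains the reference for the full setup. The code only builds the command and the address, it does not open a connection.

```python
import shlex

SYNCTHING_SYNC_PORT = 22000  # Syncthing's default sync protocol port


def ssh_tunnel(remote: str, local_port: int = 22001):
    """Return (ssh argv, Syncthing device address) for one tunnelled peer."""
    argv = [
        "ssh",
        "-N",  # no remote command, port forwarding only
        "-L", f"127.0.0.1:{local_port}:127.0.0.1:{SYNCTHING_SYNC_PORT}",
        remote,
    ]
    # Address to enter for the remote device instead of "dynamic".
    address = f"tcp://127.0.0.1:{local_port}"
    return argv, address


if __name__ == "__main__":
    argv, address = ssh_tunnel("user@example.org")  # hypothetical host
    print(shlex.join(argv))
    print(address)
```

With three computers, each pair that should talk through a tunnel needs its own forwarded local port.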

calmh · August 30, 2019, 12:41pm

Yeah… Clearly it’s not going to help performance (twice the encryption and encapsulation), and I’m not sure under which circumstances it’s really necessary. I guess it would mostly be for philosophical reasons (you love and trust ssh, and hate port forwards and don’t trust our relays, and refuse to run one yourself, for example)?

sakoula · August 30, 2019, 1:41pm

Hi again! First of all, I really appreciate all the detailed replies. Thanks! Well, to be honest, I do not want my traffic to go through intermediate relays, for security (although I completely agree that my argument is kind of weak). This is why I want to run the traffic over ssh tunnels. The encryption over encapsulation over TCP etc. was never a problem for me in practice (except one case where I used a proxy over ssh feeding a nodejs instance that was screwing things up).

Running a public relay myself: I had not thought of it, but why not. Let me read a bit more of the documentation.

system · September 29, 2019, 1:41pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.