Best approach for large repositories?


I discovered Syncthing only a while ago and it really looks nice and promising; it's also something I have been waiting years to happen, so thanks a ton for developing it!

For me it is simply great: I keep my files tidy and organized, and I archive what is not in use anymore (~20 GB).

But I'd like to back up my girlfriend's PC, and that's a different story…

She has a ~120 GB folder with 375,098 items. I do wonder if it will be reasonable to use Syncthing for that in the long term?

Until now I was rsyncing two remote ends/repos, but manually and only from time to time, so it would be great if Syncthing let me keep our backups up to date.

It's OK if the initial sync takes some resources, but I don't want it to grind the host machines all the time. What should I expect in this case?

Edit: by the way, I have her folders in place and set to scan every 2 hours (since I am making 'one-way backups' rather than 'two-way syncing'); I just didn't dare to take the plunge yet.
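For reference, the rescan interval is set per folder in Syncthing's config.xml (or via the web GUI). A minimal sketch, assuming the attribute names used by recent releases (older versions called folders "repositories"); the id and path below are placeholders:

```xml
<!-- 7200 s = 2 h between full rescans -->
<folder id="gf-backup" path="/mnt/backup" rescanIntervalS="7200" />
```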

Thanks in advance! MadOp.

PS: I've already read a few related threads, but many of the issues discussed there seem to have been made obsolete by recent ST releases.

Only the initial scan should be heavy; after that it should be fine. Though if it's a very weak device, the initial scan can take a very long time. For example, an RPi would struggle with this sort of load.

Wow! That was fast :smiley: !

One end has a Core i3 and the other one an AMD E-350 processor.

I might move my files to something similar to an RPi in the near future, but that would only handle my 20 GB repo.

FWIW, while you were replying to me, I added that I've set the scan interval to 2 hours to minimize system load.

Also take a look at the inotify add-on:

Then you can set the scan interval even lower.

That sounds great. Is there a reason not to submit your add-on as a pull request to Syncthing? That would be a nice improvement to the software. For example, in my case I have some very big repositories. They are not modified very often, so the scan interval can be long. On the other hand, when a modification is made to one of these repos, I would like it to be detected as quickly as possible, so I would have to set a short scan interval (a high scan frequency)…

I think your add-on could solve my “issue”.

There are many reasons, search for the issue in the tracker.

OK, got it:

Hi there, thanks for the help so far.

I have just shared this large repo among the computers and it seems everything is seen as out of sync.

The CPUs were being pounded at 100% and my temp sensors were complaining, so I stopped it for now. However, it was already reporting 31 GB out of sync, when there should be no more than 500 MB at most.

I remember reading something about copying the index to avoid syncing the whole repo in the initial process, but it was an old-ish thread and I couldn't find it again…

FWIW the master repo is a NTFS drive mounted on Linux and the target repo is a NTFS drive mounted on Windows.

What am I missing?

Thanks in advance. MadOp

31 GB is an approximation, rounded to the nearest 64 KB per file. If you have millions of 1-byte files, it's understandable.
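To put a number on that: a rough, worst-case estimate for the 375,098 items mentioned above, assuming each file is rounded up to a single 64 KB (64 × 1024 bytes) block:

```shell
# Worst case: every one of the 375,098 files is tiny, but each still
# counts as one full 64 KiB block in the out-of-sync estimate.
echo $((375098 * 64 * 1024))                    # total bytes
echo $(( (375098 * 64 * 1024) / 1000000000 ))   # ~24 GB of apparent data
```

So the rounding alone can account for roughly 24 GB of the reported 31 GB before any real differences are counted.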

You can rsync the files across to all machines preserving mtimes, and then just copy the indexes from one device which already has the folder indexed, which should make it more lightweight. Bear in mind, any other folders that are not part of that index will have to be rescanned.