I have a scenario where I would like to sync 4 or more locations, each with an NFS mount point holding about 2TB of data. Users add files daily, and these need to be replicated to all locations. Ideally the files should reach every location as fast as possible, so I would connect all 4 locations to each other so that each one can fetch new file blocks from whichever peer already has them. To pick up new files quickly I thought I would add a 1 second check.
The files are usually small (~5MB) and there are no deletions.
Would you suggest Syncthing for this scenario? I’m worried about the time Syncthing will take to check the whole NFS share for new files; some of them are in one directory, which means listing it takes quite some time.
I did see inotify, but since there are a lot of files I thought it wouldn’t be a viable solution because of the fs.inotify.max_user_watches limit.
Correct me if I’m wrong.
Well you can correct/increase that.
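As a rough way to judge how high the limit would need to be: as far as I know, inotify watches are registered per directory rather than per file, so the number of directories is what matters. The sketch below counts them and reads the current limit from the usual Linux procfs path. The function names are made up for illustration, and a recursive watcher may need some headroom beyond this count.

```python
import os
from pathlib import Path

# Usual Linux procfs location of the per-user inotify watch limit.
LIMIT_PATH = Path("/proc/sys/fs/inotify/max_user_watches")


def count_directories(root):
    """Count root plus every directory below it (roughly one watch each)."""
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))


def read_watch_limit(path=LIMIT_PATH):
    """Return the current limit, or None if it cannot be read (e.g. not Linux)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None
```

Calling count_directories("/srv/share") and comparing the result with read_watch_limit() shows whether the limit needs raising. The limit itself is changed with sysctl (fs.inotify.max_user_watches), not from Python.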
What would happen if I add 100 files to the shared directory?
Would the 100 files try to sync at the same time? I read somewhere that Syncthing splits files into blocks, but how many blocks of how many files would it sync at the same time?
Also, I’m reading that inotify watches a folder and notifies Syncthing. Can I do the notification from some other application? Can you give me an example?
The nodes request the blocks from the other nodes that have them. In the advanced settings you can adjust the number of pullers to change the number of requests.
Usually a node will pull blocks for the first few files in the list at a time. You should set up a couple of nodes and do some testing.
Also have a search in the forum for pullers.
What about inotify? Can I just disable the Syncthing sync interval and call the API of all the other Syncthing nodes in order to update a specific file or folder?
You can tell Syncthing to scan a specific file or folder via API.
That’s what syncthing-inotify does.
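A minimal sketch of such a call, assuming the REST endpoint POST /rest/db/scan with its folder and sub query parameters and the X-API-Key header. The address, API key, folder ID and helper names below are placeholders, not values from this thread.

```python
import urllib.parse
import urllib.request


def build_scan_request(base_url, api_key, folder_id, sub_path=None):
    """Build a POST /rest/db/scan request for a folder, optionally one sub-path."""
    params = {"folder": folder_id}
    if sub_path:
        params["sub"] = sub_path  # path relative to the folder root
    query = urllib.parse.urlencode(params)
    url = f"{base_url.rstrip('/')}/rest/db/scan?{query}"
    return urllib.request.Request(
        url, data=b"", method="POST", headers={"X-API-Key": api_key}
    )


def request_scan(base_url, api_key, folder_id, sub_path=None, timeout=10):
    """Send the scan request to the local instance and return the HTTP status."""
    req = build_scan_request(base_url, api_key, folder_id, sub_path)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, request_scan("http://127.0.0.1:8384", "your-api-key", "default", "uploads/file1") asks the local instance to rescan just that one path. The call only goes to the local instance; nothing is sent to the other nodes directly.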
There is no sync interval but a rescan interval. Syncing is done immediately as a change occurs. You don’t have to (and can’t) tell the other nodes anything, just your local node to scan a file/folder to check for changes.
If you set up inotify you will not need the scans often, but they still occur to ensure changes don’t get missed.
When the local node is told to scan a file/folder by inotify, it scans and updates its local state, then sends the updated state to all of the devices that are currently connected to it and are part of that share.
The other devices read through the new state, determine what has to be done to match it, then start pulling blocks or deleting files to achieve the new state.
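To make the "determine what has to be done" step a bit more concrete, here is a toy model only, not Syncthing's actual code: each file is treated as a list of fixed-size blocks identified by their SHA-256 hash, and a device pulls only the blocks whose hash differs from what it already has at that position. The block size and function names are assumptions for illustration; the real implementation is more elaborate.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # illustrative value, assumed for this sketch


def block_hashes(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return the SHA-256 hash of each."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_pull(local_hashes, remote_hashes):
    """Return indices of remote blocks that are missing or different locally."""
    return [
        i for i, remote_hash in enumerate(remote_hashes)
        if i >= len(local_hashes) or local_hashes[i] != remote_hash
    ]
```

With a new file nothing exists locally, so every block index is returned; with a modified file only the changed or appended blocks are.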
A couple more questions:
What happens if one big file is added to the sync path?
Do all clients get it from one server, or will clients try to get parts from the nearest server, or from a server that has the parts they are missing?
For example, I have 2 clients in the same region and I add a 50GB file to one of them. The other client in that region will get the file quickly, but will the clients in other regions download from both of them or only one?
Also, I would like to disable the sync interval but notify the Syncthing clients through the API that a new file is available on this client. Is this possible?
There is no sync interval. As soon as the local node is aware of a change it notifies the remote nodes and they start requesting blocks.
There is a scan interval, which is how often Syncthing looks for changes locally. Inotify can be used to monitor file system changes, almost removing the need for scheduled scans, but they still occur in case something is missed.
An update not all that long ago enabled nodes to share blocks before the entire file is downloaded. This means nodes can start pulling from the second node in the region before it has even finished downloading the file.
Which version introduced this change that downloads from multiple clients?
Since I always know when a file is added to a client, can I notify the rest of the clients from some other application through the Syncthing API?
No. Every Syncthing client announces the blocks it has to all connected devices as soon as it has downloaded them and added them to its index database. The API can only be used to tell Syncthing to check a specific file or folder for changes.
You’re getting confused. People have already explained this, but I’ll try again.
There are two parts.
The first is how your Syncthing instance knows that files on your computer have changed. This has nothing to do with other machines or other Syncthing instances: this is just your Syncthing instance, which is watching your files. Syncthing periodically scans your files looking for changes. This is configured by the scan interval. In addition you can use inotify, which asks the filesystem to tell Syncthing straight away when a file changes, using the API. This means that, as soon as you change a file, your Syncthing instance notices.
That’s the first part.
The second part is that once Syncthing knows that one of your files has changed, it tells your other Syncthing instances STRAIGHT AWAY. There’s no interval or API here: if your Syncthing instance is connected to another one, it will tell the other one that one of your files has changed AS SOON AS IT KNOWS. The other Syncthing instance will then start fetching that file straight away. No delay.
Again, two parts. The first is Syncthing noticing that your files have changed. This can take a little while and relies on periodic scans, but you can use inotify, which uses filesystem notifications and the API, to help speed it up. The second is that once your Syncthing instance has noticed that your files have changed, it tells all the other Syncthing instances straight away, and they start fetching those changes straight away.
All versions of Syncthing will fetch a file from multiple places at the same time if possible, as far as I know.
Thanks for the explanation Antony,
What I need to do is disable Syncthing’s periodic check of the file system and do what inotify does myself, through another application that knows when a file has changed.
So a user would contact my API and say that they want to upload a file. The user uploads the file, and when the upload is finished my API will tell Syncthing to scan that folder/file (not the whole Syncthing directory). Syncthing would then automatically update the rest of the clients if it sees that the folder/file has changed.
This is correct, but shortly after delta indexes were added, AudriusButkevicius (I think) implemented transfer of incomplete files.
Previously the entire file had to be present on a node before it would announce blocks were available. Now blocks are announced immediately so nodes can help to seed changes before they have the entire file.
You can have a look at how the inotify implementation works.
You will still have an initial scan etc.
Why not use inotify and upload to a staging directory? Then, once the upload is complete, move the file to the Syncthing directory.
That way any API changes are handled by the inotify package and you don’t have to modify your implementation.
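A small sketch of that move step, assuming the staging directory lives on the same filesystem as the synced directory so that the rename is atomic and a half-written file is never visible to a scan. The publish name and the paths are hypothetical.

```python
import os


def publish(staged_path, synced_dir):
    """Move a fully uploaded file from staging into the synced directory."""
    target = os.path.join(synced_dir, os.path.basename(staged_path))
    # os.replace is a rename: atomic within one filesystem, and it raises
    # OSError if staging and the synced directory are on different ones.
    os.replace(staged_path, target)
    return target
```

Your upload handler would call publish("/srv/staging/file1", "/srv/share") only after the upload has finished.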
I would like to avoid using inotify since it needs kernel configuration changes and since I already know when my files are changed.
I have another question though.
What would happen if I have Syncthing on three nodes, and two of them share the same Syncthing directory because it is on NFS and mounted on both?
node1 : nfs:/srv/share
node2 : nfs:/srv/share
node3 : ext3:/srv/share
If I upload a file (node1:nfs:/srv/share/file1) to node1, both node1 and node2 would have it immediately since it’s on the NFS share. How would Syncthing react when:
node1 sees the change and notifies all other nodes about it. Node2 already has the file, so how would it react?
Both node1 and node2 see the change at the same time and notify node3. Would this be a race, and would node3 get the latest or … ?
Thank you for your responses and patience.
I think changes on node1 and node2 will be no problem. It shouldn’t matter if they both see it at the same time. They scan, they send an index update to the other two nodes, and if the file is already there with the same content/blocks and metadata it should be considered done.*
But changes on node3 will cause both node1 and node2 to create the same temp file and fight each other over that file.
TL;DR: You shouldn’t do that.
*It could be that you get conflicts or “infinite” puller errors if Syncthing doesn’t rescan the file before trying to apply the received changes.