I am currently trying to sync a big media directory over the Internet: 13 million files, 900 GB of data, hosted on a dedicated hosting service, to a new location (AWS).
On the source: the media directory is located on a NetApp device which shares the files with 3 servers (I will name them srcA, srcB and srcC) through NFS. The application code using this media directory can modify files in it through any of these 3 servers, and modifications (adds/modifies/deletes) are fairly regular.
On the target: the media directory needs to be transferred/synced to an EFS mount on AWS, and I have access to an EC2 instance (I will name it dstA) which mounts this EFS through NFS.
I first tried rsync, but the first sync took more than 1 week and the delta sync took more than 12 hours … so I gave up on rsync because we need to move this data with as little service downtime as possible.
So I tried Syncthing, running a Syncthing daemon on srcA and on dstA and sharing the big media directory in 10 parts, one per first-level hash subdirectory (named 0 → 9), in order to have more parallel threads syncing to dstA. That makes 10 different shares in Syncthing.
The analysis part took a lot of time (I don’t even remember how much …). After this, some of the 10 shares got synced and I saw the main problem:
When the application modifies a file from srcA, the Syncthing daemon picks up the modification with its inotify watcher and properly syncs the file to dstA: OK/working, this is what I need.
But when the application modifies a file from srcB or srcC, the Syncthing daemon on srcA does NOT seem to pick up the modification, because it is made outside srcA, the only server on which a Syncthing daemon is running. dstA still displays a Synced status, but this is false, as some files are not transferred until a full rescan occurs on the Syncthing daemon on srcA.
So my idea was to add 2 other Syncthing daemons on srcB and srcC to catch every change the application makes to the media directory, but it seems I now have another problem: e.g. the shares on srcB and srcC are marked as Desynced when a change occurs on srcA, and the destination share on dstA displays a “Local Adds” status because it gets a modification synced from srcA that is not known to srcB and srcC, probably because the internal Syncthing database of the other Syncthing daemons is not aware of this new file …
So the question is simple, but I think the answer probably is not:
How can I achieve this kind of synchronization with Syncthing, please? Is there a way?
I don’t really follow, but it sounds like you have the same files managed by Syncthing in several places, both directly and over NFS. Don’t do that, do one or the other.
Thanks for answering, and sorry if my explanations were not clear. I will try to sum up:
One directory with files located on a central NetApp, with regular changes to this directory.
3 servers on the source platform with read/write access to this directory over NFS.
1 server on the destination platform, to which I need to sync the source directory with as little downtime as possible.
Modifications made on the 1st server, on which Syncthing is running, are properly seen and synced to the destination: OK.
But modifications made on the 2nd and 3rd servers are not seen by Syncthing running on the 1st server because, IMHO, inotify events are only visible on the server where the modification is made. So files added/modified/deleted on the 2nd and 3rd servers are not synced to the destination platform: NOK.
This is the problem.
Edit: I should also specify:
Syncthing daemon configured as Read Only on source.
Syncthing daemon configured as Receive Only on destination.
Right, the appropriate solution here would be to run Syncthing on the file server, where the files are present locally and any changes should be visible via notifications. I don’t know if this is a “thing” though (running stuff on a NetApp). Otherwise I think you will need to use periodic scans as change notifications are not available over NFS.
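To illustrate what a periodic scan amounts to when change notifications are not available, here is a minimal Python sketch that walks a tree, records each file's size and modification time, and compares two such snapshots. It is not Syncthing's scanner; the function names `snapshot` and `diff_snapshots` are made up for this example, and the cost of walking and stat-ing every file over NFS is exactly the expensive part with millions of files.

```python
import os


def snapshot(root):
    """Map each file path (relative to root) to its (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(old, new):
    """Return sorted lists of (added, modified, deleted) relative paths."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, modified, deleted
```

Taking a snapshot, waiting, taking another one and passing both to `diff_snapshots` gives the changes made from any NFS client in between, at the price of a full walk each time.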
Serving the same directory three times from different servers is not appropriate and will result in conflicts etc.
I already thought of this, but NetApp uses a proprietary OS called ONTAP, with limited access to the filesystem and no way to run 3rd-party software as on Unix/Linux OSes.
Periodic scans are not a solution for us, as they are by far too slow.
If there is no other way to achieve this sync with Syncthing, I will have to try tuning parallel rsync or rclone to lower the time for deltas. It would probably be faster than running Syncthing with periodic scans in this case.
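For what it's worth, the parallel rsync idea could be sketched roughly as below: one rsync process per first-level subdirectory (0 to 9), run concurrently from Python. This is only a sketch under assumptions. The paths `/mnt/medias` and `dstA:/mnt/efs/medias` are hypothetical, the helper names are made up, and the `-a --delete` flags are a starting point to adapt, not a tuned setup.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_rsync_commands(src_root, dest_root, shards):
    """Build one rsync command per first-level subdirectory (shard)."""
    src_root = src_root.rstrip("/")
    dest_root = dest_root.rstrip("/")
    commands = []
    for shard in shards:
        # Trailing slashes: copy the contents of the shard directory.
        commands.append([
            "rsync", "-a", "--delete",
            f"{src_root}/{shard}/",
            f"{dest_root}/{shard}/",
        ])
    return commands


def run_parallel(commands, workers=10, runner=None):
    """Run the commands concurrently and return their exit codes in order."""
    if runner is None:
        # Default runner really executes rsync; pass another one for a dry run.
        def runner(cmd):
            return subprocess.run(cmd).returncode
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, commands))


if __name__ == "__main__":
    # Hypothetical source and destination paths.
    cmds = build_rsync_commands(
        "/mnt/medias", "dstA:/mnt/efs/medias", [str(i) for i in range(10)]
    )
    print(run_parallel(cmds))
```

Whether this beats a single rsync depends on where the time goes: if listing the directories over NFS dominates, more concurrent walkers may help, but they also put more load on the NetApp.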
Another question, please:
Is it possible to run 2 Syncthing daemons on 2 different servers that manage the same internal database (index-v0.14.0.db), or not?
No, you can’t do that, but there should also be no point to doing it. If the bottleneck is listing directories and multiple concurrent rsync processes are faster than a Syncthing scan, then that’s something we could improve.
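One way to check whether listing directories really is the bottleneck would be to time a bare directory walk on one of the shares, independently of both rsync and Syncthing. The Python sketch below does that with `os.scandir`; the function names are made up for this example. It only lists entries and does not read or hash file contents, so it gives a rough lower bound for what any scan costs on that mount.

```python
import os
import time


def count_entries(root):
    """Walk root iteratively and return (file_count, dir_count)."""
    files = dirs = 0
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    dirs += 1
                    stack.append(entry.path)
                else:
                    files += 1
    return files, dirs


def time_listing(root):
    """Return (file_count, dir_count, elapsed_seconds) for a full walk."""
    start = time.perf_counter()
    files, dirs = count_entries(root)
    return files, dirs, time.perf_counter() - start
```

Running `time_listing` on one of the ten first-level subdirectories and multiplying gives a rough idea of how long a full listing takes over NFS.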
OK, I was thinking of this as a “workaround” for the fact that inotify events on one server are not seen by the other servers, but if it’s not possible, thanks for the answer.
We have a lot of small files in this directory. I have disabled compression completely to improve speed, but now I wonder whether encryption is also slowing the whole thing down. Is there a way to disable it, please? The sync traffic already goes through SSH tunnels linking the servers to each other over TCP, so there is no need for another layer of encryption.
No, you can’t disable encryption, but it also has no effect at all on the scan speed. (I would turn it around and say you don’t need the SSH tunnels, but again, it doesn’t affect scan speeds.)