I’m using Syncthing (v0.14.38-rc.1) in the following scenario:
- Files are created in a folder on machine A (approx. 5K files per minute).
- Files are synchronized to a folder on machine B, where they are moved to a different folder.
- When files are moved on B, they are removed from A.
- The files are synchronized to one folder only.
- Machines A and B are both virtual machines (VMware 12 Pro) with 2 CPUs (1 core) and 4 GB of RAM, running Windows Server 2012 R2.
In this scenario, as soon as the folder on machine A reaches approx. 10K files, Syncthing’s throughput degrades immensely and CPU usage on machine A rises to around 80%.
In order to increase performance, I’ve used the following settings:
- Machine A folder re-scan is 30s (so files can be synced with a reasonable delay)
- Machine B folder re-scan is 300s.
- Ignore permissions flag true.
As performance was not acceptable, I tried Syncthing-inotify.
Although the performance is a bit better, it still degrades too much when the folder reaches 10K files, and CPU usage is still very high on machine A.
What do you suggest to increase Syncthing performance in this scenario? Right now, for our use case, performance is not acceptable.
Thank you very much in advance.
I’m not sure what you mean by “reaches 10k” - after two minutes, that is? Otherwise I’d start by looking at top or the equivalent - is the CPU time “user” or “system”? Many file systems aren’t very well optimized for directories with lots of files in them and may make file operations very slow for that reason.
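One way to check whether the file system itself slows down with large directories is to time a plain listing, independent of Syncthing. The following is only a rough sketch: the helper names and the file counts are made up for illustration, and the per-entry stat call is just a stand-in for per-file work, not a description of what Syncthing does internally.

```python
import os
import tempfile
import time


def make_test_files(path, n):
    """Create n tiny files with unique names in path (hypothetical test data)."""
    for i in range(n):
        with open(os.path.join(path, f"file_{i:06d}.dat"), "wb") as f:
            f.write(b"x")


def time_directory_scan(path):
    """Return (entry_count, seconds) for one listing that also stats each entry."""
    start = time.perf_counter()
    count = 0
    with os.scandir(path) as entries:
        for entry in entries:
            entry.stat()  # stand-in for per-file work; an assumption, not Syncthing's logic
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Compare a small directory with one at the size where the slowdown was reported.
    for n in (1_000, 10_000):
        with tempfile.TemporaryDirectory() as d:
            make_test_files(d, n)
            count, seconds = time_directory_scan(d)
            print(f"{count} files listed in {seconds:.3f}s")
```

If the time per file grows noticeably between the two runs on the actual volume used by machine A, the directory size itself is a likely contributor.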
Set progress update interval and scan progress interval (advanced folder options) to -1 to disable some tracking and accounting that you don’t need.
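These values can be changed in the GUI's advanced settings. For anyone who prefers to script it, here is a minimal sketch that applies the same change to a config document. Everything in it is an assumption to verify against your own config.xml: the sample XML is a hypothetical, heavily trimmed stand-in, and the element names (scanProgressIntervalS under each folder, progressUpdateIntervalS under options) are given from memory of the config format.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample; a real config.xml contains many more elements.
SAMPLE_CONFIG = """<configuration>
    <folder id="example" rescanIntervalS="30" ignorePerms="true">
        <scanProgressIntervalS>0</scanProgressIntervalS>
    </folder>
    <options>
        <progressUpdateIntervalS>5</progressUpdateIntervalS>
    </options>
</configuration>"""


def set_child_text(parent, tag, value):
    """Set <tag> under parent to value, creating the element if it is missing."""
    child = parent.find(tag)
    if child is None:
        child = ET.SubElement(parent, tag)
    child.text = str(value)


def disable_progress_tracking(root):
    """Set both intervals to -1, following the suggestion in the reply above."""
    for folder in root.findall("folder"):
        set_child_text(folder, "scanProgressIntervalS", -1)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    set_child_text(options, "progressUpdateIntervalS", -1)
    return root


if __name__ == "__main__":
    root = disable_progress_tracking(ET.fromstring(SAMPLE_CONFIG))
    print(ET.tostring(root, encoding="unicode"))
```

If you adapt this to a real file, stop Syncthing first and keep a backup copy of the config.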
If it still seems slow, grab a CPU profile and we can see what conclusions we can draw.
Note that Syncthing doesn’t remove files from the index database. If the file names are unique each time (say, time based) you will grow the index by 5000 files each time. This will reduce performance over time.
Yes, it can reach 10K after 2 minutes.
In a worst-case scenario, we expect to have 5 million files per day.
Yes, the file names are unique, which, according to your response, means the index would grow by 5 million entries each day.
Is it still possible to achieve good throughput in this scenario (we need the files on machine B as soon as possible)?
I would expect it to be fine, so I’m not sure what the bottleneck is.
You can probably manage index growth by simply resetting the index at some reasonable time once a day / week / whatever, as the steady state seems to be empty folders on both sides.
When Syncthing is started, the throughput is very good, approx. 100 files per second.
After a few minutes, as folder A starts to accumulate lots of files, the throughput drops to less than 5 files per second.
Can you please inform me how to reset the database indexes?
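To put the figures from this thread side by side: 5,000 files per minute is roughly 83 files per second, so 100 files per second keeps up, while 5 files per second falls behind by about 78 files every second. A small sketch of that arithmetic, assuming constant rates and an empty folder at the start (the function name is made up for illustration):

```python
def backlog_after(minutes, created_per_minute, synced_per_second):
    """Files still waiting on A after `minutes`, assuming constant rates
    and an empty folder at the start (a simplification)."""
    created = created_per_minute * minutes
    synced = synced_per_second * 60 * minutes
    return max(0, created - synced)


if __name__ == "__main__":
    # Rates taken from the posts above: 5,000 files/min created,
    # 100 files/s synced at startup, about 5 files/s once degraded.
    for rate in (100, 5):
        print(f"{rate} files/s -> backlog after 10 min: {backlog_after(10, 5000, rate)}")
```

At the degraded rate the backlog after ten minutes is 47,000 files, so the folder on A keeps growing for as long as the slowdown lasts.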
To be honest, you are putting the cart before the horse. If you want to move files from A to B and delete them on A, just do that. There is no “keep things in sync” activity here, and you are simply using the wrong tool for the job. Just NFS mount B’s storage on A, or send data to B directly from A via TCP or whatever you want.
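As a rough illustration of the "just move the files" approach, here is a minimal sketch that moves every file from a source folder into a destination folder, which could be a path on storage mounted from B. The function name and the demo paths are hypothetical, and the sketch ignores files that are still being written and name collisions on the destination.

```python
import os
import shutil
import tempfile


def move_all_files(source_dir, dest_dir):
    """Move every regular file from source_dir to dest_dir; return the number moved.

    Caveat: a file that is still being written would be moved half-finished.
    A real producer would typically write under a temporary name and rename
    once complete (not handled here).
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Snapshot the listing first so the directory is not modified while iterating.
    with os.scandir(source_dir) as entries:
        files = [entry.path for entry in entries if entry.is_file()]
    for path in files:
        shutil.move(path, os.path.join(dest_dir, os.path.basename(path)))
    return len(files)


if __name__ == "__main__":
    # Demo with temporary folders standing in for A's folder and B's mounted storage.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for i in range(5):
            with open(os.path.join(src, f"file_{i}.dat"), "wb") as f:
                f.write(b"payload")
        print("moved", move_all_files(src, dst), "files")
```

Run periodically, this removes the file from A as part of the move, so there is no index to grow and no separate delete step.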
Thank you for your reply.
By the way, I’m not attributing the slow performance to Syncthing itself but, as you can see in my previous posts, to our specific use case.
I understand that in this scenario we are using Syncthing for a slightly different purpose.
Nevertheless, at the moment we cannot change to a different tool, so what settings could I use to “adapt” Syncthing to this scenario?
Would distributing the files across multiple folders instead of one improve the performance?
Well, check what Syncthing is constrained on: check CPU (user and kernel), memory usage, and I/O on both sides.
Also, the files must be tiny to be 5000 a minute.
Yes, at the moment we are in a testing phase, and we are using very small files. In a real production environment, they will range from small to very large files.
Could you please clarify 2 specific questions?
1 - What is the objective of the disableTempIndexes variable?
According to the documentation:
By default, devices exchange information about blocks available in transfers that are still in progress. When set to true, such information is not exchanged for this folder.
Nevertheless, I don’t understand what its purpose is.
In my specific scenario, does it offer any advantage?
I’m asking because when this variable is set to true, synchronization seems faster and Syncthing consumes less CPU.
2 - Is there any difference, from a resources (CPU and RAM usage) perspective, between these two scenarios, for a load of 50,000 files:
a) all 50,000 files are in one single directory,
b) the 50,000 files are distributed across 100 directories.
Does scenario b) offer any increase in performance and decrease in consumed resources?
Or is it exactly the same?
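Whether scenario b) actually helps is not answered in this thread, so it would have to be measured. For anyone who wants to test it, here is a minimal sketch of one way to spread uniquely named files evenly over 100 subdirectories by hashing the file name. The function names, the bucket count default, and the sample file names are all assumptions for illustration.

```python
import hashlib
import os


def bucket_for(filename, buckets=100):
    """Map a file name to a stable bucket number in the range [0, buckets)."""
    digest = hashlib.sha1(filename.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def bucketed_path(root, filename, buckets=100):
    """Return root/<two-digit bucket>/<filename> for the given file name."""
    return os.path.join(root, f"{bucket_for(filename, buckets):02d}", filename)


if __name__ == "__main__":
    # Hypothetical unique names standing in for the 50,000-file load.
    names = [f"file_{i:06d}.dat" for i in range(50_000)]
    counts = {}
    for name in names:
        b = bucket_for(name)
        counts[b] = counts.get(b, 0) + 1
    print("directories used:", len(counts))
    print("smallest / largest directory:", min(counts.values()), "/", max(counts.values()))
    print("example path:", bucketed_path("outgoing", names[0]))
```

Hashing keeps each directory at roughly 500 files for this load, which makes it straightforward to compare CPU and throughput against the single-directory layout.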
Thanks in advance.
Thank you for the quick response.
Regarding the index database, is there a way to force the removal of its information?
Use the -reset-database command line switch, but if you have files on both sides you might end up with conflicts.