Synchronization of large folders

terry · April 28, 2021, 8:38am

If the server has a USB3 port, what about a portable SSD drive for the db?

Folder concurrency is OK on paper for large setups, but in reality it doesn’t work well (sorry, devs): if a folder doesn’t have focus, it essentially does nothing. No syncing, no archiving, nothing. In an ideal world (and this has been mentioned before), only the scanning would be limited, and all sync operations would be allowed to carry on as if concurrency = 0.

But your issue is totally down to the db. I have around 40 folders / 4M files / 15 TB, some hanging off portable drives, and on a restart the C drive (an SSD) goes to 100% for an hour or two until the scanning is complete. On an HDD it would never complete, as the changes would happen faster than the db can update.

Dix · April 28, 2021, 9:04am

Fragmentation is fine. Thank you. As I understand from the description, blockPullOrder affects file transfer. I have no problem transferring files; the problem is folders freezing in the scanning state.

Dix · April 28, 2021, 9:18am

The use of USB is not allowed by internal policies.

I can also tell my story about parallelism. Initially I tried to set up sync with 1 huge folder containing 30 million files and 1700 folders, 16 TB of data. A month later, 15 million files had been scanned. After that we decided to try splitting it into parallel folders: I added 1500+ folders through the REST API, and a week later 29 million files had been scanned. There are only 2 large folders left. In my case, the result is much better.
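For anyone wanting to try the same split, here is a minimal sketch of how one folder per subdirectory could be added through the REST API. It is not Dix's actual script. It assumes a Syncthing version that exposes `POST /rest/config/folders` (taking a single folder object), a GUI/API listener on `localhost:8384`, and a placeholder API key. The `split-` ID prefix and both function names are made up for illustration.

```python
import json
import urllib.request
from pathlib import Path

# Assumptions: default GUI/REST address, and a placeholder API key
# (the real key is shown in the GUI settings).
BASE_URL = "http://localhost:8384"
API_KEY = "your-api-key"


def folder_configs(root, device_ids):
    """Build one folder config per immediate subdirectory of root."""
    configs = []
    for sub in sorted(Path(root).iterdir()):
        if not sub.is_dir():
            continue  # plain files directly under root are not covered here
        configs.append({
            "id": f"split-{sub.name}",  # hypothetical naming scheme
            "label": sub.name,
            "path": str(sub),
            "devices": [{"deviceID": d} for d in device_ids],
        })
    return configs


def add_folder(cfg):
    """POST a single folder object to /rest/config/folders."""
    req = urllib.request.Request(
        f"{BASE_URL}/rest/config/folders",
        data=json.dumps(cfg).encode(),
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Usage (talks to a running Syncthing instance, so left commented out):
# for cfg in folder_configs("/data/huge", ["REMOTE-DEVICE-ID"]):
#     add_folder(cfg)
```

Fields not given in the payload should fall back to Syncthing's defaults, but it is worth checking one generated folder in the GUI before adding the other 1500.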

imsodin · April 29, 2021, 9:30am

Changing block pull order to in-order should not improve anything. The default already ensures that blocks are pulled in contiguous batches.

Did you check memory then? That sounds like a case where you might run out of memory/start swapping due to the scan status updates (which can be disabled).
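If the scan status updates meant here are the per-folder scan progress updates, the relevant setting is probably the folder option `scanProgressIntervalS`, where `-1` disables them. That mapping is my reading, not something stated in this thread. A sketch of changing it for one folder over the REST API, assuming `PATCH /rest/config/folders/<id>` is available, with a placeholder API key and a hypothetical helper name:

```python
import json
import urllib.parse
import urllib.request

# Assumptions: default GUI/REST address and a placeholder API key.
BASE_URL = "http://localhost:8384"
API_KEY = "your-api-key"


def folder_patch_request(folder_id, changes):
    """Build (but do not send) a PATCH request for a single folder's config."""
    # Folder IDs may contain characters that need escaping in a URL path.
    quoted_id = urllib.parse.quote(folder_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/rest/config/folders/{quoted_id}",
        data=json.dumps(changes).encode(),
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        method="PATCH",
    )


# Usage (needs a running Syncthing instance, so left commented out):
# req = folder_patch_request("my-folder-id", {"scanProgressIntervalS": -1})
# urllib.request.urlopen(req).close()
```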

Dix · April 29, 2021, 9:50am

Yes, when setting up 1 huge folder, there were crashes due to running out of RAM, about once every 3 days.

tomasz86 · April 29, 2021, 11:08am

When is inOrder beneficial then? If not here, is there any situation where it should actually be used instead of the default?

imsodin · April 29, 2021, 11:33am

As far as I am concerned, never. However, as always: there’s probably some use case out there where it is. That I don’t know of one doesn’t mean much.

tomasz86 · April 29, 2021, 12:07pm

I see…

From the docs, I always had the impression that inOrder is better than the default for HDDs…

If there is no actual difference in practice, then it would probably be nice to have a note there (so that people don’t bother with it in vain).

AudriusButkevicius · April 29, 2021, 12:18pm

I guess for tiny files (but bigger than one block), it might be better, because it might avoid a single seek operation?

But to be honest, at this point, it’s all going to the write cache anyway.