Hello, I have read a few threads where people say files synced by syncthing are slow to access, presumably because they are fragmented. I now have one data point confirming this, but it is purely anecdotal and not very objective. Wikipedia defines three types of fragmentation. Free space fragmentation is outside the domain of syncthing, which leaves the two below:
- File fragmentation: a file is not stored contiguously.
- File scattering: related files not being grouped together.
I have identified the following advanced folder options as possibly conducive to reducing fragmentation based on discussions in several threads. Please let me know if I’ve done my homework correctly.
blockPullOrder: tomasz86 says to use inOrder in the thread "Syncthing and defragging, also missing files" (post #12 by tomasz86). This has the potential to affect file-level fragmentation, the first of the three types of fragmentation.
blockPullOrder: My understanding
blockPullOrder controls how a folder will distribute portions of a file to connected peers. (Or I am misinterpreting it and it decides how this folder will fetch files from peers). Regardless…
standard: A new file exists locally and one or more (N) peers wish to download it. The seeder carves the file up into N chunks and gives each peer one. Everyone pulls their chunk, then pulls the remaining chunks from each other in random order. Fragmentation (for peers): scales upwards with the number of peers. The more peers, the more their files will be broken up into pieces of 1/N of the file. This means there will be no fragmentation if there is only one peer.
random: Like the above, but instead of carving the file up into N chunks based on the number of peers, blocks are distributed randomly. Fragmentation (for peers): maximum, since all files are written as blocks in random order.
inOrder: The file is distributed sequentially to peers. Fragmentation (for peers): minimum. Everyone pulls the file sequentially.
inOrder will cause the least fragmentation. But people who want a file pushed out as fast as possible, or whose connectivity is unreliable, may find that inOrder slows syncing down, since there are fewer opportunities for peers to share unique blocks with each other. For the same reason there may also be more disk strain on the seeder, which cannot outsource disk seeks to other peers.
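To make the mental model above concrete, here is a toy sketch in Python. It is not Syncthing's actual scheduler: `pull_sequence` and `jumps` are hypothetical helpers, and counting non-sequential requests is only a rough proxy for fragmentation, under the assumption that the filesystem places data in the order it is written.

```python
import random

def pull_sequence(num_blocks, mode, num_peers=1, peer_index=0, seed=0):
    """Return the order in which one peer requests block indices (toy model)."""
    rng = random.Random(seed)
    blocks = list(range(num_blocks))
    if mode == "inOrder":
        return blocks
    if mode == "random":
        rng.shuffle(blocks)
        return blocks
    if mode == "standard":
        # Split the file into num_peers contiguous chunks. This peer takes
        # its own chunk first, then the other chunks in random order.
        chunk = -(-num_blocks // num_peers)  # ceiling division
        chunks = [blocks[i:i + chunk] for i in range(0, num_blocks, chunk)]
        own = chunks[peer_index]
        others = chunks[:peer_index] + chunks[peer_index + 1:]
        rng.shuffle(others)
        return own + [b for c in others for b in c]
    raise ValueError(f"unknown mode: {mode}")

def jumps(sequence):
    """Count positions where the next block is not the direct successor."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if b != a + 1)

in_order = pull_sequence(12, "inOrder")
single_peer = pull_sequence(12, "standard", num_peers=1)
three_peers = pull_sequence(12, "standard", num_peers=3, peer_index=1)
shuffled = pull_sequence(12, "random")
```

In this model inOrder and single-peer standard produce no jumps, standard with N peers produces at most N-1 jumps per file, and random produces a jump at almost every block, which matches the ranking described above.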
copiers and pullers (hashers not relevant)
The documentation says these are advanced options and states not to touch them. Copiers also seem relevant w.r.t. file-level fragmentation.
Tyrindor uses copiers=1 in “sent through Syncthing have very slow reads from disks?”. This advice is quoted from calmh saying to set copiers=1 in response to “Slow sync with large files” (not related to fragmentation)
copiers and pullers: My understanding
copiers: Copy data blocks from one file to another (e.g. “reused” files in syncthing terminology). Presumably the more copiers working simultaneously, the greater the chances of file fragmentation. The default is 0, which calmh explains as being equal to 2, although earlier, in 2015, calmh stated the default was 1 and that there was no reason to have more than one.
pullers: Request missing blocks from the network. The default is 16, as explained by calmh. I don’t fully understand pullers, but they seem to be a network-related optimization rather than something related to putting files on disk, so it’s best to leave this untouched.
copiers and pullers: Conclusion
Copiers=1 seems reasonable for reducing file fragmentation, but tuning pullers is above my pay grade; the defaults are probably there for a reason.
maxFolderConcurrency controls how many folders may concurrently be syncing or scanning, defaulting to the number of CPUs in the system.
maxFolderConcurrency: My understanding
If multiple folders are syncing simultaneously, then multiple files could be written simultaneously. Presumably setting this to a lower number could reduce fragmentation, with 1 being the extreme. But that would severely impact the syncthing experience, since only one folder would be allowed to sync while all others waited on it. One would also presume that filesystems are smart enough not to weave two files together bit by bit. With what calmh says here in mind, this seems to delve into deeper OS/filesystem-level things like write caching and so on. It seems unwise to change this without a deeper understanding of how the OS, the filesystem, and the storage subsystem interact. Since I don’t have this knowledge, the defaults are probably OK.
Conclusion for maxFolderConcurrency: keep the defaults.
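The worry about files being woven together can be illustrated with a deliberately naive allocator. This is a worst-case toy, not a model of any real filesystem: `naive_allocate` and `count_extents` are hypothetical names, and real filesystems typically use techniques such as delayed allocation or preallocation precisely to avoid this outcome.

```python
def count_extents(block_addresses):
    """Number of contiguous runs in a file's list of disk block addresses."""
    if not block_addresses:
        return 0
    return 1 + sum(
        1 for a, b in zip(block_addresses, block_addresses[1:]) if b != a + 1
    )

def naive_allocate(num_files, blocks_per_file, concurrency):
    """Write files in groups of `concurrency`, round-robin within a group.

    Every write simply takes the next free disk block, so concurrent
    writers interleave their blocks. Returns extents per file.
    """
    layout = {f: [] for f in range(num_files)}
    next_free = 0
    for start in range(0, num_files, concurrency):
        group = range(start, min(start + concurrency, num_files))
        for _ in range(blocks_per_file):
            for f in group:
                layout[f].append(next_free)
                next_free += 1
    return {f: count_extents(addrs) for f, addrs in layout.items()}

one_at_a_time = naive_allocate(num_files=4, blocks_per_file=8, concurrency=1)
four_at_once = naive_allocate(num_files=4, blocks_per_file=8, concurrency=4)
```

With one writer at a time each file lands in a single extent, while four interleaved writers leave every file in eight. How far a real system sits from this worst case depends on exactly the OS and filesystem details mentioned above.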
order: The order in which needed files should be pulled from the cluster (random, alphabetic, smallestFirst, largestFirst, oldestFirst, newestFirst). This seems most related to the third type of fragmentation, file scattering.
This seems pretty self-explanatory. Syncthing, when pulling files that it needs, will sort what it pulls based on the above choices. Caveat: as stated in the docs, this applies only to files syncthing has already discovered. For example, with smallestFirst it may pull a 1GB file ahead of a 2GB file, and it later turns out the rest of the folder consists entirely of 1MB files that just hadn’t been picked up yet.
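The caveat can be sketched as a small simulation. This is an illustrative model only, not Syncthing's code: `pull_smallest_first` is a hypothetical helper that pulls one file per step and only ever sorts the files it knows about so far.

```python
def pull_smallest_first(discovery_batches):
    """Pull one file per step, always the smallest file known so far.

    discovery_batches: list of lists of (name, size_in_mb). One batch
    becomes visible before each step. Returns names in pull order.
    """
    known, pulled = [], []
    batches = list(discovery_batches)
    while batches or known:
        if batches:
            known.extend(batches.pop(0))
        if not known:
            continue
        known.sort(key=lambda f: f[1])  # smallest known file first
        pulled.append(known.pop(0)[0])
    return pulled

batches = [
    [("movie-1gb", 1024), ("movie-2gb", 2048)],  # discovered first
    [("note-a", 1), ("note-b", 1)],              # discovered later
]
order = pull_smallest_first(batches)
```

Here the 1GB file is pulled first even though smaller files exist, simply because they were not known yet when the first decision was made.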
order: My understanding
Setting aside the reduced utility of adjusting this because of the above caveat: file fragmentation is the hardest on disks, and file scattering should rank second. But it seems almost impossible to tune for without taking into account the myriad ways the user or the application uses files.
Are small files frequently accessed together? (smallestFirst/largestFirst). Are files accessed alphabetically? (alphabetic). Are older files left untouched and newer files frequently updated? (oldestFirst to reduce free space fragmentation / newestFirst to short-stroke newer files in an area of higher rotational velocity on the disk).
Regardless of syncthing, it seems inevitable that given enough time, files in a folder will not be near each other on disk, and there’s no way to control the correlation between the folder tree and location on disk. There doesn’t seem to be an ideal way to lay out files on the drive at all.
Conclusion for order: keep the default of random.
blockPullOrder = inOrder, reduces file-level fragmentation, may cause disk strain on sending device, may increase time to reach in-sync status. Most useful option to tune.
copiers = 1, reduces fragmentation, possibly at the cost of slower syncing
pullers = don’t touch
maxFolderConcurrency = could reduce fragmentation, best to let the OS/filesystem handle this
order = random (default) – has the potential to impact how files are scattered but impossible to predict usage
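For anyone who prefers to apply the two per-folder changes from this summary without the GUI, below is a hedged sketch that only builds the request. The endpoint path and the JSON keys `blockPullOrder` and `copiers` reflect my understanding of Syncthing's REST config API and should be checked against the docs for your version. The API key and folder ID are placeholders, and `build_folder_patch` is a hypothetical helper.

```python
import json
import urllib.request

def build_folder_patch(base_url, api_key, folder_id,
                       block_pull_order="inOrder", copiers=1):
    """Build (but do not send) a PATCH request for one folder's config."""
    body = json.dumps(
        {"blockPullOrder": block_pull_order, "copiers": copiers}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/rest/config/folders/{folder_id}",
        data=body,
        method="PATCH",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )

# Placeholder values; replace with your own GUI address, key and folder ID.
req = build_folder_patch(
    "http://127.0.0.1:8384", "hypothetical-api-key", "hypothetical-folder-id"
)
# urllib.request.urlopen(req)  # uncomment to actually send the change
```

The request is left unsent on purpose so the payload can be inspected first.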