Improving performance by limiting concurrent I/O

Not trying to revive an old thread, but this seemed the most logical (and most recent) place to chime in on the I/O vs CPU bound issue.

Situation: I’ve got two Atom ‘file servers’ here (low power consumption, but I still get to run a full Windows on them): an N2800 (fairly fast, 2 cores, 4 threads) and an N230 (rather slow, 1 core, 2 threads). They are in different locations (read: one is at my dad’s house) and they keep our CrashPlan backups in sync. My dad’s backups are sent to me and vice versa, so if we ever encounter a ‘disaster’ (e.g. a lightning strike) we should theoretically still have a valid backup ‘offsite’. Due to the way CrashPlan works the backups are ‘one per machine’, so in total we have 5 folders that need to be scanned on startup.

Anyway, the Atoms are no performance monsters by a long shot, so when Syncthing starts up the N2800’s 4 threads jump to 100% for quite a while to hash the close to 1 TB of backup data that goes up and down. However, a big part of that time is spent on interrupts waiting for I/O. Additionally, when looking at the disks (5400 RPM spinning platters) I notice that the read speed is ‘limited’ to 15-20 MB/s in the beginning, but as more folders finish their scanning it goes up to 40 MB/s and more, as the disk spends more time reading and less time seeking. I’m under the impression that doing it all sequentially will probably be faster (or at least won’t take much longer) and will definitely be easier on the hardware.

I’ve already set the ‘hashers’ option to 1 for each folder, but as you mentioned above this is a ‘per directory’ setting and each directory is handled independently.

Is there any chance we’ll see a ‘global’ max-hashers setting? I could try merging the folders, but last time I tried that (ages ago =) I ran into out-of-memory issues with Go. I believe the latter has been fixed, but regardless I’d always be stuck with at least two folders: one that holds my dad’s data (and is R/O on his machine) and one for all my machines here that then gets replicated to his machine.

Any other tips? I’ve already upped the scan interval dramatically (to 6 hours) but am otherwise out of ideas.
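
For reference, the relevant bits of my folder config currently look roughly like this (the ‘hashers’ element and the ‘rescanIntervalS’ attribute are how I understand Syncthing’s config.xml settings; the folder id and path are just made-up placeholders):

```xml
<!-- Illustrative excerpt from config.xml; id and path are examples only. -->
<folder id="dad-backup" path="D:\Backups\Dad" rescanIntervalS="21600">
    <!-- limit this folder to a single hashing routine -->
    <hashers>1</hashers>
</folder>
```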

PS: not that I have any right or reason to complain, btw, this thing works SUPERB! Many thanks for all the effort you put into this! (Not sure why people keep comparing it to Dropbox (I probably can’t afford 1 TB there =) or BTSync (buggy!). I very much prefer Syncthing!)

If a folder has already completed its initial scan, subsequent scans don’t require hashing the data; they just check file metadata such as last modified time, file size, etc. New data does still have to be hashed the first time it is seen. What you’re seeing is metadata being read from disk and compared with the database, which may explain why the I/O doesn’t look “sequential”.
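
Roughly, the per-file decision boils down to a cheap metadata comparison; something like this sketch (illustrative only, not the actual Syncthing code):

```go
// Sketch of metadata-based change detection: a file is only queued for
// re-hashing when its size or modification time differs from what the
// database already has. Illustration only, not Syncthing's implementation.
package main

import (
	"fmt"
	"os"
	"time"
)

// knownFile stands in for what the database remembers about a file.
type knownFile struct {
	size    int64
	modTime time.Time
}

// needsRehash compares on-disk metadata with the stored record.
func needsRehash(path string, db map[string]knownFile) (bool, error) {
	info, err := os.Stat(path)
	if err != nil {
		return false, err
	}
	prev, seen := db[path]
	if !seen {
		return true, nil // never seen before: hash it once
	}
	// Same size and mtime: assume unchanged, skip the expensive hashing.
	return info.Size() != prev.size || !info.ModTime().Equal(prev.modTime), nil
}

func main() {
	db := map[string]knownFile{}
	rehash, err := needsRehash("some-file.dat", db)
	fmt.Println(rehash, err)
}
```

So a re-scan of an unchanged folder is mostly stat calls plus database lookups, which is seek-heavy rather than sequential reading.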

There may be something to be said for scanning folders sequentially/queued, rather than simultaneously, as spinning rust really suffers under lots of random I/O.

Specific to my case: it seems CrashPlan stores its data in data files of (almost) 4 GB each. So each time the backup has added something, the last file needs a full re-hash. Additionally, it also does ‘pruning’, meaning it throws away blocks of files (or versions) that are no longer needed, freeing up space for the new backup and causing many more 4 GB files to be touched. I have no statistics on this, but I’m under the impression it means Syncthing will quite often find many of those 4 GB files, spread across all the folders for the local machines, marked for re-hashing. The ‘offsite’ folders are only updated by Syncthing itself, so I agree that just a metadata scan should do there.

PS: by ‘on startup’ I meant ‘after not having run for a couple of days’. I’ve installed Syncthing as a service, but for some reason it sometimes stops, and I’ve been too lazy so far to really figure out why and what to do about it (instinctively I’m blaming the wrapper). Whenever I think of it I check and, if needed, start the service again, and that’s when I notice the described behaviour.

So lots of new data to hash, which will make the resource usage unavoidable. Is there any way to have CrashPlan use smaller files?

I don’t think so. But having Syncthing read the files sequentially would probably be less taxing on the hardware, hence me bringing it up. I’ll probably have more luck asking for a global max-hashers setting here than getting a change into CrashPlan’s codebase =P

I could see a global “folder concurrency” setting, possibly, meaning how many folders can be non-idle at any given time. Although then you’ll run into things like a two-hour sync operation blocking scans on other folders in the meantime… Maybe just a limit on the number of folders scanning at once.
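
Conceptually it would just be a global semaphore around whatever makes a folder busy; a minimal sketch of the idea (made-up names, not actual Syncthing code):

```go
// Sketch of a global "folder concurrency" limit: a buffered channel used as a
// semaphore so at most maxConcurrent folders scan at any given time, while
// the rest wait their turn. Illustration only, not Syncthing's code.
package main

import (
	"fmt"
	"sync"
	"time"
)

func scanFolder(name string) {
	fmt.Println("scanning", name)
	time.Sleep(200 * time.Millisecond) // stand-in for the real scan work
	fmt.Println("finished", name)
}

func main() {
	const maxConcurrent = 1 // 1 means folders are scanned strictly one at a time
	sem := make(chan struct{}, maxConcurrent)
	folders := []string{"dad-backup", "laptop", "desktop", "htpc", "nas"}

	var wg sync.WaitGroup
	for _, f := range folders {
		wg.Add(1)
		go func(folder string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a scan slot
			defer func() { <-sem }() // release it when done
			scanFolder(folder)
		}(f)
	}
	wg.Wait()
}
```

The tricky part is deciding what counts as ‘busy’: if long sync operations hold a slot too, they starve the scanners, which is the problem mentioned above.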


Moved to a new topic because I don’t like the old one :smiley:


Perfect =)

I understand that the option would cut both ways, indeed. Having just one hasher active at a time could potentially make the entire system much less responsive to small changes all over the place. Then again, in the grand scheme of things it might actually be faster to quickly do things sequentially versus trying to do them in parallel and thrashing the hardware in the process. It very much depends on whether the CPU or the I/O is the bottleneck, and I doubt it’s going to be easy to make a magic one-size-fits-all solution there. I’d certainly stick with the current behaviour for the default configuration!

I think it was Total Commander that had a similar problem, where copying from drive x: to drive y: could be done using either big buffers or small buffers. Big buffers work great if x: and y: are on the same physical disk, while small buffers are more efficient when we’re talking about two different disks. To get around that, they let you group drive letters somewhere in the configuration: if you copied files from one group to another it would use small buffers, but for operations within a group it would use big buffers, thus taxing the drive heads (a lot) less. I think they’ve offloaded this to the OS in the meantime, but the problem they had at the time was similar, and I remember their solution had a major effect when copying between partitions on the same disk.

All in all, it’s not something that should be at the top of the list of new options (e.g. a scheduler to change configuration settings like max up/down speed probably rates much higher), but if you ever need to work on the related code it might be something to keep in mind. Maybe I should “simply” get a copy of the code and give Go a go.
