Incremental sync speed for large files on NAS (ARM)

Well, I got a bit excited about this so I tried it with skipHashOnCopy=True on my 50GB VM image :smile: After 6 hrs I gave up; at that point the .tmp file was at 30GB. So I ran another test on my Outlook folder, but I'm getting very similar results as before, in fact nearly identical: the 4 GB file takes just about 30 min, so really no difference at all. I can see with ls -la that the temp file is growing at about 2-3 MB/s, and at that rate it works out to roughly those 30 mins… FYI - I just extracted the syncthing "executable" from that build and replaced it on my NAS box.

FYI 2 - config:

    <folder id="20_Outlook" path="/volume1/sync/20_Outlook" ro="false" rescanIntervalS="43200" ignorePerms="true" autoNormalize="true">
        <device id="E5JLCZW-HPENSAA-TPWWWKO-DZVZWGX-I3XXPJP-QRDON6B-N3XXXAE-ET63ZQB"></device>
        <device id="63FCKCZ-YWWWREG-TIJW4OV-IWLNX4G-XKXX2IE-2FKKSBO-XBXXXJZ-4FFILQM"></device>
        <versioning></versioning>
        <copiers>1</copiers>
        <pullers>16</pullers>
        <hashers>0</hashers>
        <order>smallestFirst</order>
        <skipHashOnCopy>true</skipHashOnCopy>
    </folder>

Save the config via the UI and recheck whether the option is still there; it might be an attribute on the folder tag (I'm on my phone and can't check).

It is there. I always make changes via the GUI, and I just checked the config file to be sure. Anyway, funnily enough, the second large file - archive.pst, 6GB - went a bit faster at around 8.5 MB/s, so perhaps there is some improvement. Overall the folder synced in 55 min as opposed to 1 hr 20 min yesterday. I will run more syncs to see whether there is any consistent improvement in speed.

Odd. So my next suggestion is creating a CPU profile, preferably with the skipHashOnCopy option on.

Run it as

STCPUPROFILE=1 syncthing

and let it do its thing for a while. Then stop it from within the GUI (Actions -> Shutdown), don't ctrl-C it. Put the resulting cpu-something.pprof file somewhere, together with the binary you used to create the profile (it's gone from the build server by now, and it's needed to interpret the profile).
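
For reference, a profile like this comes from Go's standard runtime/pprof machinery, presumably wired up roughly like the sketch below (illustrative only, not the exact Syncthing code). It also shows why a clean shutdown matters: the profile is only written out when profiling is stopped.

    package main

    import (
        "fmt"
        "os"
        "runtime/pprof"
    )

    func main() {
        // Hypothetical sketch of an STCPUPROFILE-style switch; the real
        // wiring in Syncthing may differ.
        if os.Getenv("STCPUPROFILE") != "" {
            f, err := os.Create(fmt.Sprintf("cpu-%d.pprof", os.Getpid()))
            if err != nil {
                panic(err)
            }
            pprof.StartCPUProfile(f)     // start sampling CPU usage
            defer pprof.StopCPUProfile() // flushes the profile; skipped if the process is killed
        }

        // ... the program's normal work runs here ...
    }

The resulting .pprof file is then inspected with go tool pprof against the matching binary, which is why the binary is needed to interpret the profile.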

saved here

I let it run for ~1:20 hrs to sync the Outlook folder again.

The profile shows it spending most of the time (1200 CPU seconds) hashing files that apparently have changed locally on the NAS… There are about 320 CPU seconds related to finding and copying blocks, so that doesn't seem to be the bottleneck.

I don't use the NAS to modify files at all. I'm the only user, and the NAS serves as media storage and, now with syncthing, as a data mirror. However, I synced the 50GB VM image overnight. Strangely, after completion the CPU stayed maxed out even though the folder showed neither "Scanning" nor "Syncing". Even now that folder is unshared but the CPU is at 99%. I suspect it is still re-indexing that file. Bugger. I think that just ruined the test, I'm sorry. I will wait until everything is done and try again. This is the current stdout: I'd say 80_Windows_7_Personal is still indexing? Am I right? It doesn't show "completed initial scan", but the GUI shows "Unshared" and that confused me, sorry.

 /usr/local/syncthing/bin/syncthing --home /usr/local/syncthing/var/
[monitor] 06:24:59 INFO: Starting syncthing
[63FCK] 06:25:08 INFO: syncthing v0.11.16+14-g403fccd (go1.4.2 linux-arm default) unknown-user@syncthing-builder 2015-07
[63FCK] 06:25:08 INFO: My ID: 63FCKCZ-YV3RREG-TIJW4OV-IWLNX4G-XKA52IE-2FKKSBO-XBXEOJZ-4JZILQM
[63FCK] 06:25:08 INFO: Database block cache capacity 8192 KiB
[63FCK] 06:25:19 INFO: Starting deadlock detector with 20m0s timeout                                                     
[63FCK] 06:26:33 OK: Ready to synchronize 90_Install (read-write)
[63FCK] 06:26:48 OK: Ready to synchronize 99_Personal (read-write)
[63FCK] 06:26:50 OK: Ready to synchronize 80_SILworX_Windows_7 (read-write)
[63FCK] 06:26:52 OK: Ready to synchronize 80_Windows_7_HMI (read-write)                                                 
[63FCK] 06:26:54 OK: Ready to synchronize 80_Windows_7_Personal (read-write)                                            
[63FCK] 06:26:55 OK: Ready to synchronize 80_Windows_Server_2008 (read-write)
[63FCK] 06:26:55 OK: Ready to synchronize 80_Windows_XP (read-write)
[63FCK] 06:26:56 OK: Ready to synchronize 80_Windows_XP_Clone (read-write)
[63FCK] 06:28:08 OK: Ready to synchronize 00_Work (read-write)
[63FCK] 06:28:10 OK: Ready to synchronize 10_Standards (read-write)
[63FCK] 06:29:28 OK: Ready to synchronize 50_Photo (read only; no external updates accepted)
[63FCK] 06:29:35 OK: Ready to synchronize 70_IOM_VMware (read-write)
[63FCK] 06:29:37 OK: Ready to synchronize 80_ELOP_II_Windows_7 (read-write)
[63FCK] 06:29:39 OK: Ready to synchronize 80_ELOP_II_Windows_XP (read-write)
[63FCK] 06:29:39 OK: Ready to synchronize 80_Ubuntu_3.04_x86 (read-write)
[63FCK] 06:29:42 OK: Ready to synchronize 20_Outlook (read-write)
[63FCK] 06:29:43 OK: Ready to synchronize 80_Windows_7 (read-write)
[63FCK] 06:29:45 OK: Ready to synchronize 80_Windows_7_Control_Maestro (read-write)
[63FCK] 06:29:48 OK: Ready to synchronize 80_Windows_7_Full (read-write)
[63FCK] 06:29:48 INFO: Starting web GUI on http://0.0.0.0:7070/
[63FCK] 06:29:54 INFO: Completed initial scan (rw) of folder 80_Windows_XP
[63FCK] 06:29:56 INFO: Completed initial scan (rw) of folder 80_Windows_Server_2008
[63FCK] 06:29:57 INFO: Completed initial scan (rw) of folder 80_Ubuntu_3.04_x86
[63FCK] 06:30:00 INFO: Completed initial scan (rw) of folder 80_Windows_XP_Clone
[63FCK] 06:30:03 INFO: Completed initial scan (rw) of folder 80_ELOP_II_Windows_XP
[63FCK] 06:30:09 INFO: Completed initial scan (rw) of folder 80_Windows_7_Full
[63FCK] 06:30:09 INFO: Completed initial scan (rw) of folder 80_ELOP_II_Windows_7
[63FCK] 06:30:10 INFO: Completed initial scan (rw) of folder 80_Windows_7
[63FCK] 06:30:10 INFO: Completed initial scan (rw) of folder 80_Windows_7_Control_Maestro                               
[63FCK] 06:30:11 INFO: Completed initial scan (rw) of folder 80_Windows_7_HMI
[63FCK] 06:30:11 INFO: Completed initial scan (rw) of folder 80_SILworX_Windows_7                                       
[63FCK] 06:30:15 INFO: Completed initial scan (rw) of folder 20_Outlook
[63FCK] 06:30:25 INFO: Completed initial scan (rw) of folder 10_Standards
[63FCK] 06:30:43 INFO: Completed initial scan (rw) of folder 70_IOM_VMware
[63FCK] 06:31:05 INFO: Starting local discovery announcements
[63FCK] 06:31:06 INFO: Device 63FCKCZ-YV3RREG-TIJW4OV-IWLNX4G-XKA52IE-2FKKSBO-XBXEOJZ-4JZILQM is "Jamaica" at [dynamic]
[63FCK] 06:31:06 INFO: Device E5JLCZW-HPENSAA-TPSMWKO-DZVZWGX-I3IFPJP-QRDON6B-N3BUXAE-ET63ZQB is "WAPER-P0001" at [dynamic]
[63FCK] 06:31:07 INFO: API listening on [::]:7070
[63FCK] 06:31:41 INFO: Completed initial scan (rw) of folder 99_Personal                                                
[63FCK] 06:33:13 INFO: Completed initial scan (rw) of folder 90_Install                                                 
[63FCK] 06:36:19 INFO: Completed initial scan (rw) of folder 00_Work
[63FCK] 06:37:10 INFO: Completed initial scan (ro) of folder 50_Photo

@calmh I made another profile, stored at the same URL as above. This time there was no hashing/indexing: I started ST on the NAS first and waited for it to fully start, until the CPU dropped to 0%. Then I started ST on the PC and let them sync the Outlook folder. After the sync was done, the NAS CPU went to 0% and I stopped it. Performance was again similar to what it has always been: the 6GB file took 11 min to sync, ~9 MB/s. From start to stop, ST ran on the NAS for 55 min; the PC connected 13 min after the NAS ST started. So … 55 min no-hash vs 120 min hash.

Hi @calmh,

First of all, thanks for such a great piece of engineering, it is an awesome solution.

Regarding the in-place updates, I am thinking of a solution that hopefully would not require you to rewrite a lot of code; let me explain:

  • Instead of duplicating big files, you can store the received blocks in a temporary file (even a sparse file if needed).
  • Once receiving is done, and the data integrity of those blocks has been verified, you can write them into (or append them to) the original file.
  • This method does not touch the original file until all data is in local storage, so you don't have to worry about networking issues.
  • If the process fails at some stage, you can always retrieve the mismatching blocks from the network again; there's no need to fetch the whole file.

I hope you can get some inspiration from these thoughts.
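
Roughly, the idea in Go terms - stage the received blocks in a temp file, verify them, and only then patch the original at the right offsets. All names here are made up just to illustrate; this is not actual Syncthing code:

    package inplace

    import (
        "crypto/sha256"
        "fmt"
        "os"
    )

    // block describes one received block: where it was staged in the temp
    // file, where it belongs in the original file, and its expected hash.
    type block struct {
        Offset     int64    // offset in the original (target) file
        TempOffset int64    // offset of the staged copy in the temp file
        Size       int      // block size in bytes
        Hash       [32]byte // expected SHA-256 of the block contents
    }

    // applyInPlace verifies every staged block and only then writes the
    // blocks into the original file; the original stays untouched until all
    // data is safely in local storage.
    func applyInPlace(origPath, tempPath string, blocks []block) error {
        tmp, err := os.Open(tempPath)
        if err != nil {
            return err
        }
        defer tmp.Close()

        // 1. Verify the integrity of everything we received.
        for _, b := range blocks {
            buf := make([]byte, b.Size)
            if _, err := tmp.ReadAt(buf, b.TempOffset); err != nil {
                return err
            }
            if sha256.Sum256(buf) != b.Hash {
                // In reality this block would be re-requested from the network.
                return fmt.Errorf("block at offset %d failed verification", b.Offset)
            }
        }

        // 2. Patch the original file in place.
        orig, err := os.OpenFile(origPath, os.O_RDWR, 0)
        if err != nil {
            return err
        }
        defer orig.Close()

        for _, b := range blocks {
            buf := make([]byte, b.Size)
            if _, err := tmp.ReadAt(buf, b.TempOffset); err != nil {
                return err
            }
            if _, err := orig.WriteAt(buf, b.Offset); err != nil {
                return err
            }
        }
        return orig.Sync()
    }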

This is not atomic, as a rename is.

For sure it isn't.

And what about this?

  • Rename the original file that is to be modified, so nobody can modify it while we work on it.
  • Make the modification directly on top of the renamed file, but before applying the update, store the original values of the updated blocks in a temporary file (if the process fails, these blocks can then be used to leave the file as it was before the update).
  • Once updated, rename the file again to its original name.

That doesn't work. Filenames are just references to inodes, so while a file is open you can even delete it; that just makes it not show up in the file explorer, but it will still exist, and anyone who has it open will still be able to access it. Renaming doesn't change the inode, it just changes how you access it.
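
You can see this on any Unix-like system with a few lines (an illustration only, file names made up): the inode number is the same before and after the rename, so a process that already has the file open keeps reading and writing the same data regardless of the name.

    package main

    import (
        "fmt"
        "io/ioutil"
        "os"
        "syscall"
    )

    // inode returns the inode number of a path (Unix-only illustration).
    func inode(path string) uint64 {
        fi, err := os.Stat(path)
        if err != nil {
            panic(err)
        }
        return fi.Sys().(*syscall.Stat_t).Ino
    }

    func main() {
        ioutil.WriteFile("demo.bin", []byte("some data"), 0644)
        before := inode("demo.bin")

        os.Rename("demo.bin", "demo.bin.tmp") // "hide" the file under another name

        after := inode("demo.bin.tmp")
        fmt.Println(before == after) // true: same inode, same data, only the name changed
    }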

OK, I hadn't thought of that. The rename idea is not valid here. Would it be possible to do this?

  • Use lockf to take a POSIX lock on the file
  • Make the modifications (first extracting the parts of the file that will be modified, for a rollback action)
  • Then release the lock
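
In Go that would look roughly like the sketch below (Linux-only, file name made up). One caveat: POSIX locks are advisory, so they only keep out processes that also ask for the lock before touching the file.

    package main

    import (
        "log"
        "os"
        "syscall"
    )

    func main() {
        f, err := os.OpenFile("bigfile.img", os.O_RDWR, 0)
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        // Take an exclusive (write) lock over the whole file, blocking until we get it.
        lk := syscall.Flock_t{Type: syscall.F_WRLCK, Whence: 0, Start: 0, Len: 0}
        if err := syscall.FcntlFlock(f.Fd(), syscall.F_SETLKW, &lk); err != nil {
            log.Fatal(err)
        }

        // ... save the blocks about to be overwritten (for rollback), then patch in place ...

        // Release the lock again.
        lk.Type = syscall.F_UNLCK
        if err := syscall.FcntlFlock(f.Fd(), syscall.F_SETLK, &lk); err != nil {
            log.Fatal(err)
        }
    }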

It seems I forgot to post it here, but I actually wrote up a proposal for how journalling could work here;

There's one error case that I don't know how to handle (described there). I wouldn't worry too much about atomicity and locking, as this would be something the user could turn on knowing full well that it means they should not modify the files themselves. It could be useful for a remote backup of VM images or something, where no local changes should ever happen.
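
Just to sketch the general shape of such an undo journal (this is an illustration with invented names, not the write-up itself): before each block is overwritten in place, its old contents and offset are appended to a journal file and synced, so an interrupted update can be rolled back by replaying the journal.

    package journal

    import (
        "encoding/binary"
        "os"
    )

    // patchWithUndoJournal overwrites one block of the target file in place,
    // but first appends the block's old contents to an undo journal so an
    // interrupted update can be rolled back.
    // (Blocks appended past the current end of file would need special handling.)
    func patchWithUndoJournal(target, jnl *os.File, offset int64, newData []byte) error {
        // 1. Read the bytes we are about to destroy.
        old := make([]byte, len(newData))
        if _, err := target.ReadAt(old, offset); err != nil {
            return err
        }

        // 2. Append (offset, length, old bytes) to the journal and sync it,
        //    so the undo record is on disk before the destructive write.
        var hdr [16]byte
        binary.LittleEndian.PutUint64(hdr[0:8], uint64(offset))
        binary.LittleEndian.PutUint64(hdr[8:16], uint64(len(old)))
        if _, err := jnl.Write(hdr[:]); err != nil {
            return err
        }
        if _, err := jnl.Write(old); err != nil {
            return err
        }
        if err := jnl.Sync(); err != nil {
            return err
        }

        // 3. Only now overwrite the block in the target file.
        _, err := target.WriteAt(newData, offset)
        return err
    }

    // On crash recovery the journal is read back and each record's old bytes
    // are written to their offset again, restoring the pre-update contents.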

I have an idea: how about, prior to making updates on a given file, we first rename the file to an alternative filename? For instance, we could append tmp to the filename. Then we apply our patches and rename it back.

I've explained above why it doesn't work.

There's also the proposal above. It maybe doesn't handle the problem 100%, but anything simpler most likely won't do at all.