File renames, moves.

Hi all,

Apologies for bringing up an old issue, i.e.:

The only way we’ve found around this is to use versioning to keep all data for 24 hours and re-add that .stversions folder as a Syncthing folder, so that files can be pulled back into the correct place when detected. If we don’t do this, then as soon as a user moves a directory of files, the directory tends to get deleted and re-downloaded (it’s 50/50 whether that happens or the folder is moved/copied). This is of course really IO intensive, because when Syncthing versions a file it then has to be rescanned, and in our use case the folder may be 100 GB.

Basically, what I’m wondering is: is it possible for Syncthing to pull files back from the .stversions folder when it realises that it already had the file (from a tree rename etc.), without the extra IO? I.e. when it versions a file it already knows the hashes, so it shouldn’t need to rescan it.

This is similar to how Resilio handles renames, I believe:

There are several scenarios possible but here is the ideal scenario:

  1. You rename file example.txt to example2.txt on your peer (peer A).
  2. On the remote peer B, Sync detects that the example.txt file is missing on peer A and moves the file into the Archive.
  3. After some time, Sync on peer B detects that an example2.txt file has appeared on peer A. Sync checks whether there is a file with the same hash in the Archive folder. If there is such a file, Sync puts it back with the new name.

This flow avoids unnecessary re-transmission of the data and saves bandwidth. Note that you need to have the Archive option enabled, otherwise files will be re-synced again.
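Purely as an illustration of that flow, here’s a minimal Go sketch, assuming a hypothetical archive index keyed by a content hash (e.g. the hash of the file’s block list). None of these names exist in Syncthing; it just shows the lookup-then-rename idea:

    package archive

    import (
        "os"
        "path/filepath"
    )

    // Index maps a content hash to the file's current path inside the
    // archive (.stversions) folder. Hypothetical: no such index exists today.
    type Index map[string]string

    // RestoreOrDownload is invoked when a "new" file appears in the remote
    // index. If an identical file sits in the archive, rename it back into
    // place instead of downloading it again.
    func RestoreOrDownload(idx Index, contentHash, destPath string, download func(string) error) error {
        if archived, ok := idx[contentHash]; ok {
            if err := os.MkdirAll(filepath.Dir(destPath), 0o755); err != nil {
                return err
            }
            if err := os.Rename(archived, destPath); err == nil {
                delete(idx, contentHash)
                return nil // restored locally, no data transferred
            }
            // fall through to a normal download if the rename failed
        }
        return download(destPath)
    }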

I would happily fund any development in this area.

Best, Jon

I think your versioning solution is effectively the same as the Archive? Hence I’m not sure what you are after.

I’ve also been thinking about a better way to detect renames (and moves). I have a case where I move large folders of maybe 250 GB in total and also rename files in those folders, all within the same Syncthing folder. In general a lot of redownloading occurs, which is time consuming. Oh, and also a ton of extra files end up in the .stversions folder, having been “deleted” and then redownloaded instead of being moved/renamed.

I’m also bitten by the issue where Windows locks the files as they are being scanned, so I can’t rename a folder after I’ve renamed the files in it until the rescan is done. But that’s a different problem.

To avoid that last one, I pause the folder while I’m doing all the renaming and moving (and in this case all file contents stay exactly the same). Then I pause all the devices so no updates get sent to them, re-enable the folder, and wait for the 250 GB of data to be rescanned. Then I re-enable all of the devices, expecting that all the changes will get lumped together in one big batch (because the rescan is complete) and the renames will all work.

But it doesn’t work that way at all. In the end the integrity is there, but there are many many additional gigabytes of network transfers, and a ton more block copying instead of file moving and renaming going on. It could all be done so much faster.

Also to be clear, file moves are within subfolders of the same syncthing folder.

Not suggesting it’s easy to make improvements here, but it would be nice…

What you could try doing is to use versioning while also adding the .stversions folder to Syncthing without sharing it. Alternatively, you could use https://docs.syncthing.net/users/versioning.html#move-to-the-recycle-bin-using-powershell to send deleted (moved) files to the Recycle Bin while adding the Recycle Bin itself to Syncthing as well. However, there is still no guarantee that Syncthing is able to detect, scan and reuse blocks from the versioned files before it starts re-downloading them from other devices. I think some testing would be required to verify the actual behaviour.

Just for the record, this issue has been discussed in https://forum.syncthing.net/t/is-it-possible-for-syncthing-not-to-lock-files-in-windows-when-hashing-them/17086. In short, the behaviour is by design on the Go language team’s side, and the current and only solution is unfortunately to compile Syncthing yourself with the tweak described in the other topic. As the Go team doesn’t want to change the existing behaviour, a proper solution would need to be coded and implemented on the Syncthing side.
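For context, the tweak boils down to opening files with FILE_SHARE_DELETE, which Go’s os.Open does not pass on Windows (it passes only FILE_SHARE_READ and FILE_SHARE_WRITE, which is why files appear locked while being hashed). A hedged sketch of such an open, of the kind you would compile into your own build rather than anything stock Syncthing does:

    //go:build windows

    package fsutil

    import (
        "os"
        "syscall"
    )

    // openShared opens path for reading while still allowing other
    // processes to rename or delete the file during the read.
    func openShared(path string) (*os.File, error) {
        p, err := syscall.UTF16PtrFromString(path)
        if err != nil {
            return nil, err
        }
        h, err := syscall.CreateFile(p,
            syscall.GENERIC_READ,
            syscall.FILE_SHARE_READ|syscall.FILE_SHARE_WRITE|syscall.FILE_SHARE_DELETE,
            nil,
            syscall.OPEN_EXISTING,
            syscall.FILE_ATTRIBUTE_NORMAL,
            0)
        if err != nil {
            return nil, err
        }
        return os.NewFile(uintptr(h), path), nil
    }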

How old are the files that are being renamed?

Syncthing changed its hashing scheme some versions ago, so if you rename a file that was hashed with the old scheme, the newly discovered file does not look like the old one and hence is not eligible for rename detection.

You could also try adding logging around this area:

Brand new. Only a few days old generally.

If you could craft up an isolated case (where I don’t need to have your files) which reproduces the issue, I might look into it.

So I did this… But Syncthing occasionally deletes the folder marker and the folder stops. How do I prevent this? It would be kinda nice if there was a checkbox in the main folder’s versioning options that made this indexing automatic, without having to have a separate folder for it…

The isolated case is rather straightforward.

  1. Create a few thousand 50 MB files with random names in a subfolder of the root of the shared folder. All files should be different, with no shared blocks; all should be unique data. I guess in the end you should create 100-200 GB of data to really see this. And yes, my use case is often 200-300 GB of data in maybe 10,000 files (see the sketch after this list for one way to generate this).
  2. Let everything from the source side sync to a destination folder.
  3. To avoid the excuse of partially scanned files, pause the folder and the device.
  4. Rename all files, create a new subfolder under the root of the same shared folder, and move all the newly renamed files into this new subfolder. The old subfolder where the original files were created is empty.
  5. Unpause the folder, and let the scanner scan all the changes. It should detect all the deletions and all the additions.
  6. Unpause the device and see what happens. For sure, the other shared device will correctly apply all of the changes and, in the end, look like the source folder. But it should really only need a few MB of transfer to accomplish this (all of the file hashes/metadata). If you see 10 GB of data transferred, then clearly the renames were not all properly detected.
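For step 1, a throwaway Go program along these lines should do; the path and counts are placeholders to adjust:

    package main

    import (
        "crypto/rand"
        "fmt"
        "io"
        "log"
        "os"
        "path/filepath"
    )

    func main() {
        root := filepath.Join("shared-folder", "original") // placeholder path
        if err := os.MkdirAll(root, 0o755); err != nil {
            log.Fatal(err)
        }
        const n = 2000        // ~100 GB total at 50 MB per file
        const size = 50 << 20 // 50 MiB per file
        for i := 0; i < n; i++ {
            f, err := os.Create(filepath.Join(root, fmt.Sprintf("file-%05d.bin", i)))
            if err != nil {
                log.Fatal(err)
            }
            // crypto/rand gives incompressible, non-repeating data, so no
            // two files share any blocks (the step 1 requirement).
            if _, err := io.CopyN(f, rand.Reader, size); err != nil {
                log.Fatal(err)
            }
            f.Close()
        }
    }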

Doubt it matters but I haven’t upgraded yet and am still on 1.25.

Anyone have any ideas on this? Why is Syncthing deleting the folder marker? Does the .stversions cleanup process delete all empty folders? That wouldn’t otherwise be unreasonable, but it’s a problem in this case.

This is an old issue. I thought it had already been fixed, but apparently no one has done it so far. The problem is due to the versions cleaning mechanism, which deletes empty folders.

What I’m going to suggest below is a major hack that you should not do on any other folders because it can lead to data loss. The only reason that it’s safe in this particular case is because your .stversions folder isn’t shared with any other devices, hence even if something happens and Syncthing considers the folder empty, it won’t cause any file deletions because the folder is strictly local, with no synchronisation involved.

The hack:

Set https://docs.syncthing.net/users/config.html#config-option-folder.markername to . (a single dot). This will make the folder work without .stfolder.
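For illustration, the resulting folder entry in config.xml might look roughly like this (id, label and path are placeholders; markerName is the documented option from the link above):

    <folder id="stversions-index" label="Versions index" path="/data/photos/.stversions" type="sendreceive">
        <markerName>.</markerName>
    </folder>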

So I had a case where I processed 250GB of photos this week. Transfers from the camera, moves from one directory to another directory (within the same folder), file renames, and some more file moves (image sorting).

Normally, while I’m doing all the moving and renaming, I pause the Syncthing folders. Then when I’m done, I pause the devices, unpause the folders, and let Syncthing rehash all the files.

This time I decided to leave Syncthing alone and let it do its thing while I did my thing (which is obviously a much easier way to work if I don’t have to worry about pausing and resuming).

Much to my dismay, given a total of 250 GB of files, I now have multiple copies with different file names spread across multiple directories in the versions folder. The total versions folder size is 1.5 TB (on average six copies of each file).

So yeah. This can be improved.

Additionally, since I have now created a “folder” for the versions directory to try to save some of the data transfer, I find if I don’t pause the process to let the versions folder scan and calculate hashes, many of the files are transferred anyway.

There are a number of possible solutions, perhaps none of which are really “easy”. Two come to mind, each with its own pros and cons:

FIRST OPTION: (optionally) set a folder to scan completely before sending index info to remote devices. Let the hashing process complete and transmit all changes at once. On the receiving side, all changes are received and processed: matches/renames are detected and handled before any remaining files are requested. This way, a rename results in the new file’s entry (with hashes) arriving in the same batch as the delete of the old file (with the same hashes), so the renaming algorithm has a chance to do a rename, instead of the renamed file being deleted first (added to versions) and then having to be redownloaded.

This is perhaps the best option. It would presumably be a folder-specific option, as for many folders you may want transfers to start ASAP, before scanning is complete, as happens today.
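To make the first option concrete, here is a hypothetical Go sketch of index updates being buffered until scan completion; Syncthing has no such option today, and all names are invented:

    package index

    // fileInfo stands in for Syncthing's protocol.FileInfo; only what the
    // sketch needs is shown.
    type fileInfo struct {
        Name       string
        Deleted    bool
        BlocksHash []byte
    }

    // bufferedSender holds back index updates until the scan completes.
    type bufferedSender struct {
        pending []fileInfo             // changes found during the current scan
        send    func(batch []fileInfo) // ships one index batch to remote devices
    }

    // update is called for every change the scanner finds; nothing is sent yet.
    func (s *bufferedSender) update(f fileInfo) {
        s.pending = append(s.pending, f)
    }

    // scanCompleted flushes everything at once, so a remote device sees the
    // deletion of the old name and the addition of the new name (with the
    // same block hashes) in one batch and can pair them into a rename.
    func (s *bufferedSender) scanCompleted() {
        s.send(s.pending)
        s.pending = nil
    }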

SECOND OPTION: Syncthing (optionally) indexes “versions” folders. When a file is “deleted”, it is moved to the versions folder but doesn’t have to be rescanned to be picked up, i.e. its database record is moved to the versions folder as well. Then, if Syncthing needs a file for which a 100% matching file is present in the versions folder, it could move the file out of the versions folder back into the proper location (renaming it as needed).

If this were possible, my files could have been moved out and back into the directory and renamed as the changes occurred.
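A sketch of the second option’s bookkeeping, again with invented names: versioning a file moves its database record along with it, so the archive is indexed without any rescan (restoring by hash would then work like the earlier archive sketch):

    package versions

    import (
        "os"
        "path/filepath"
    )

    // Entry stands in for a database record: a path plus the hashes that
    // were already computed at the last scan.
    type Entry struct {
        Path       string
        BlocksHash string
    }

    // Index maps content hash -> archived entry. Hypothetical.
    type Index map[string]Entry

    // Archive versions a file by renaming it into versionsDir and moving its
    // database record with it. The data is never re-read or re-hashed; the
    // known hashes travel with the record.
    func (idx Index) Archive(e Entry, versionsDir string) error {
        dst := filepath.Join(versionsDir, filepath.Base(e.Path))
        if err := os.Rename(e.Path, dst); err != nil {
            return err
        }
        e.Path = dst
        idx[e.BlocksHash] = e
        return nil
    }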

Anyway, I’m not saying I expect someone to be able to pick up these suggestions and make pull requests to address them. But boy it would be nice for myself and others that rename large files if this could be handled…

Either of these two options would be amazing!

Hi All,

Just further to this: if the folders are paused, then directory changes are made, then the folder is resumed, everything is great. When directory changes/renames are made during a scan, everything in that directory gets deleted, then the changes are discovered on the next scan and the resync starts from scratch.

Here’s a video of Syncthing running over two VMs to show this.

Thanks Jon

I’m sure there is a better way to do this, but from some testing in folder.go: if the deletion scan is only done when there are no scan errors, this works correctly. I’ve not done enough testing to ensure that I haven’t broken something else, though:

    // Do a scan of the database for each prefix, to check for deleted and
    // ignored files.

    // Only check for deletions when the scan itself produced no errors;
    // otherwise files that merely failed to scan would be treated as deleted.
    if len(f.scanErrors) == 0 {
        l.Debugf("no scan errors, continuing check for deleted and ignored")
        changesHere, err = f.scanSubdirsDeletedAndIgnored(subDirs, batch)
        changes += changesHere
        if err != nil {
            return err
        }

        if err := batch.Flush(); err != nil {
            return err
        }
    }
    f.ScanCompleted()
    return nil

