File renames, moves.

Hi all,

Apologies for bringing up an old issue.

The only way we’ve found around this is to version all data to .stversions for 24 hours and re-add that .stversions folder as a Syncthing folder, so that files can be pulled back into the correct place when detected. If we don’t do this, then as soon as a user moves a directory of files, the directory tends to get deleted and re-downloaded (it’s 50/50 whether this happens or the folder is moved/copied). This is of course really IO-intensive, because when Syncthing versions a file it then has to be rescanned, and for our use case the folder may be 100 GB.

Basically, what I’m wondering is: would it be possible for Syncthing to pull files back from the .stversions folder when it realises that it already had the file (from a tree rename etc.) without the extra IO? I.e., when it versions a file it already knows the hashes, so it doesn’t need to rescan it.

This is similar to how Resilio Sync handles renames, I believe:

> There are several scenarios possible but here is the ideal scenario:
>
> 1. You rename file example.txt to example2.txt on your peer (peer A).
> 2. On the remote peer B, Sync detects that example.txt file is missing on peer A and moves the file into the Archive.
> 3. After some time, Sync on peer B detects that example2.txt file appeared on peer A. Sync checks if there is a file with the same hash in the Archive folder. If there is such a file Sync puts it back with a new name.
>
> This flow allows to avoid unnecessary re-transmitting of the data and save bandwidth. Thus, you need to have Archive option enabled, otherwise files will be re-synced again.
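In Syncthing terms, the receiving side could presumably do something similar: before re-downloading a “new” file, look its content hash up in the archive and move the match back into place. A rough sketch of the idea, with entirely made-up types and names (this is not Syncthing’s or Resilio’s actual API):

    // Hypothetical receiver-side rename handling: before downloading a
    // "new" file, check whether the archive already holds a file with the
    // same content hash, and if so move it back under the new name.
    package archive

    import (
        "os"
        "path/filepath"
    )

    // Index maps a content hash to the file's current path in the archive.
    type Index map[string]string

    // Restore moves an archived file with the given content hash to
    // destPath. It reports whether a match was found, so the caller can
    // fall back to a normal download when it wasn't.
    func (ix Index) Restore(contentHash, destPath string) (bool, error) {
        archived, ok := ix[contentHash]
        if !ok {
            return false, nil // no match; download from the network as usual
        }
        if err := os.MkdirAll(filepath.Dir(destPath), 0o755); err != nil {
            return false, err
        }
        // A rename is cheap: same data, new name, nothing re-transmitted.
        if err := os.Rename(archived, destPath); err != nil {
            return false, err
        }
        delete(ix, contentHash)
        return true, nil
    }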

I would happily fund any development in this area.

Best, Jon

I think your versioning solution is effectively the same as the archive? Hence I’m not sure what you are after.

I’ve also been thinking about a better way to detect renames (and moves). I also have a case where I move large folders, maybe 250 GB in total, and also rename files in those folders, all within the same Syncthing folder. In general there is always a lot of redownloading that occurs, and this is time-consuming. Oh, and also a ton of extra files in the .stversions folder that have been “deleted” and then redownloaded, instead of being moved/renamed.

I’m also bitten by the issue where Windows locks the files as they are being scanned, so I can’t rename a folder after I’ve renamed the files in it until the rescan is done. But that’s a different problem.

To avoid that last one, I pause the folder while I’m doing all the renaming and moving (and in this case all file contents stay exactly the same). Then I pause all the devices so no updates get sent to them. Then I re-enable the folder and wait for the 250 GB of data to be rescanned. Then I re-enable all of the devices, kind of expecting that all the changes will get lumped together in one big chunk (because the rescan is complete) and the renames will all work.

But it doesn’t work that way at all. In the end the integrity is there, but there are many many additional gigabytes of network transfers, and a ton more block copying instead of file moving and renaming going on. It could all be done so much faster.

Also to be clear, file moves are within subfolders of the same syncthing folder.

Not suggesting it’s easy to make improvements here, but it would be nice…

What you could try is to use versioning while also adding the .stversions folder to Syncthing without sharing it. Alternatively, you could use https://docs.syncthing.net/users/versioning.html#move-to-the-recycle-bin-using-powershell to send deleted (moved) files to the Recycle Bin while adding the Recycle Bin itself to Syncthing as well. However, there is still no guarantee that Syncthing will be able to detect, scan and reuse blocks from the versioned files before it starts re-downloading them from other devices. I think some testing would be required to verify the actual behaviour.

Just for the record, this issue has been discussed in https://forum.syncthing.net/t/is-it-possible-for-syncthing-not-to-lock-files-in-windows-when-hashing-them/17086. In short, it’s by design by the Go language team, and the current and only solution is unfortunately to compile Syncthing yourself with a tweak described in the other topic. As the Go team doesn’t want to change the existing behaviour, a proper solution would need to be coded and implemented on the Syncthing side.

How old are the files that are being renamed?

Syncthing changed its hashing scheme some versions ago, so if you rename a file that was hashed with the old scheme, the newly discovered file does not look like the old one, and hence is not eligible for renames.

You could also try adding logging around this area:

Brand new. Only a few days old generally.

If you could craft up an isolated case (where I don’t need to have your files) which reproduces the issue, I might look into it.

So I did this… But Syncthing is occasionally deleting the folder marker, and then the folder stops. How do I prevent this? It would be kinda nice if there was a checkbox in the main folder’s versioning options that made this indexing automatic, without having to have a separate folder for it…

The isolated case is rather straightforward.

  1. Create a few thousand 50 MB files with random names in a subfolder of the root of the shared folder. All files should be different, with no shared blocks; all unique data. I guess in the end you should create 100-200 GB of data to really see this. And yes, my use case is often 200-300 GB of data in maybe 10,000 files. (A generator sketch follows this list.)
  2. Let everything from the source side sync to a destination folder.
  3. To avoid the excuse of partially scanned files, pause the folder and the device.
  4. Rename all files, create a new subfolder under the root of the same shared folder, and move all the newly renamed files into this new subfolder. The old subfolder where the original files were created is empty.
  5. Unpause the scanner, and let the scanner scan all the changes. It should detect all the deletions and all the additions.
  6. Unpause the device and see what happens. For sure, the other shared device will correctly apply all of the changes and, in the end, look like the source folder. But it should really only use a few MB of data to accomplish this (all of the file hashes/metadata). If you see 10 GB of data transferred, then clearly the renames were not all properly detected.
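For step 1, a hypothetical little Go program along these lines could generate the data set (adjust the path and counts to taste):

    // Generates ~200 GB of unique random data: 4,000 files of 50 MB each.
    package main

    import (
        "crypto/rand"
        "fmt"
        "os"
        "path/filepath"
    )

    func main() {
        dir := filepath.Join("shared-folder-root", "subfolder") // adjust to your setup
        if err := os.MkdirAll(dir, 0o755); err != nil {
            panic(err)
        }
        buf := make([]byte, 50<<20) // 50 MB per file
        for i := 0; i < 4000; i++ {
            if _, err := rand.Read(buf); err != nil { // unique data, no shared blocks
                panic(err)
            }
            name := filepath.Join(dir, fmt.Sprintf("file-%05d.bin", i))
            if err := os.WriteFile(name, buf, 0o644); err != nil {
                panic(err)
            }
        }
    }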

Doubt it matters but I haven’t upgraded yet and am still on 1.25.

Anyone have any ideas on this? Why is Syncthing deleting the folder marker? Does the .stversions cleanup process delete all empty folders? This wouldn’t otherwise be unreasonable, but it’s a problem in this case.

This is an old issue. I thought it had already been fixed, but apparently no one has done it so far. The problem is due to the versions cleaning mechanism, which deletes empty folders.

What I’m going to suggest below is a major hack that you should not do on any other folders because it can lead to data loss. The only reason that it’s safe in this particular case is because your .stversions folder isn’t shared with any other devices, hence even if something happens and Syncthing considers the folder empty, it won’t cause any file deletions because the folder is strictly local, with no synchronisation involved.

The hack:

Set https://docs.syncthing.net/users/config.html#config-option-folder.markername to . (a single dot). This will make the folder work without .stfolder.
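For reference, the relevant bit of config.xml would look roughly like this (folder id and path are placeholders, and the folder is deliberately not shared with any device):

    <folder id="stversions-index" path="/data/your-folder/.stversions" type="sendreceive">
        <markerName>.</markerName>
    </folder>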

So I had a case where I processed 250GB of photos this week. Transfers from the camera, moves from one directory to another directory (within the same folder), file renames, and some more file moves (image sorting).

Normally, while I’m doing all the moving and renaming, I pause the Syncthing folders. Then when I’m done, I pause the devices, and unpause the folders and let Syncthing rehash all the files.

This time I decided to leave Syncthing alone and let it do its thing while I did mine (which is obviously a much easier way to work, since I don’t have to worry about pausing and resuming).

Much to my dismay, given a total of 250 GB of files, I now have multiple copies with different file names spread across multiple directories in the versions folder. The total versions folder size is 1.5 TB (on average six copies of each file).

So yeah. This can be improved.

Additionally, since I have now created a “folder” for the versions directory to try to save some of the data transfer, I find that if I don’t pause the process to let the versions folder scan and calculate hashes, many of the files are transferred anyway.

A number of possible solutions come to mind, though perhaps none of them are really “easy” to implement. Two stand out, each with its own pros and cons:

FIRST OPTION: (optionally) set a folder to scan completely before sending info to remote devices. Let the hashing process complete and transmit all changes at once. On the receiving side, all changes are received and processed: matches/renames are detected and handled before any other files needing to be requested are requested. In this way, a rename results in the new file’s hashes being sent in the same batch as the file delete (with the same hashes), and the renaming algorithm has a chance to do a rename, as opposed to having the renamed file deleted first (added to versions) and then redownloaded.

This is perhaps the best option. I would think it would be a folder-specific option, as for many folders you may want the transfers to start ASAP, before scanning is complete, as happens today.
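A rough sketch of what such a gate might look like, with entirely invented names (not Syncthing’s actual internals):

    // Hypothetical per-folder gate: while holdUntilScanned is set, index
    // updates are buffered locally and only flushed to remote devices once
    // the scan has finished, so deletes and adds travel in one batch.
    package gate

    type FileInfo struct{ Name string } // stand-in for the real structure

    type indexGate struct {
        holdUntilScanned bool             // the proposed folder option
        pending          []FileInfo       // updates buffered during the scan
        send             func([]FileInfo) // transmits updates to remotes
    }

    func (g *indexGate) update(files []FileInfo, scanning bool) {
        if g.holdUntilScanned && scanning {
            g.pending = append(g.pending, files...) // hold back until done
            return
        }
        g.send(files)
    }

    func (g *indexGate) scanCompleted() {
        if len(g.pending) > 0 {
            g.send(g.pending) // one batch: renames can be paired up remotely
            g.pending = nil
        }
    }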

SECOND OPTION: Syncthing (optionally) indexes “versions” folders, so that when a file is “deleted”, the file is moved to the versions folder but doesn’t have to be rescanned to be picked up; i.e., it is moved in the database to the versions folder as well. Syncthing, when needing a file for which a 100% matching file is present in the versions folder, could move the file out of the versions folder back into the proper location (renaming it along the way).

If this were possible, my files could have been moved out and back into the directory and renamed as the changes occurred.
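Sketching the database side of that idea (again with invented names, not the real schema):

    // Hypothetical record move: when versioning a file, carry its existing
    // database entry (block hashes included) over to a versions index, so
    // the versioned copy never needs rescanning and can be matched by hash.
    package versionsdb

    type fileRecord struct {
        path   string
        blocks []string // block hashes, already known at versioning time
    }

    type db struct {
        live     map[string]fileRecord // files in the folder proper
        versions map[string]fileRecord // files parked in .stversions
    }

    func (d *db) versionFile(path, versionedPath string) {
        rec, ok := d.live[path]
        if !ok {
            return
        }
        delete(d.live, path)
        rec.path = versionedPath
        d.versions[versionedPath] = rec // the hashes travel with the record
    }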

Anyway, I’m not saying I expect someone to be able to pick up these suggestions and make pull requests to address them. But boy, it would be nice for me and others who rename large files if this could be handled…

Either of these two options would be amazing!

Hi All,

Just further to this: if the folders are paused, then the directory changes are made, and then the folder is resumed, everything is great. When directory changes/renames are made during a scan, everything in that directory gets deleted, and then the changes are discovered on the next scan and the resync starts from scratch.

Here’s a video of Syncthing running over two VMs to show this.

Thanks Jon

I’m sure there is a better way to do this, but from some testing in folder.go: if the deletion scan is only done when there are no scan errors, this works correctly. I’ve not done enough testing to ensure that I haven’t broken something else, though:

    // Do a scan of the database for each prefix, to check for deleted and
    // ignored files.

    // Only run the deletion/ignore check when the scan produced no errors.
    if len(f.scanErrors) == 0 {
        l.Debugf("no scan errors, continuing check for deleted and ignored")
        changesHere, err = f.scanSubdirsDeletedAndIgnored(subDirs, batch)
        changes += changesHere
        if err != nil {
            return err
        }

        if err := batch.Flush(); err != nil {
            return err
        }
    }
    f.ScanCompleted()
    return nil




Hi there. This thread looks like it died, but I just want to give this a +1 and a bump.

I’m doing exactly this, and suffering the results. To summarise my specific scenario:

  1. Three Macs, all running the latest version.
  2. The files in question are new.
  3. I have a folder with about 400GB of video files.
  4. On one Mac, I rename the folder.
  5. On the other Mac, the folder is duplicated and files start to copy.
  6. That Mac runs out of disk space.
  7. Sadness and manual clean-up.

I only just discovered this ‘pausing Syncthing’ trick, which I will try next time. But it’d be nice if this Just Worked.

Confirming that the following strategy works:

  1. Pause all.
  2. Rename all, i.e. rename the folder on each device.
  3. Unpause. I do one, let it catch up, do another, etc.

I really think a “don’t send folder updates to remote devices while scanning is in progress” option would be good in this kind of case. It should absolutely be optional, but it would be useful on folders with large files that are renamed or moved a lot. Two caveats…

  1. There is perhaps a slightly higher risk of conflicts, due to changes being known on one device but not yet sent to another.
  2. If the watcher triggers a rescan while a number of files are being moved or renamed, but the moving and renaming hasn’t completed, then the first rescan may not catch all the changes. You could end up in a situation, in a busy folder, where rescans happen almost continuously, and in that case you risk substantial delays in communicating the changes to the remotes.

If the watcher triggers a rescan as in #2 above and then continues to detect changes, a second rescan could be triggered before the changes are communicated. However, there should perhaps be a limit on how many rescans can occur before the process gives up waiting and transmits the changes anyway, if changes keep happening continually. Something like the debounce sketched below.
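Conceptually, something like this (made-up names; just a sketch of the idea, not Syncthing code):

    // Hypothetical debouncer: hold back index updates while rescans keep
    // firing, but cap the total wait so continuous activity can't delay
    // updates forever.
    package debounce

    import "time"

    func DebounceUpdates(rescan <-chan struct{}, quiet, maxDelay time.Duration, send func()) {
        start := time.Now()
        for {
            select {
            case <-rescan:
                // More changes detected; keep waiting, unless we've
                // already waited maxDelay in total.
                if time.Since(start) > maxDelay {
                    send()
                    return
                }
            case <-time.After(quiet):
                // Folder went quiet; flush everything at once.
                send()
                return
            }
        }
    }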

Just thinking out loud. I don’t know enough about the architecture to know how easy or hard this is, but it’s fairly simple to think about at least conceptually. Implementation may be very different.


@hireworksltd I’m curious if you’ve been running Syncthing with that code since February. If yes, what is your current stance on it? Has synchronisation been working properly and as expected?

I’m asking because I’ve also been struggling with file rename and move issues, especially since I’m on a data-limited connection most of the time. Normally, when I rename or move a large amount of data, it always seems to result in some re-downloading, even though theoretically the files should just be moved locally on the other side.

Hi Tomasz.

Actually, no. It fixed quite a lot of problems, but not all of them. The solution we went with was a bit hackier, sadly, but it has entirely fixed the issue:

I added a LastScanErrors field, and now check that ScanErrors (for the current scan) and LastScanErrors are both nil before proceeding with the delete scan. There seems to have been some kind of timing issue, as this shouldn’t be required, but it works, and we have many devices out running that patch. I’ll post the code a bit later.
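The gist of it, sketched from memory (the actual patch may differ slightly):

    // Only do the deletion/ignore check when both the current and the
    // previous scan completed without errors.
    if len(f.scanErrors) == 0 && len(f.lastScanErrors) == 0 {
        changesHere, err = f.scanSubdirsDeletedAndIgnored(subDirs, batch)
        changes += changesHere
        if err != nil {
            return err
        }
        if err := batch.Flush(); err != nil {
            return err
        }
    }
    // Remember this scan's errors for the next run.
    f.lastScanErrors = f.scanErrors
    f.ScanCompleted()
    return nil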

Best, Jon

Thank you for the explanation! I would be very grateful if you could share the code :slightly_smiling_face:. You could also consider filing a PR if your current approach has been working so well across a large number of devices. Improvements in this area would benefit everyone who has to deal with large renames and moves.