Feature request: "Lazy hashing" option


(tsftd) #1

I use Syncthing to synchronize downloaded files across several computers. Some of those downloads are via http, some via ftp, and some via bittorrent. Unfortunately, this causes issues with Syncthing, as it starts syncing files as soon as they start downloading, then keeps re-syncing them as they change over time, until they are finished.

This results in:

  1. constant re-hashing as the files download
  2. bandwidth being used to upload the files while they are being downloaded
  3. disk access time being taken up by 1 and 2

Now, it’s certainly not a critical bug, but during peaks in activity it can easily leave the downloading computer thrashing: it is simultaneously downloading dozens of files, constantly rehashing them, and constantly uploading them, while the other computers keep getting a “file has changed” error as they try to sync those files.

A workaround, which is the feature I am proposing, would be a “lazy hashing” option. If you enable “lazy hashing”, you enter a time period (5 seconds, 30 seconds, 5 minutes, whatever), and Syncthing will skip hashing any file that has been modified more recently than that. In other words, it will skip (and ideally queue) any added or changed file until that file has gone unmodified for the chosen period.

This avoids breaking the normal flow of how Syncthing works, while accounting for files that are constantly changing (due to downloading, streaming a file capture to disk, etc.). Anyone who doesn’t want to use it just doesn’t enable the “lazy hashing” feature, and it’s no skin off their backs.


(Jakob Borg) #2

This sounds like a syncthing-inotify thing. Potentially it could learn that when a file is continuously changing it should start slowing down the rescans.


#3

You can already solve your problem with existing features.

  • Ignores:
    Most download managers can add an extra file extension to in-progress files. If you ignore files with that extension, Syncthing will scan each file only after its download has finished and the extension has been removed. Chrome uses such an extension (.crdownload) for its downloads as well.

  • REST API:
    You can set the scan interval to 0 and have your download manager (or you, manually) tell Syncthing to scan that specific file once the download is complete.


(tsftd) #4

That would certainly solve the issue for people using inotify, though not for people relying on the scan interval.


(tsftd) #5

Ignores: Some can, yes; however, in my case at least, I don’t believe the torrent client I am using can. And I’m unaware of any pure FTP clients (as opposed to download managers with FTP built in) that do that.

REST API: This wouldn’t help people using inotify, it requires programs that can send Syncthing commands on download completion, and it would either prevent other files (created by other programs) from being synced or force splitting downloaded files into one folder and non-downloaded files into another.


(uok) #6

What programs are you using?


(Jakob Borg) #7

Curiously enough Syncthing actually had this feature way back in version 0.2 or thereabouts. I remember it caused confusion and frustration when it didn’t rescan files that had changed and getting the threshold anywhere near right was difficult. I’m not super fond of going back to it.


(Audrius Butkevicius) #8

Stop Syncthing during your periods of peak activity.


(John Veness) #9

I don’t know what torrent client you’re using, but some of them (e.g. deluge) have a feature where it will move completed downloads to a different folder. So you could have your completed downloads folder synced with syncthing, but not the incomplete downloads folder.


(tsftd) #10

Makes sense: if you had it in the past and it caused problems, it probably isn’t a great idea.


(tsftd) #11

Yes, that’s what I’ll end up doing for the torrents. Unfortunately, rTorrent is a pain in the bum to do it with (you have to set it individually for each watchdir).


(tsftd) #12

Unfortunately, a lot of the activity is automated: flexget parses RSS feeds for what I want, then downloads the .torrent into an rTorrent watchdir, which starts it automatically. I guess I could use a daemon to pause Syncthing when rTorrent’s CPU usage spikes, but that would be a lot of work, as I’d have to learn how to do all of it.


(Minooch) #13

I’ll second downloading to a temp folder and moving once completed. Seems way more logical than adding useless features.


(Nas) #14

I see that the ‘arguments against’ make sense, but I also experience the same issue as the OP, especially because there are several NASes in my network that are kept quite busy syncing because of it. It’s a network of 12 people, with files of all kinds changing all the time, so temporarily switching Syncthing off or relying on excluding file extensions won’t work.

Maybe somebody will come up with an alternative approach; for me it certainly won’t be a useless feature. It’s actually the only inconvenience I still have with Syncthing; other than that, I’m extremely happy with it.


(Jakob Borg) #15

Increase the scan interval, for roughly the same effect?