Showing a list of duplicate files.

First post here. I recently started using Syncthing on a TrueNAS (which has Syncthing as one of only eight default plugins) and some macOS computers, and it works like a charm. It speeds things up for our designers, who can now work at full speed with their files on a local HD instead of on a network share, while still having a synced copy on the NAS.

I was wondering whether Syncthing internally works with some kind of hash of each file being synced. If it does, wouldn’t it be great to use that information to show a list of duplicate files in the synced folders? (We have 320k files in 27k folders, and I’m pretty sure we have duplicates.)

Hello and welcome,

My understanding is that Syncthing compares the data on a block level, so the information you’re after might not be as trivial to get as you might think.

Also, while the underlying methods may be similar, this feature certainly does not qualify as “continuous file synchronization”, so the chances of it making it into the Syncthing project are slim to none. Think of it this way: Syncthing is a complex project as it is. Adding only remotely related functionality to it would quickly make the project unmaintainable.

But yeah, I too have folders where I would be interested in the amount of duplicated data/files, and would possibly use Syncthing’s data to facilitate deduplication (my understanding is that it actually can do continuous/online deduplication under certain conditions).

Indeed, though we have the info in the database and could expose it so that an external tool/script could present it. It’s almost there, but not quite. You can get a file listing for a folder using /rest/db/browse, and then inspect each file with /rest/db/file. Currently this doesn’t help, because we don’t expose the block information. We could easily do this, because there is a field called BlocksHash, which is a hash of the block hashes; it uniquely identifies the contents of the file (see note 1). I’ll add a change to expose it; it doesn’t cost us anything.
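For illustration, here is a rough sketch of what such an external script could look like in Python. It is not an existing tool. It assumes the change mentioned above has landed and that /rest/db/file returns the hash under a JSON key named "blocksHash" inside the "local" entry; that key name is an assumption, as are the helper names fetch_file_info and group_duplicates. The X-API-Key header is the usual way to authenticate against the REST API.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict


def fetch_file_info(base_url, api_key, folder, path):
    """Call /rest/db/file for one file and return the decoded JSON."""
    query = urllib.parse.urlencode({"folder": folder, "file": path})
    req = urllib.request.Request(
        f"{base_url}/rest/db/file?{query}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def group_duplicates(records):
    """Group (path, blocks_hash) pairs and keep hashes seen more than once."""
    by_hash = defaultdict(list)
    for path, blocks_hash in records:
        if blocks_hash:  # skip entries without a hash, e.g. directories
            by_hash[blocks_hash].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}


# Hypothetical usage, given a list of paths collected from /rest/db/browse:
#   records = []
#   for p in paths:
#       info = fetch_file_info("http://localhost:8384", API_KEY, "default", p)
#       # "blocksHash" is an ASSUMED key name; check your version's output.
#       records.append((p, info.get("local", {}).get("blocksHash")))
#   for h, dupes in group_duplicates(records).items():
#       print(h, dupes)
```

With 320k files this makes one request per file, so expect it to take a while; the grouping step itself is cheap.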

1) There’s a corner case in that two files might have the same content but different block sizes, which will look to Syncthing like different contents. It’s not the common case, though.
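To make that corner case concrete, here is a small Python sketch of a "hash of the block hashes". It illustrates the idea only and is not Syncthing's actual code: the function name blocks_hash is made up, and the exact way Syncthing combines the per-block SHA-256 hashes is assumed here to be a SHA-256 over their concatenation.

```python
import hashlib


def blocks_hash(data: bytes, block_size: int) -> str:
    """Hash each block with SHA-256, then hash the concatenated block hashes."""
    outer = hashlib.sha256()
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        outer.update(hashlib.sha256(block).digest())
    return outer.hexdigest()


data = b"x" * 300_000  # identical content in every call below

same_a = blocks_hash(data, 128 * 1024)  # split into 3 blocks
same_b = blocks_hash(data, 128 * 1024)  # same block size, same result
other = blocks_hash(data, 256 * 1024)   # split into 2 blocks

print(same_a == same_b)  # identical content and block size match
print(same_a == other)   # identical content, different block size
```

The content is byte-for-byte the same in all three calls, yet the value computed with the larger block size differs, which is why a duplicate finder based on this field would miss such pairs.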