Well, Syncthing clearly hasn’t scanned/found a good chunk of your files for some reason, and hence is redownloading them. I suggest you shut down, remove the remote device, rescan, and see if the local state starts matching the global state. If it doesn’t, we should look into why, as that is the cause of the redownloads.
We transferred around 6.5 TB via rsync (I don’t know the exact numbers)
We hoped that Syncthing would pull the remaining 3 TB in order, newest to oldest
What we are seeing is:
After a restart, Syncthing does as expected for about 15 to 30 minutes (pulls the latest files, partly re-using existing blocks, etc.)
After this initial phase, Syncthing starts pulling files that are already present and identical on both master and client
On the client Syncthing never reports the correct global state
Possible explanation that I can come up with:
The hashes don’t match on older files, therefore Syncthing re-downloads them
Both master and client have large blocks enabled; however, the master has been running Syncthing since pre-v1 and the client only since post-v1. Maybe large blocks on the master were enabled at a later stage (I can’t remember, since this was a couple of months ago).
Solutions?
Remove folders from both ends, wipe the indexes
Re-add folders, let them both re-hash/re-scan using large blocks
Re-connect the two
Any other explanations/solutions you can think of?
So given your suggestion about large blocks, this all starts to make sense. The files are identical but the blocks are not, as the files were scanned with different block sizes, hence Syncthing redownloads them.
You need to enable large blocks on both sides, nuke the database (potentially by removing and re-adding the folder, but if you have a single folder a full wipe might be easier), let both sides rescan, and then share the folders with each other.
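In case it helps, here’s a rough sketch of how that could be scripted against the REST API instead of clicking through the GUI. The address, API key and folder ID are placeholders, and the `useLargeBlocks` field name is what I believe the folder option is called in this version’s config, so double-check both against your own `/rest/system/config` output before relying on it (and run the same thing on both devices):

```python
import time
import requests

# Placeholders -- adjust for your own setup.
API = "http://localhost:8384"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}
FOLDER_ID = "my-folder"

# Fetch the current configuration.
cfg = requests.get(f"{API}/rest/system/config", headers=HEADERS).json()

# Enable large blocks on the folder. The field name is an assumption based on
# the pre-variable-block-size config; verify it against your own config output.
for folder in cfg["folders"]:
    if folder["id"] == FOLDER_ID:
        folder["useLargeBlocks"] = True

# Write the configuration back and restart so the new setting takes effect.
requests.post(f"{API}/rest/system/config", headers=HEADERS, json=cfg)
requests.post(f"{API}/rest/system/restart", headers=HEADERS)
time.sleep(15)  # crude: give Syncthing a moment to come back up

# Wipe the index for just this folder so it is re-hashed with the new block size.
# Leaving out the folder parameter resets the whole database instead.
requests.post(f"{API}/rest/system/reset", headers=HEADERS, params={"folder": FOLDER_ID})
```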
Since it is not possible to create a folder with large blocks enabled straight away, what is the best way forward to make sure that all files are scanned with large blocks?
Shouldn’t it re-scan the second you enable this feature???
Anyway, how can I verify that this is the case? Is there a way to query the database about a certain file/path and retrieve the stored blocks/hashes?
It could, but it generally isn’t necessary. The reason you run into this is that it’s the initial scan on existing data on more than one device, which means every file is in conflict to begin with. Generally that’s only a metadata conflict which gets resolved invisibly without downloading anything. Having different large blocks settings makes it a data conflict instead.
You can, however, just set the option (soon) after creating the folder. It’ll restart with the new setting without the folder having had time to do much.
This file is too small to be affected by any large block setting. “Large blocks” only kick in when the file is larger than 256 MiB. So while this may be part of it, it’s probably not 100% of whatever is going on in your setup.
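For intuition, here’s a small back-of-the-envelope sketch of why that threshold matters. It only uses the numbers already mentioned in this thread (128 KiB standard blocks, large blocks above roughly 256 MiB) and is an illustration, not Syncthing’s exact block-size selection:

```python
KIB = 1024
MIB = 1024 * KIB

SMALL_BLOCK = 128 * KIB          # the standard block size
LARGE_BLOCK_CUTOFF = 256 * MIB   # per the post above, large blocks only apply beyond this

def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks a file of file_size bytes needs at a given block size."""
    return -(-file_size // block_size)  # ceiling division

# A 100 MiB file is below the cutoff: it is hashed as 800 blocks of 128 KiB
# regardless of the large blocks setting, so its block list matches on both sides.
print(block_count(100 * MIB, SMALL_BLOCK))    # 800

# A 1 GiB file is 8192 blocks of 128 KiB, but with large blocks enabled it is
# split into fewer, bigger blocks; the block hashes then differ from a 128 KiB
# scan even though the file contents are identical.
print(block_count(1024 * MIB, SMALL_BLOCK))   # 8192
```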
Alternatively, you can set up the folders, enable large blocks, and then nuke the database, which will force things to start fresh.
Nevertheless, we could do a better job of identifying that the files are the same, just with different block sizes, which I think is GitHub-issue worthy.
OK, but before I open an issue, is there a way to verify that this is actually happening? Some way to retrieve the blocks and hashes for a specific file/path?
Hashes, no, but the endpoint I pointed to at least shows the number of blocks in the file, which tells you the block size in use: if it isn’t 128 KiB blocks, it’s large blocks. If the block count differs between the local and global version, that indicates a mismatch and a required download.
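If it helps, here’s a rough sketch of how that check could be scripted. It assumes the `/rest/db/file` endpoint and a `numBlocks` field in its local/global output, which is how I remember the response looking, so compare against what your own instance actually returns (address, API key, folder ID and path are placeholders):

```python
import requests

# Placeholders -- adjust for your own setup.
API = "http://localhost:8384"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}

def block_info(folder_id: str, path: str) -> None:
    """Print the local vs. global block counts for one file, as stored in the index."""
    resp = requests.get(
        f"{API}/rest/db/file",
        headers=HEADERS,
        params={"folder": folder_id, "file": path},
    )
    resp.raise_for_status()
    data = resp.json()

    local = data.get("local", {})
    glob = data.get("global", {})

    # For an identical file, differing block counts suggest the two sides hashed
    # it with different block sizes, which is what triggers the redownload.
    print(f"{path}: local numBlocks={local.get('numBlocks')}, "
          f"global numBlocks={glob.get('numBlocks')}, size={glob.get('size')}")

block_info("my-folder", "some/large/file.bin")
```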
After deleting the index on both sides and re-scanning/re-hashing, both sides report a correct global state. Downloading also looks fine so far. I’m still a bit wary but hopeful.
Looks like rebuilding the indexes was the solution. It is a joy to watch it progressing (downloading only 90 GB and re-using 710 GB of existing data to end up with 800 GB). Thank you guys for this wonderful piece of FOSS and your ongoing support!