I’m writing my bachelor’s thesis, and one part of it is about Syncthing.
I need to understand how Syncthing works, in particular how it detects changes, but I can’t find anything about this topic in the documentation. I need something like this ownCloud documentation.
Can anyone help me, please?
Here is the protocol documentation: https://github.com/syncthing/specs
Great, thank you so much @lfam!
Additionally, is there any documentation about the scanner? I only found the source code of the scanner itself and some code documentation. I can work with that, but ‘real’ documentation would be better.
Sorry, I can only attach one link per post. Here is the ‘some code documentation’ link:
No, as far as I am aware there isn’t any documentation about that, but what it does should be clear from the comments in the code. Alternatively, feel free to ask questions.
Yes. I would be willing to turn the answers into documentation.
In summary (roughly):
ST stores hashes (SHA-256) and the modification timestamps of all files in a local database.
On request, the clients compare their stored hashes and, if some hashes differ, sync the version with the youngest (most recent) modification timestamp.
ST checks the directories every x seconds (60 by default) for local changes. If a file has a different hash, the new hash and modification timestamp are saved in the local database.
Is that right?
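To make the model summarized above concrete, here is a minimal Python sketch of it, assuming a plain dict stands in for the local database and that whole files are hashed with SHA-256. It rehashes every file on every scan and ignores deletions; the reply further down explains how Syncthing avoids that.

```python
import hashlib
import os


def file_entry(path):
    """Return the (SHA-256 hex digest, mtime) pair the summary describes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(128 * 1024), b""):
            h.update(chunk)
    return h.hexdigest(), os.stat(path).st_mtime


def rescan(root, index):
    """Walk `root`, update `index` (relative path -> (hash, mtime)) in place
    and return the sorted paths that are new or whose hash changed.

    Deleted files are not detected in this simplified model.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            digest, mtime = file_entry(full)
            old = index.get(rel)
            if old is None or old[0] != digest:
                index[rel] = (digest, mtime)
                changed.append(rel)
    return sorted(changed)
```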
Does that mean that every file has its own immutable ID, so that changes in its hash can be detected? Or, when a file changes, does ST delete its database entry and create a new one?
It stores more than just modification times: permissions, flags (symlink, etc.), size, and versions (vector clocks, i.e. an ever-incrementing counter for each device).
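A minimal Python sketch of such a record and of comparing two version vectors. The field names are hypothetical, and the comparison is the textbook version-vector rule: one vector is newer if it is greater than or equal for every device and differs somewhere; if neither is, the two are concurrent, i.e. a conflict. Syncthing's exact counter-update rule may differ from the simple +1 used here.

```python
from dataclasses import dataclass, field


@dataclass
class FileInfo:
    # Hypothetical record mirroring the fields listed above; the real
    # Syncthing schema is defined in the protocol specs and may differ.
    name: str
    size: int
    modified: float
    permissions: int
    is_symlink: bool = False
    deleted: bool = False
    # Version vector: device ID -> counter that only ever increases.
    version: dict = field(default_factory=dict)


def compare(a, b):
    """Compare two version vectors.

    Returns "equal", "newer" (a dominates b), "older" (b dominates a)
    or "concurrent" (neither dominates, which means a conflict).
    """
    devices = set(a) | set(b)
    a_ge = all(a.get(d, 0) >= b.get(d, 0) for d in devices)
    b_ge = all(b.get(d, 0) >= a.get(d, 0) for d in devices)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "newer"
    if b_ge:
        return "older"
    return "concurrent"


def bump(info, device_id):
    """Record a local change by incrementing this device's counter."""
    info.version[device_id] = info.version.get(device_id, 0) + 1
```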
On connecting, clients exchange indexes. Each client decides what is newer based on the versions, works out which devices have the file based on the indexes it has received from others, and starts asking those devices for the file.
A request contains the folder name, the file path within the folder, the offset, how much to read, and the expected hash of that block. Files in the database are identified by folder name + path, so a deleted file is actually marked as deleted (and kept in the index), and a new entry is created for the new file.
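As an illustration of that decision, a hypothetical planning function in Python. It assumes each peer's index is a dict from file name to version vector, picks a version that dominates the local one, and lists the devices announcing exactly that version. Conflicting (concurrent) versions are simply not handled here.

```python
def dominates(a, b):
    """True if version vector `a` is at least `b` for every device."""
    return all(a.get(d, 0) >= b.get(d, 0) for d in set(a) | set(b))


def plan_download(local_version, remote_indexes, name):
    """Decide whether to fetch `name` and from which devices.

    `local_version` is our version vector for the file ({} if we don't
    have it); `remote_indexes` maps device ID -> {file name -> version
    vector}, i.e. the indexes received from peers. Returns (needed,
    sources): whether a strictly newer version exists remotely, and the
    sorted devices that announce that newest version.
    """
    best = local_version
    for index in remote_indexes.values():
        v = index.get(name)
        # Only versions that strictly dominate the current best are taken;
        # concurrent versions (conflicts) are ignored in this sketch.
        if v is not None and v != best and dominates(v, best):
            best = v
    if best == local_version:
        return False, []
    sources = sorted(dev for dev, index in remote_indexes.items()
                     if index.get(name) == best)
    return True, sources
```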
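A small Python sketch of such a block request and of a database keyed by folder name + path with deletion markers. The field names, the in-memory `folders` store and the choice to check the hash on the serving side are assumptions for illustration; they are not the actual wire format, which is described in the protocol specs linked above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # The fields listed above; names are illustrative, not the wire format.
    folder: str
    name: str            # file path within the folder
    offset: int
    size: int            # how much to read
    expected_hash: bytes  # SHA-256 the block is expected to have


def serve(request, folders):
    """Read the requested block and check it against the expected hash.

    `folders` maps folder name -> {path -> file contents as bytes}; a real
    implementation would read from the file system instead.
    """
    data = folders[request.folder][request.name]
    block = data[request.offset:request.offset + request.size]
    if hashlib.sha256(block).digest() != request.expected_hash:
        raise ValueError("block no longer matches the expected hash")
    return block


def mark_deleted(index, folder, name):
    """Entries are keyed by (folder, path); deleting keeps the entry."""
    index[(folder, name)]["deleted"] = True
```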
There are also various optimizations in place:
- When we receive an index update from someone, we try to identify a delete + add pair as a rename.
- When downloading something, we know the hash of the block we expect to get, and we also know which block hashes we have locally. This lets us reuse the block from an existing local file that contains that part of the content instead of downloading it from someone.
- We walk the directory tree every N seconds, checking whether each object exists on disk and in the index. If it exists in both places, we compare the object type (has a file been replaced by a directory?), mtime, and size. If any of these don’t match, we send the file for hashing (if it is a file; if it is a directory, we just record that it is now a directory). Once we’ve finished the walk, we save the changes to our local index and dispatch them to all interested parties.
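For the rename optimization, one simple way to recognize a delete + add pair as a rename is to match them by content hash. This Python sketch assumes each file is summarized by a single hashable value (for example the tuple of its block hashes); the heuristic Syncthing actually applies may be different.

```python
def detect_renames(deleted, added):
    """Pair deleted and added files whose content hashes match.

    `deleted` and `added` map path -> content hash. Returns
    {old_path: new_path} for each likely rename; each added file is
    used at most once.
    """
    by_hash = {}
    for path, h in sorted(added.items()):
        by_hash.setdefault(h, []).append(path)
    renames = {}
    for old, h in sorted(deleted.items()):
        candidates = by_hash.get(h)
        if candidates:
            renames[old] = candidates.pop(0)
    return renames
```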
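The block reuse can be sketched in Python as follows, assuming fixed-size blocks hashed with SHA-256 and a hypothetical `fetch` callable standing in for a network request. The tiny block size only keeps the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; real block sizes are much larger


def blocks(data, size=BLOCK_SIZE):
    """Split data into fixed-size blocks; return (offset, SHA-256) pairs."""
    return [(i, hashlib.sha256(data[i:i + size]).digest())
            for i in range(0, len(data), size)]


def assemble(wanted_hashes, local_files, fetch):
    """Build a file from its list of block hashes.

    Blocks already present in any local file are copied locally; only the
    rest are requested via `fetch(hash)`. Returns the file contents and
    the number of blocks that had to be fetched remotely.
    """
    local = {}
    for data in local_files:
        for offset, h in blocks(data):
            local.setdefault(h, data[offset:offset + BLOCK_SIZE])
    out, fetched = [], 0
    for h in wanted_hashes:
        if h in local:
            out.append(local[h])
        else:
            out.append(fetch(h))
            fetched += 1
    return b"".join(out), fetched
```

For example, if a new file consists of the blocks `AAAA`, `BBBB`, `CCCC` and a local file already contains `BBBBAAAA`, only the `CCCC` block has to be fetched.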
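The periodic walk itself, as a rough Python sketch: a dict stands in for the local index, only entries whose type, mtime or size changed are hashed, directories are only checked for their type, and paths that vanished from disk are kept but marked as deleted. Symlinks, permissions, block-level hashing and the actual saving and dispatching are left out.

```python
import hashlib
import os
import stat


def scan(root, index):
    """One pass of the periodic walk, updating `index` in place.

    `index` maps a relative path -> {"type", "mtime", "size", "hash"} plus
    an optional "deleted" flag. Returns the entries that changed, i.e.
    what would be saved locally and sent to other devices.
    """
    updates, seen = {}, set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if stat.S_ISLNK(st.st_mode):
                continue  # symlinks are left out of this sketch
            rel = os.path.relpath(full, root)
            seen.add(rel)
            kind = "dir" if stat.S_ISDIR(st.st_mode) else "file"
            old = index.get(rel)
            if (old is not None and not old.get("deleted")
                    and old["type"] == kind
                    and (kind == "dir" or (old["mtime"] == st.st_mtime_ns
                                           and old["size"] == st.st_size))):
                continue  # exists in both places and looks unchanged
            entry = {"type": kind, "mtime": st.st_mtime_ns,
                     "size": st.st_size, "hash": None}
            if kind == "file":
                # Only new or modified files are sent for hashing.
                with open(full, "rb") as f:
                    entry["hash"] = hashlib.sha256(f.read()).hexdigest()
            updates[rel] = entry
    # In the index but no longer on disk: keep the entry, marked deleted.
    for rel, old in index.items():
        if rel not in seen and not old.get("deleted"):
            updates[rel] = {**old, "deleted": True}
    index.update(updates)
    return updates
```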
Currently, all devices exchange the full index on connecting, but there is a PR in progress that only exchanges the parts which haven’t been seen yet (decided by the max version of any given object, as far as I recall).
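The idea of that partial index exchange, sketched in Python under the assumption that every index change gets an increasing sequence number and that the peer remembers the highest one it has received from us. Whether the PR uses exactly this bookkeeping for its "max version" is not confirmed here.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """A device's index in which every change gets the next sequence number."""
    entries: dict = field(default_factory=dict)  # path -> (sequence, info)
    sequence: int = 0

    def update(self, path, info):
        self.sequence += 1
        self.entries[path] = (self.sequence, info)


def full_exchange(index):
    """Current behaviour: send the whole index on connecting."""
    return dict(index.entries)


def delta_exchange(index, max_seen):
    """Send only entries changed after `max_seen`, the highest sequence
    number the peer reports having received from us."""
    return {p: e for p, e in index.entries.items() if e[0] > max_seen}
```

With two files indexed, a later change to one of them means only that one entry is resent instead of the whole index.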