I think I have an idea of why this occurs. I will investigate and get back to you and Jakob. But I have found several ways it can happen, and the issues relate to protocol messaging, version detection, filesystem model serialization, file caching, and prefetching, to name a few.
Most of these problems are not specific to Syncthing – they are generalized computer science problems.
The main question is this: even if we do perfect everything, there are still going to be conflicts across the network. The key question revolves around “Conflict Detection and Resolution”: how do we detect and handle problems?
(1) Local Master, Remote Copier: The Remote runs into a sudden problem, for example its NFS save directory is temporarily offline. An error occurs (a file cannot be copied or deleted, or doesn’t exist)… Maybe Syncthing copies the file again to the now-empty mount point. Syncthing prints the error but keeps running. When the disconnected mount point comes back online, we don’t know what was copied fully and what was not.
The Local Master can now go out of sync for a number of reasons. Mainly, it thinks a file was not copied when it actually was. The same can happen if the disk goes offline.
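To make the offline mount point case concrete, here is a minimal sketch in Go of one possible guard: before writing anything, check for a marker file that was created when the folder was first set up. If the mount is gone, the marker is gone too, so an empty mount point is not mistaken for an empty folder. The function names and the `.sync-marker` file name are assumptions I made up for illustration; I am not claiming this is how Syncthing itself does it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// folderAvailable reports whether root looks like the real, mounted folder by
// checking for a marker file placed there when the folder was first set up.
// The marker name is a hypothetical choice for this sketch.
func folderAvailable(root, marker string) bool {
	info, err := os.Stat(filepath.Join(root, marker))
	return err == nil && !info.IsDir()
}

// demo simulates an empty mount point (no marker) and then the same
// directory once the marker is present.
func demo() (before, after bool, err error) {
	root, err := os.MkdirTemp("", "sync-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	// No marker yet: this is what an unmounted, empty directory looks like.
	before = folderAvailable(root, ".sync-marker")

	if err := os.WriteFile(filepath.Join(root, ".sync-marker"), nil, 0o644); err != nil {
		return before, false, err
	}
	after = folderAvailable(root, ".sync-marker")
	return before, after, nil
}

func main() {
	before, after, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("available without marker:", before)
	fmt.Println("available with marker:", after)
}
```

A copier that sees `folderAvailable` return false could pause and report the folder as offline instead of copying files into the empty directory.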
For example: consider some file blocks that have been written to disk, at least in the eyes of a Syncthing slave. File writes can take up to a minute to actually reach the platter, which is a problem when a disk cache is involved. So say the OS fails to flush the kernel cache: the blocks are updated in the cache but not on the disk.
Syncthing then sends a ‘success’ message saying it updated the blocks it received.
    struct IndexMessage {
        string Folder<>;
        FileInfo Files<>;
    }
    struct FileInfo {
        string Name<>;
        unsigned int Flags;
        hyper Modified;
        unsigned hyper Version;
        unsigned hyper LocalVer;
        BlockInfo Blocks<>;
    }
Now we are ‘out of sync’, at least temporarily. Syncthing may or may not resolve this example on its own… But it gives you an idea of the problems.
In this case the solution is not to ignore it. I think the solution is to update the protocol to change the way errors are handled. For example, if there’s an error while writing on the Remote side (such as the disk being full), then that copy of the program should show it.
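One way to narrow the window in the cache example is to not report success until the data has been explicitly flushed. Below is a minimal sketch in Go, assuming the receiver writes each block with `WriteAt` and then calls `Sync` (fsync) before acknowledging. The names `writeBlockDurably` and `roundTrip` are hypothetical and are not taken from Syncthing's code. Keep in mind that even fsync can be defeated by a drive that misreports its own write cache, so this reduces the problem rather than eliminating it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeBlockDurably writes data at offset and returns nil only after the OS
// has been asked to flush the file to stable storage. Any failure (open,
// write, sync, or close) is returned so the caller never reports success
// for a block that may exist only in the cache.
func writeBlockDurably(path string, offset int64, data []byte) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return fmt.Errorf("open: %w", err)
	}
	if _, err := f.WriteAt(data, offset); err != nil {
		f.Close()
		return fmt.Errorf("write: %w", err)
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return fmt.Errorf("sync: %w", err)
	}
	return f.Close()
}

// roundTrip writes one block into a temporary file and reads it back.
func roundTrip(data []byte) ([]byte, error) {
	dir, err := os.MkdirTemp("", "durable-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "file.bin")
	if err := writeBlockDurably(path, 0, data); err != nil {
		return nil, err
	}
	return os.ReadFile(path)
}

func main() {
	got, err := roundTrip([]byte("block contents"))
	if err != nil {
		fmt.Println("write failed, do not acknowledge:", err)
		return
	}
	fmt.Printf("acknowledging %d bytes\n", len(got))
}
```

The trade-off is speed: syncing after every block is slow, so a real implementation would more likely sync once per file before announcing it as complete.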
The user should be able to click the ‘Out of Sync’ message and see what file needs to be repaired and select how to proceed.
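To illustrate what the program would need to keep track of for such an ‘Out of Sync’ view, here is a rough sketch in Go. The `FileError` record and the `OutOfSync` tracker are hypothetical; they are not part of the protocol structures quoted above, and the real change would also need a message to carry these reports between nodes.

```go
package main

import (
	"fmt"
	"sort"
)

// FileError is a hypothetical per-file failure report.
type FileError struct {
	Folder string
	Name   string
	Reason string
}

// OutOfSync remembers the latest failure for each file until it is resolved.
type OutOfSync struct {
	items map[string]FileError
}

func NewOutOfSync() *OutOfSync {
	return &OutOfSync{items: make(map[string]FileError)}
}

// key combines folder and name; the NUL separator avoids accidental collisions.
func key(folder, name string) string {
	return folder + "\x00" + name
}

// Report records (or overwrites) the failure for one file.
func (o *OutOfSync) Report(e FileError) {
	o.items[key(e.Folder, e.Name)] = e
}

// Resolve drops a file from the list once the user has repaired it.
func (o *OutOfSync) Resolve(folder, name string) {
	delete(o.items, key(folder, name))
}

// List returns the outstanding failures sorted by folder, then file name.
func (o *OutOfSync) List() []FileError {
	out := make([]FileError, 0, len(o.items))
	for _, e := range o.items {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Folder != out[j].Folder {
			return out[i].Folder < out[j].Folder
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	o := NewOutOfSync()
	o.Report(FileError{Folder: "docs", Name: "b.txt", Reason: "disk full"})
	o.Report(FileError{Folder: "docs", Name: "a.txt", Reason: "mount point offline"})
	for _, e := range o.List() {
		fmt.Printf("%s/%s: %s\n", e.Folder, e.Name, e.Reason)
	}
}
```

Clicking the ‘Out of Sync’ message would then show the output of `List`, and each choice the user makes would end in a `Resolve` call for that file.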