"Pull only" folder type

AudriusButkevicius · November 20, 2016, 8:01pm

But pressing override does the same thing, and we don’t protect people from that not happening (they have an option to protect themselves).

We can be extra cautious, do the right thing and tweak the permissions on the folders/files to prevent people from assuming things are right in the first place.

calmh · November 20, 2016, 8:51pm

Pressing override is a manual action and so not really different from deleting the file manually. I’m fine with that.

If we remove write permissions everywhere this might actually maybe fly… That’s a good idea.

Of course that runs into an interesting problem of what happens when you change the folder type back to normal and we fuck up permissions everywhere else as a result. We’d almost have to remember that this folder was a pull-only folder at one point and do a full restore of permissions before changing the folder type. Urgh.

kluppy · November 20, 2016, 10:48pm

Perhaps the folder should be treated as adding a new share when the type changes.

Would this generate sync conflicts?

Heiko · November 21, 2016, 2:28pm

At some point you have to trust the user. Sure we sometimes do stupid things, but you have to expect users to use their heads at least sometimes. I would not mess with the file permissions, this is more trouble than it’s worth and will cause problems in corporate environments.

That being said, here I think are some safeguards and potential options (I will add both):

Keeping sync conflicts will not delete the file (except if the max number of conflicts is reached)
Versioning will not delete the file and rely on the versioning mechanism to handle things.

I’ll have to figure out how this will work on files/folder that have been added and do not have a global version.

Right now the feature is implemented via an advanced folder configuration (just a boolean flag). I’ve been contemplating if it makes more sense to introduce a new folder type. Behind the scenes it would still be a rwfolder, since so far I only had to modify vector.go and model.go to incorporate this feature. Adding it as a new folder type will of course make the change more complicated. Which approach do you guys prefer? It might be a few days before I get more time to work on it. Hopefully we can agree on the approach by then.

AudriusButkevicius · November 21, 2016, 2:36pm

If anything it should definately be a separate folder type. The parts of code that can be reused should be factored out into reusable components and then potentially embedded into the structs.

kbtombul · November 23, 2016, 3:48pm

I definitely agree that it should be a separate folder type, with a new icon in the GUI. This would visually remind the user if they look at the GUI later.

It might also be a good idea to turn on the versioning automatically when this folder type is selected. The user should be able to disable versioning if they want to but A BIG WARNING can be displayed explaining that they can lose data if they make changes in this folder.

Heiko · November 23, 2016, 4:01pm

Based on everybody’s responses, I’ve actually changed my approach. It’s still in the early stages and needs to be vetted more.

I realized that the downstream side is the wrong place to restrict sending the updates. The upstream device needs to not accept index updates from an untrusted device. I also realized that the overwrite function implements everything you guys were asking for. It already uses the versioning and conflict functionality.

Basically here’s what I have so far: I added new flags to the folder level config. Each device can now be marked to ignore updates from. There’s another flag that will automatically trigger the overwrite feature if a folder update is received from an untrusted device. (This could probably be extended to support rofolders). This will allow “untrusted” devices to receive folder updates and get reset if an unauthorized change is detected. This is important since we’ll be implementing a tree-like structure for syncthing. This approach will allow us to use the rwfolder and not have to play games with rofolders and rwfolders on the same underlying directory.

I hope this makes sense. I wrote this in a hurry between meetings…

Here’s my prototype if anybody wants to look at it: https://github.com/syncthing/syncthing/compare/master...Smiley73:forceoverwrite

calmh · November 23, 2016, 4:14pm

Generally speaking though, the set of connected devices is a graph and you don’t know about connections other than your own. You might be working from something like

but you can also get

unless you convince all cluster members to agree on which the untrusted devices are. Which is venerable old #62…

Potentially when the “originating device” field is in, you could use that to react to. You might still get the situation where a change from the untrusted device is accepted by the trusted device, modified again, and then accepted by yourself.

(It will still take some convincing to make me agree it’s a good idea to have a device out there auto-nuking changes.)

Heiko · November 23, 2016, 5:02pm

When are you planning to add the “originating device” field?

AudriusButkevicius · November 23, 2016, 5:35pm

That can still be easily spoofed, so its not really a security meassure others might assume it is by using this feature.

If anything, this idea needs to stick with its original plan of the device putting itself into sacrifical mode and reverting all changes it finds locally. No form of trust where the change came from with the current design of syncthing is possible, unless we redesign a lot of things and slap alot more crypto on the top.

Heiko · November 26, 2016, 1:08pm

Okay I think I got v1 done for the “writeonly” folder type. It’s called “slave” in the GUI, since the readonly type is called “master”. As requested, versioning and sync conflict handling works. I also pulled out some functions to lib/model/util.go to avoid duplicate code.

I’d like to get some feedback before I update the documentation and send in pull requests.

Here’s the Diff

Once you guys are happy with this feature and it’s merged, I’ll work on the “untrusted peer” feature (without the forceoverwrite). Both of these features together should satisfy most of the needs we have to be able to use Syncthing at our corporation.

AudriusButkevicius · November 26, 2016, 7:07pm

So the way it’s done now, most of the logic lives in the model, which makes that file even worse than it is now, as it keeps ever-growing.

As part of this feature I would have expected a refactor and a shift of logic from the model into the folders.

Also, instead of using a newRWFolder, I would have expected the common things between the two to be refactored into separate components, and then compose two separate folders in two separate files for the purpose (tho they will probably share most of the things they are composed of) .

The things that live in the util file can be all lower case as they are only used in that package and don’t need to be exported, also, you should probably run go fmt against the code.

Also, I am not sure how this interacts with delta indexes @calmh?

calmh · November 26, 2016, 8:37pm

I certainly agree that a refactor of that whole thing (i.e., the model/rofolder/rwfolder responsibilities distribution) is needed, but I don’t necessarily think it’s fair to require that refactor to happen by the next person who touches it. It probably needs to be done by someone who’s quite experienced in this code, as well.

I could argue about details in the actual code but I’m still unsure that this is something we want to do at all… I’ve argued that this is too much of a foot-gun and against the idea of how I want Syncthing to behave. The counter argument so far is that “but people want this” and “you have to trust the user”, none of which I really feel carry the required weight. Some people want to disable crypto or allow unauthenticated devices to access some folders. I’m against those things for the same reasons.

What use case does this solve that is not better solved by proper application of file system permissions or using a uni-directional sync tool like rsync to start with? Start from there. Convince me this is required and useful and that the good outweighs the potential harm. Then I’ll help with the required refactors, implementation and whatnot, and review any pull request.

AudriusButkevicius · November 26, 2016, 9:50pm

I don’t think it’s that much about the use case as much as it is about people wanting unattended master or slave or whatever you want to call it, I suspect.

The only way I can imagine an unattended setup with least damage would be a slave folder.

Other use cases which I would think are valid are:

The use case is sharing your files with someone else. You can’t guarantee they are not muppets and that they won’t drop their stuff into your folder, and if they do, you don’t want to be eyes on glass to see who pooped the party and having to slam the override button.

Or you are a muppet (or the person you are sharing stuff with) and don’t know how permissions work, so you just want the nuclear option.

Also, some applications leave junk around or touches some files then suddenly causing out of sync issues, requiring manual intervention to get things straight.

calmh · November 26, 2016, 10:13pm

I get that people want an unattended mirror-only type device. It could even be a specific slimmed down build of Syncthing that just follows whatever the latest version is and pulls that whenever it changes. I’d install that myself on a couple of machines. But I think that mode should be different:

Periodically scan for missing files,
Never scan existing files for changes,
Never delete files created locally,
Only overwrite local changes when the global/cluster version changes.

So you install it or set it up or whatever, and you get a clean copy of the latest cluster state. Then,

If you delete a file, it’ll be reinstated after the next scan. (Probably. It could be argued that is should stay deleted locally until updated in the cluster, but either way no data loss happens so I’m happy.)
If you change a file, it’ll remain changed until the cluster version changes to something newer, at which point it’ll be overwritten. (Potentially conflict detection kicks in here and creates a conflict version. Maybe mirror folders keep conflict files at some out-of-folder location if we want to ensure cleanliness.)
If you create a file, it’ll stick around and be permanently ignored. If at some point that file is created in the cluster, it’ll be overwritten. (And then see above about conflicts.)

This has the following properties / advantages in my mind:

No chance of changes propagating away from the mirror device,
Folder always in sync with cluster unless deliberately made not so,
Files mistakenly saved into a mirror folder are not immediately destroyed on sight.

I could imagine a “Force full resync” button or something that would clean out unknown files and overwrite changed files. But this would be manual. (Or API driven, for the fanatic.) If you want to ensure that you are always exactly in sync with the cluster and can have no local modifications, use permissions. Syncthing could know and present the current differences between ideal state and local state, somewhere in the vicinity of the “force full resync” button.

In the scenario of sharing some files with a friend and not wanting writes back, neither side cares about that detail and it’s fine. If you’re distributing a website to your ten load balanced web servers and want them to always be exactly in sync, these are all under your control and you can set it up properly to ensure that.

AudriusButkevicius · November 26, 2016, 10:58pm

Well you can do that now, yet we are still having a discussion, and yet “Auto override”, “Slave”, “Super Master” and other stuff comes up every day, so I don’t see how we are solving the “my application is a bastard and keeps modifying my files which I don’t want” case.

AudriusButkevicius · November 26, 2016, 11:03pm

To be honest, the other case that’s probably causing half the trouble is the fact that even mtime changes makes it out of sync and require an override, making people want this really badly.

calmh · November 27, 2016, 11:04am

The difference with my modified “mirror” proposal above would be, similar to @Heiko’s work, that the changes would never make it outside the device that made them. Getting into a fight where we sync a file, some application modifies it, we sync, they modify it etc doesn’t seem constructive. That’s not a problem we should try to solve by playing whackamole, but by opting out of playing whackamole and letting the user solve the underlying issue.

Yep. And we used to be much worse at that than we are, what with syncing back the inevitable mtime changes from filesystems with lower time resolution. Nevertheless, my mirror proposal above handles this by letting the difference slide as it does not matter, and overwriting the file anyway on next change.

I’m all for solving the problem, whatever it is, but I think that trying as quickly as possible to undo and delete changes made by the user is a fundamentally wrong approach. I see basically two different classes of where this would be used:

The other machine is not under your control so you can’t do anything about changes that might happen in the folder, but you don’t want them to propagate. This is the “sharing some photos with a friend” scenario. Set your side to “master”, set the other to “mirror”, and things should be fairly painless. They will get new changes as you make them, you will never know or hear about what they do. Master mode on your side is required to protect yourself against them reconfiguring Syncthing out of “mirror” mode but should otherwise never end up in the “override required” state.
Both machines are under your control because you’re one-way distributing files internally. This looks to be @Heiko’s case. As above, but as things are under your control you can also set up permissions to guarantee that the folder can never be changed locally - all the better. This should work fine already imho, but I can see the appeal in having devices not be able to distribute changed files if they happen for whatever other reason (bit rot, hardware failure, …).

In neither case does it seem required or useful to auto-nuke changes.

AudriusButkevicius · November 27, 2016, 12:19pm

Well people can/should do that now, but they don’t, hence why we are here, so I am reluctant to believe this will help anyone who’s been complaining so far.

It doesn’t address the problem people are bitching about, which is “My folder is master yet it’s out of sync”. You will have the same problem, “My folder is slave, yet it’s out of sync”.

People are not clever enough to do that, we already know that as they keep coming back to this, over and over again even after telling them to use permissions, or sync as a different user, or whatever, they just want it to work with a click of a button, they don’t want to have to do anything from their side.

Heiko · November 27, 2016, 1:31pm

Wow, you had quite a discussion while I’ve been sleeping. Let me add to this by a nice and lengthy post.

Let me try explain better my motivations for the changes I want to introduce. This will hopefully help you better understand the need. Unfortunately, I’m not allowed to tell which company I work for, so I’ll have to dance around the topic a bit.

First of all, you keep talking about users. I completely agree that in the scenarios where it’s just a bunch of users sharing “stuff” on their home computers, you have to be a bit more careful. In my case we don’t talk about that. We have a desperate need to synchronize a lot of files across a lot of locations (~1200). Initiall,y these files are going to be yum repos for local server provisioning and patching, application artifacts, and database dumps. All these files will only get pushed “down stream” and we have to ensure that these files are exactly what they’re supposed to be. Yes we will absolutely have file permissions set correctly, but that’s only one piece of the puzzle. We can’t have to risk that an intrusion into one of the locations or simply somebody doing something stupid, gets replicated to every other location.

Due to the large number of devices, we’ll be setting up a tree structure with 3 levels. The top level will be where we are going to maintain the files. That’ll be at least 2 devices, split over at least 2 data centers, with several folders in RW mode. This way we have some load distribution, but more importantly, have high availability and disaster recovery in case one of the data centers has an outage.

The middle layer will be purely for load distribution. We’ll have at least 4 nodes in each data center, all of them connected to both top level nodes (to be able to handle HA and DR on the top layer). This layer already needs the “slave” folder type.

The bottom layer will be our 1200 locations. Here we will also need the wofolder type.

We have to absolutely ensure that the correct files, and only those files, exist in those folders. If something is not right, then we need to be alerted (events) and it needs to automatically correct itself. We have to deal with SOX and many other audit and security requirements. We can’t have anybody deal with this manually. Even initiating the override via API doesn’t make a lot of sense.

For our prototype, we currently use rofolder and rwfolder types on the same underlying directory on the middle layer. This helps us ensure things only get pushed down stream. It’s a clunky solution, doubles the CPU load, and introduces who knows what in weird behaviors. Unfortunately it also doesn’t allow for the automated recovery and alerting if something isn’t right. This is why I decided to spend my personal time to work with you guys and introduce the couple features I’m missing.

We also have the need to sync files back to the corporate data center. For that, we will at a later point in time, introduce the pushing of files “up stream”. We’ll be using dedicated folders for that. This also has to be all automated, since no humans are involved.

For everything we’re looking for, rsync is out of the question.

I was extremely excited when I came across Syncthing. It provides most of what we need already. We just need to get the tool to a point where it supports corporate environments. We will have users controlling it, who actually know what they’re doing. Put the decision power into their hands.

For the end user, he/she is already getting warned for what it means to put a folder into slave mode. You can’t make everything muppet proof. If a sign says “hot surface - do not touch” and somebody touches it anyway, what can you do?

You got a great tool, it just needs a little bit of tweaking from my view point.