Infrastructure Syncing

Ben_Curthoys · March 5, 2015, 2:12pm

I mostly want a sync solution for moving files to and from servers.

I have 1 folder per client database, syncing backups and transaction logs both to development machines when needed for debugging and to a separate host in a different data centre for DR. I want separate folders (and not one folder with synced subfolders) because I want to be able to share a database with the client who owns it, so that they have their own warm copy of their database to do with as they will. These are all RW on the database servers and RO everywhere else.

I have a Builds folder, which is RW on my development machine (no dedicated CI server yet) and RO on each of the webservers which I use to deploy. I can run my build script which drops the binaries in the synced build folder, and the binaries are on all the web servers minutes later ready for me to run the deployment script…

And I have utility folders for transferring large files and useful scripts up and down and between webservers, database servers, cache servers, etc. So that’s another 3 folders.

None of these folders are “owned” by an individual or an account: databases are shared between servers and clients, builds are shared between developers and servers, and utility files are shared between everyone. Having some slight pain setting up the links is OK for me, as long as they require no maintenance thereafter. For this use case, what I like about Syncthing (and used to like about BtSync) is the ability to sync based purely on keys and secrets, without having to link everything together in a central account, which is exactly what I don’t like about Dropbox.

Things that would be nice for me:

1. Searching and sorting folders.
2. Resetting read-only folders: I want to throw away local modifications and just get the latest database from the server, thanks.
3. Pausing sync on some folders in order to get my new bug fix build onto the servers faster.
4. Pausing sync on all folders, because I’ve taken my laptop out of the office and I don’t want to burn my monthly bandwidth allowance on 2 GB of database backups I don’t actually need.
5. Running as a Windows service out of the box (hardly critical).

AudriusButkevicius · March 5, 2015, 2:30pm

These are just my ideas towards the problems (however shallow and stupid they might sound):

1. Browsers support Ctrl+F, and I do think folders are sorted alphabetically already.
2. This is hard when there is more than one node involved. How will you know which one to revert to? Also, I am not sure how you’d implement this, since everyone tracks everyone else’s view of the data, and then the highest version of each file wins. If you have file X on node A with version 10 and on node B with version 11, you couldn’t really revert A to version 10, as it would get overwritten by B’s version 11; you’d have to force B to revert too. You can, however, make A’s copy win by bumping its version to 12 (even though the content is the same), and this is already implemented (Override changes). Alternatively, in a two-node scenario, you can just zap the index on the device you want to revert.
3. You can temporarily unshare the folder (might require a restart, though). Pausing is easy to implement, but it is hard to make sure someone doesn’t shoot themselves in the foot (search GitHub for the actual problem definition).
4. Just shut down Syncthing.
5. Check the wiki; there are guides on how to do that. Since Syncthing is cross-platform, it’s hard to please everyone.
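The revert argument above can be made concrete with a toy model. This is a minimal sketch assuming each file carries a single integer version and that peers converge on the highest one; it is a simplification for illustration, not Syncthing’s actual data model. The `override` function is a hypothetical stand-in for the “Override changes” idea (re-announce the same content under a higher version), not the real implementation.

```python
# Toy model of "the highest version of each file wins".
# Assumption: one integer version per file (not Syncthing's real data model).


def merge(local: dict[str, int], remote: dict[str, int]) -> dict[str, int]:
    """Return the view a node converges to after exchanging indexes:
    for every file, the higher version number wins."""
    merged = dict(local)
    for name, version in remote.items():
        if version > merged.get(name, -1):
            merged[name] = version
    return merged


def override(view: dict[str, int], name: str, cluster_max: int) -> dict[str, int]:
    """Hypothetical 'override': re-announce the local file with a version
    above anything seen in the cluster, even though the content is unchanged."""
    bumped = dict(view)
    bumped[name] = cluster_max + 1
    return bumped


node_a = {"X": 10}
node_b = {"X": 11}

# A cannot simply "revert" to version 10: B's version 11 wins on sync.
print(merge(node_a, node_b))  # {'X': 11}

# Bumping A's copy to version 12 makes A's content win everywhere instead.
node_a = override(node_a, "X", cluster_max=11)
print(merge(node_b, node_a))  # {'X': 12}
```

The point of the model is that “revert” cannot mean lowering a version: the only move that sticks is announcing something newer than everything else.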

Ben_Curthoys · March 5, 2015, 2:57pm

For context - I wrote out my use cases to explain to someone why the direction that BtSync was going in didn’t suit me, and then I thought it might be of use if I shared them here.

(1) This was one of my main irritations with BtSync: I think the UI used HTML for rendering, but I couldn’t Ctrl+F, and the folders were sorted in the order they were added. Alphabetical sorting by folder ID is fine.
(2) What I would like (if practical) is to “revert to the version held on the Folder Master”.
(3) That’s true.
(4) Yeah. But if it turns out I do want one file, then when I start Syncthing, it’s going to start syncing everything. If I could pause everything, I could just unpause the one folder I wanted.
(5) Yes, I’d read the guides. Just saying it’d be nice =). If I get the time I might try to make an installer. I won’t get the time.

AudriusButkevicius · March 5, 2015, 3:12pm

Sure, that’s what the User stories section is for. I just thought I could explain some of the reasons behind things, as well as propose sad workarounds.

For 2, folder master is not an “I am master, everyone else is not” kind of thing. It’s down to the individual node, hence you might have multiple masters for a single folder. All it means is that the device refuses any updates from others.

Ben_Curthoys · March 5, 2015, 5:07pm

This is getting a bit off topic, but with the Folder Master thing… If you have a cluster where there are 4 non-master nodes and 1 master node, I can see that working. Changes to the master node are synced to the others, and changes to the non-master nodes don’t affect the master - though they do get synced to the others, which I’m not sure I’m keen on.

But if you have a cluster with 3 non-master nodes and 2 master nodes, then the two master nodes are ignoring each other and can get out of sync. Suppose master A modifies a file, and later master B (which has ignored master A’s change) also modifies that file. The child nodes are then going to have a conflict if they overwrite A’s version with B’s.
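The two-master scenario can be played out in a toy model. This is a sketch under stated assumptions: one integer version per file, and a “master” that ignores every incoming update (the behaviour described earlier in the thread). The class, the `push` helper, and the node and file names are hypothetical. Because both masters end up at the same version with different content, each child keeps whichever edit reached it first; a real client would have to flag this as a conflict.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node. Assumptions: one integer version per file, and a master
    ignores every incoming update (hypothetical model, not Syncthing's code)."""
    name: str
    master: bool = False
    files: dict = field(default_factory=dict)  # path -> (version, content)

    def modify(self, path: str, content: str) -> None:
        version, _ = self.files.get(path, (0, None))
        self.files[path] = (version + 1, content)

    def receive(self, path: str, version: int, content: str) -> None:
        if self.master:
            return  # a master refuses updates from every other node
        current, _ = self.files.get(path, (0, None))
        if version > current:  # equal versions: keep what we already have
            self.files[path] = (version, content)


def push(src: Node, dst: Node, path: str) -> None:
    """Send src's current copy of path to dst."""
    version, content = src.files[path]
    dst.receive(path, version, content)


a = Node("A", master=True, files={"db.bak": (1, "base")})
b = Node("B", master=True, files={"db.bak": (1, "base")})
c = Node("C", files={"db.bak": (1, "base")})
d = Node("D", files={"db.bak": (1, "base")})

a.modify("db.bak", "edit from A")  # A is now at version 2
push(a, b, "db.bak")               # ignored: B is a master
b.modify("db.bak", "edit from B")  # B is also at version 2, different content

push(a, c, "db.bak")  # C hears from A first and keeps A's edit
push(b, c, "db.bak")
push(b, d, "db.bak")  # D hears from B first and keeps B's edit
push(a, d, "db.bak")
print(c.files["db.bak"], d.files["db.bak"])
# (2, 'edit from A') (2, 'edit from B')
```

The children diverge purely on arrival order, which is the inconsistency the paragraph above describes.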

If that’s how it’s designed to work - I don’t think I’m going to find “Master” very useful =)

I would prefer if “Master” equated to “RW access” and non-master to “RO access”, so that master nodes sync between themselves and ignore changes only from non-master nodes. That way, the multiple masters will all have the same data, and a “reset button” for a non-master node would make sense again. I understand if enforcing this securely is hard. Personally, I’d only want it as a convenience (I’m never going to make a build on a webserver, or a database transaction log on a development machine), but I also appreciate that having something that looks like a security feature and isn’t secure is a very bad idea.

Perhaps an answer would be a “Slave” option, next to and mutually exclusive with (or possibly even instead of) the “Master” option, meaning “I’m subscribing to this folder read-only. I don’t want to send my changes (if I make any by accident), but I want to receive all of everyone else’s changes.” Instead of making one folder the master, you make the other four folders slaves. They don’t send changes, so the ex-master node never receives any. And two non-slave nodes then no longer ignore each other.

In direct answer to “This is hard when there are more than 1 node involved. How will you know which one to revert back to?”: if I’m read-only, then my version doesn’t count, and everyone else’s version will (eventually) be the same. So I revert to the latest version that any of the other nodes has.
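The read-only (“Slave”) proposal and the reset behaviour from the last few paragraphs fit together in a small toy model. This is a minimal sketch, assuming one integer version per file, that a read-only folder receives everything but announces an empty index, and that “reset” means discarding local state and taking the highest version announced by the peers. `Folder`, `announce`, `reset`, and `sync` are hypothetical names, not Syncthing’s API.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """Toy model of the proposed read-only subscription (hypothetical).
    Assumptions: one integer version per file; a read-only folder receives
    everything but never announces its own changes."""
    name: str
    read_only: bool = False
    files: dict = field(default_factory=dict)  # path -> (version, content)

    def modify(self, path: str, content: str) -> None:
        version, _ = self.files.get(path, (0, None))
        self.files[path] = (version + 1, content)

    def announce(self) -> dict:
        # A read-only folder sends nothing, so its local edits never spread.
        return {} if self.read_only else dict(self.files)

    def receive(self, index: dict) -> None:
        for path, (version, content) in index.items():
            if version > self.files.get(path, (0, None))[0]:
                self.files[path] = (version, content)

    def reset(self, peers: list) -> None:
        """Hypothetical reset button: drop local state, then take the
        highest version of each file that any peer announces."""
        self.files = {}
        for peer in peers:
            self.receive(peer.announce())


def sync(nodes: list) -> None:
    """One round of every node announcing its index to every other node."""
    for sender in nodes:
        for receiver in nodes:
            if receiver is not sender:
                receiver.receive(sender.announce())


dev = Folder("dev")
web1 = Folder("web1", read_only=True)
web2 = Folder("web2", read_only=True)
nodes = [dev, web1, web2]

dev.modify("app.zip", "build 1")
sync(nodes)                           # both web servers receive build 1
web1.modify("app.zip", "local hack")  # accidental edit on a read-only node
sync(nodes)                           # the edit is never sent anywhere
web1.reset([dev, web2])               # throw the local edit away
print(dev.files, web2.files, web1.files)
```

Because read-only folders announce nothing, the answer to “which node do I revert to?” collapses to “the highest version among the read-write peers”, which is the argument made above.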

AudriusButkevicius · March 5, 2015, 5:25pm

Well, that’s not how it works now, and two masters means constant inconsistency, where each master can choose to override the others’ changes. It is RW and RO, but at a local level rather than a cluster level.

From a security POV, people mark a node as non-master because they are afraid it might get compromised and cause damage. In the scenario which you envision, what’s preventing a non-master node from checking the checkbox and becoming a master and then causing havoc?

Actually, this does make sense, is not rocket science to implement, and solves the problem nicely. But my personal preference would be to have the control in my own hands: I mark which nodes I allow to modify the view of the global state and which nodes I only allow to read. That way, if someone is compromised, it doesn’t mean they can nuke my data.

As @calmh correctly pointed out to me today, this requires a consensus (such that if I trust A but not B, we need to make sure that A doesn’t trust B either).
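The consensus requirement can be illustrated by treating “accepts changes from” as edges in a graph. Assuming a node re-announces whatever it has accepted (so accepted changes get relayed onward), the set of nodes that can effectively write to my copy is the transitive closure of my trust edges. The sketch below checks whether that closure equals the set I explicitly trust; the dictionary layout and the function names are hypothetical, for illustration only.

```python
# Hypothetical trust model: trusts[x] is the set of nodes whose changes x accepts.
# Assumption: a node re-announces everything it accepted, so trust is transitive.


def effective_writers(trusts: dict[str, set[str]], node: str) -> set[str]:
    """All nodes whose changes can end up on `node`, following
    'accepts changes from' edges transitively."""
    seen: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for peer in trusts.get(current, set()):
            if peer not in seen:
                seen.add(peer)
                stack.append(peer)
    seen.discard(node)  # a cycle back to ourselves does not count
    return seen


def consistent(trusts: dict[str, set[str]], node: str) -> bool:
    """True if nobody outside `node`'s explicit trust set can reach it."""
    return effective_writers(trusts, node) == trusts.get(node, set())


# I trust A but not B, yet A trusts B: B's changes reach me via A.
leaky = {"me": {"A"}, "A": {"B"}, "B": set()}
print(effective_writers(leaky, "me"), consistent(leaky, "me"))

# Once A stops trusting B, my explicit and effective trust sets agree.
fixed = {"me": {"A"}, "A": {"me"}, "B": {"A"}}
print(consistent(fixed, "me"))
```

This is why per-node read-only marks alone are not enough: every node I trust must agree not to accept writes from nodes I don’t.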

Ben_Curthoys · March 5, 2015, 5:44pm

I think this comes back to the subject line of this whole use case: “Infrastructure Syncing”.

I’m thinking about a case where I have lots of nodes, and they’re all on servers I control. If someone has pwned one of my read-only nodes, then they’ve already got at least one webserver, for example. Being able to turn off read-only, write to my builds folder, and sync a modified build to the other webservers would make matters marginally worse, it’s true, but that’s going from 98 to 99 on the “how bad things are out of 100” scale, so I’m not that bothered.

If the different nodes were under the control of different people, I would absolutely want to enforce their read-only-ness. But if non-secure-read-only is easy and secure-read-only is hard, then I’m just saying that I personally would find non-secure-read-only very useful by itself.

I haven’t quite got my head around what introducer nodes do or how they work and need to go away and read the documentation, but I’ve a hunch they might help with the transitive trust problem.

generalmanager · March 7, 2015, 8:01pm

I just added my proposal on how to safely introduce read only devices in a user friendly way: https://github.com/syncthing/syncthing/issues/62#issuecomment-77707022. This should also work in your setup.

StrikerTwo · March 10, 2015, 8:51am

And I just created https://github.com/syncthing/syncthing/issues/1439 for a simple solution.