How should ST behave in the given example?

kahun · January 4, 2016, 7:46pm

Hi,

I have a sync scenario and I am wondering what the natural syncing behaviour should be (before I try it).

There are three nodes: A, B and C. All have been syncing fine for a while. Now one of the nodes (C) has database corruption, which means that the user has to delete the database and restart it.

A<–>B works

C => taken offline due to database corruption (until the user figures out what to do with it)

The user deletes some files from A so the changes are reflected on B but not on C, as we know C is offline atm.

So when C is back online with a fresh new database, what happens to the deleted files from A?

Are they going to be deleted from C?
Are they going to be synced back to A and B?

My personal preference is that the files should be deleted from C, even though C is getting a fresh new database. The first two nodes already know that those files are deleted.

thanks

calmh · January 4, 2016, 7:52pm

Further thought experiment - does the answer to this question change if

C saw the deletes and handled them,
then went offline and lost its database,
then had the files created again by the user, with the same contents,
then came back online again?

kahun · January 4, 2016, 7:54pm

Sure but we have 2 nodes saying that they deleted the files. So 2 against one there.

I guess my question for that particular situation is this: what is the probability of recreating the exact same files, unless the user brings them back from a backup? If that is the case, the user can still bring those files back after A and B delete the files from C once it gets its new database.

Btw, the reason I am bringing this up is that I run into this a lot, especially because my Android devices (with ARM builds or the Android app) seem to keep needing new databases. I have about 5 Android devices, so I doubt that all of them have similar hardware problems.

AudriusButkevicius · January 4, 2016, 9:07pm

Technically it’s a conflict, as they disagree on what the correct state is, and in case of a conflict the side which does not risk data loss wins.

If you think with your 2vs1 logic, then you’d lose data in calmh’s scenario, and you don’t want that.
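
To make the "side which does not risk data loss wins" rule concrete, here is a minimal toy model. The `Entry` structure and `resolve_conflict` function are hypothetical names made up for illustration, not Syncthing's actual code; the sketch only covers the delete-versus-existing-file case discussed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One device's view of a single path (hypothetical structure)."""
    device: str
    deleted: bool
    content_hash: Optional[str] = None  # None when the entry records a delete


def resolve_conflict(x: Entry, y: Entry) -> Entry:
    """Pick a winner when two views conflict and neither is known to be newer.

    Keeping a file can always be undone by deleting it again, while honouring
    a delete cannot be undone, so the view that still has the file wins.
    """
    if x.deleted and not y.deleted:
        return y
    if y.deleted and not x.deleted:
        return x
    # Both deleted or both present: outside the scope of this sketch.
    return x


view_a = Entry(device="A", deleted=True)                          # A (and B) deleted the file
view_c = Entry(device="C", deleted=False, content_hash="abc123")  # C still has it on disk
winner = resolve_conflict(view_a, view_c)
print(winner.device)  # C
```

Note that the number of devices holding each view plays no role in this model, which is why a 2 vs 1 vote does not change the outcome.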

kahun · January 4, 2016, 9:22pm

Well, I do not agree that they should be in conflict. The reason is this: if C had been online, the files would have been deleted from C anyway, because A and B already agreed on this and C is part of the circle.

Even if, let’s say, this is going to be some sort of conflict, ST is not going to copy them as conflict files to A and B; they will show up as regular files on A and B. In that case the users on A and B will go ahead and delete them again, assuming that they know what is going on and are aware that C was offline and such. In reality they might never know this is happening.

In the end because A and B already want the files deleted, the files will be deleted from C after C recreated them.

This just creates more work for the users.

kahun · January 4, 2016, 9:31pm

I think that the logical thing could be something like this if possible.

If C tells A and B that this is its first contact with them, and A or B sees that C has a file (with the same hash/date etc.) that they already deleted, the file should be deleted from C (given that C tells them it is connecting for the first time with a fresh new database, some kind of first contact).
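
The proposed rule can be sketched roughly as follows. This is purely hypothetical and does not describe how ST behaves: it assumes that A and B keep a content hash for every deleted path (`tombstones`) and that a connecting device announces a `first_contact` flag, neither of which is confirmed anywhere in this thread.

```python
def handle_announced_file(tombstones, path, content_hash, first_contact):
    """Decide what an established device does with a file announced by a peer.

    tombstones    -- dict mapping deleted paths to the hash they had when deleted
    first_contact -- True if the peer says it is connecting with a fresh database
    Returns "delete" (tell the peer to remove the file) or "accept".
    """
    if first_contact and tombstones.get(path) == content_hash:
        # Same path, same content as something already deleted: treat the
        # peer's copy as stale rather than as a newly created file.
        return "delete"
    return "accept"


tombstones_on_a = {"docs/old.txt": "hash-1"}

# C comes back with a fresh database and still has the deleted file.
print(handle_announced_file(tombstones_on_a, "docs/old.txt", "hash-1", True))   # delete
# Same path but different content: treated as a genuinely new file.
print(handle_announced_file(tombstones_on_a, "docs/old.txt", "hash-2", True))   # accept
```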

kahun · January 4, 2016, 9:37pm

One other thing is that this can get very complicated. Here is another similar situation.

C goes offline as described in the first post. A and B made many changes to the repositories. Let’s say the repo is about 10 GB, and they made changes that affect around 2 GB. Now C creates a new database and goes back online. But this time C also wants an exact copy without re-fetching all the files because, as we know, it is a big repo. And A and B just changed (edited, deleted, added etc.) around 2 GB of files. It is still very complicated for all sides to agree and do manual cleanup.

If the syncing logic works as I mentioned, C will only sync 2 GB of data. If it is as described by you, then C has to sync 10 GB of data after deleting all the files, deleting the database etc.

More work for the user.

canton7 · January 4, 2016, 9:40pm

Syncthing won’t download data it already has locally. It will not download 10GB in this case.
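
A simplified illustration of why: files are compared block by block using hashes, so only blocks that are missing locally need to be fetched. The helper names and the tiny block size below are assumptions chosen for readability, not Syncthing's real code or parameters.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # unrealistically small, for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Split data into fixed-size blocks and return the SHA-256 hash of each."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_fetch(remote_hashes: List[str], local_data: bytes) -> List[int]:
    """Return the indexes of remote blocks that are not available locally."""
    have = set(block_hashes(local_data))
    return [i for i, h in enumerate(remote_hashes) if h not in have]


local = b"aaaabbbbcccc"    # what C still has on disk after losing its database
remote = b"aaaaXXXXcccc"   # what A and B have after editing the middle block

print(blocks_to_fetch(block_hashes(remote), local))  # [1]
print(blocks_to_fetch(block_hashes(local), local))   # []
```

Losing the database means the local files have to be rescanned and rehashed, but unchanged data does not have to cross the network again.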

kahun · January 4, 2016, 9:42pm

Just to make sure you know, C does not have a database, it will be a fresh database because its database was corrupted.

C wants an exact copy and C does not want to manually match A and B. Otherwise the sync will be a simple merge, meaning that the files on C that are not on A/B will be transferred to A and B, which is not what C, A and B want, because then A and B have to redo all the file changes/deletes they did.

Overall the main issue is the deletes not updates or conflicts. We are not agreeing on how the deletes should be handled in given conditions.

canton7 · January 4, 2016, 9:56pm

I’m afraid I’m having a lot of trouble understanding that last post. What does “manually matching” mean? What is a “simple merge”? Why do you want to synchronize devices, but not have files from one of the devices transferred to the other devices? Why will A and B have to redo changes? What does “the deletes not updates or conflicts” mean?

kahun · January 4, 2016, 10:05pm

@canton7

Did the first post make sense as a plausible scenario? If so, the post you are referring to is along those lines.

A and B made a lot of changes to the share. They modified hundreds of files and deleted hundreds of files. Meanwhile C was offline. Then C comes back online after deleting the corrupt database and starting a new one.

Now C wants an exact copy without deleting all the files. If C goes the way you described, C just starts ST and lets it sync. At the end of the sync A and B will have all the deleted files back in their repo (which they will hate) and maybe some conflicts. C, on the other hand, will get all the updated files but not the deletes, because C already sent those files to A and B as new files (even though A and B deleted them).

At the end of this process no one will be happy with what they end up having. A and B have to re-delete all the files that they deleted in the first place. C has to wait for them to finish this process, and then C will sync those deletes. Given that we are talking about hundreds of thousands of deletes/changes, this will be a complicated manual process after the initial sync.

In my personal view, what has to happen is that the deletes from A and B should have been reflected on C too. But @calmh and @AudriusButkevicius do not agree because they think that is just a conflict.

canton7 · January 4, 2016, 10:07pm

If you deleted the database from C, this cannot happen. There is no way of distinguishing this case from a new device connecting to the network: A and B can’t tell that C existed previously, because you deleted the database.

(That is my understanding, anyway. @calmh and @AudriusButkevicius, the two main Syncthing developers, know better than I do.)
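
One way to picture this, assuming change tracking in the style of version vectors (one counter per device): the `compare` helper and the counter values below are made up for illustration and are not taken from Syncthing's code.

```python
def compare(v1: dict, v2: dict) -> str:
    """Compare two version vectors (device name -> change counter)."""
    devices = set(v1) | set(v2)
    v1_le_v2 = all(v1.get(d, 0) <= v2.get(d, 0) for d in devices)
    v1_ge_v2 = all(v1.get(d, 0) >= v2.get(d, 0) for d in devices)
    if v1_le_v2 and v1_ge_v2:
        return "equal"
    if v1_le_v2:
        return "older"
    if v1_ge_v2:
        return "newer"
    return "concurrent"  # neither side has seen the other's change: a conflict


# The delete as recorded by A and B.
delete_record = {"A": 5, "B": 2}

# C after losing its database: the rescanned file carries no history at all,
# exactly like a file offered by a device that has never connected before.
fresh_entry_on_c = {"C": 1}
print(compare(delete_record, fresh_entry_on_c))  # concurrent

# If C had kept its database, its entry would include the history it shares
# with A and B, and the delete would clearly be the newer change.
entry_with_history = {"A": 4, "B": 2}
print(compare(delete_record, entry_with_history))  # newer
```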

kahun · January 4, 2016, 10:08pm

Yes, but A and B already know the deleted files (I assume they know the hashes) and they will see that C will try to send those same files back (not some updated or changed versions) to them. Why would they let those files in if they already deleted them? And we are not talking about a first-time sync between three devices here. There is already an established sync.

Also C did not delete the database just for fun. There was a database corruption.

canton7 · January 4, 2016, 10:20pm

Devices A and B see the files are deleted and propagate that to C, then later they see a new device called D connect and add those files again. Should the files really be ignored in this case? Are you saying that files which are deleted should never be restored by anyone?

Also I’m not so sure that A and B do have the full hashes of those deleted files. They have the path, yes. Not sure about the full hash.

It sounds like the obvious solution to your scenario is to avoid database corruption…

kahun · January 4, 2016, 10:34pm

I would say yes to handling file deletes that way, as long as D tries to put those “exact” same files back. Remember D is trying to join an established ring of sync nodes. Otherwise A, B and C will make the mistake of accepting D as a friendly node even though D’s actions go against the will of the first three nodes (like putting the deleted files back).

This can be a real issue only if A, B and C somehow want those “exact” files back and ask D to join and put the files back in.

canton7 · January 4, 2016, 10:39pm

What if B restores the files?

If you think that in this case, yes, the files should be restored because B is part of the “established ring”, then you need to start thinking carefully about device networks which aren’t everyone-knows-everyone. In a network with two clusters of devices with a single device in common, which devices form the “established ring”? What if two “established rings” with the same files (albeit some deleted/modified/etc) connect for the first time?

kahun · January 4, 2016, 10:44pm

Does B restore the files in a normal situation or after losing its database? If it is the latter, it just goes back to the original situation calmh mentioned.

If B restores them during a normal sync, that should be just normal in my view, because B was already aware of the deleted files. B realized that there was a mistake and put the files back in. This is no concern for A because A and B already have an established relationship; they approve each other’s actions. However, we can’t say the same thing for C and D because they have just “joined” with fresh new databases, and certain behaviours of theirs, like putting the deleted files back in, should not be allowed.

At least that is my personal take.

canton7 · January 4, 2016, 10:47pm

Ok, so read the second half of my last post please.

Syncthing does not favour some devices over others based on some notion of an “established ring”: that concept falls down rapidly with more complex device networks.

kahun · January 4, 2016, 10:51pm

I understand, that is a great point.

My scenario is plausible and ST will act wrongly within the given context. The context is that C wants an exact copy without messing up A and B’s repositories. However, given that ST won’t take sides in this particular case, the final sync will be a mess for everyone involved. A and B will have all the deleted files back (even though they do not want those files back). C won’t have an exact copy of A and B because the sync was just a merge.

canton7 · January 4, 2016, 10:52pm

Also the more complex you make behavior like this, the harder it is to explain to people, or for people to understand why certain things are happening. That leads to more requests for support and user dissatisfaction.