Questions about hash database preseeding and file versioning

Hi all, I have two questions regarding the hash database and file versioning.

  1. Having two relatively huge (500+ GB) datasets, is it possible to pre-seed the hash database without syncing any files? In other words, I would like to “prepare” the system before putting it into full production and enabling file replication.

  2. Is there a way to have file versioning create a new file only if the affected file was modified on both the local and the remote server? In short: single-sided updates should not create a new archived, versioned file, while double-sided updates should.

Thanks.

You can start Syncthing, create a folder and not share it with anyone. That should build up the index. Once the index is built (scanning finishes), you can then share the folder.
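To tell when scanning has finished without watching the web GUI, one option is to poll Syncthing's REST API. The following is a minimal sketch, not an official client. It assumes the `/rest/db/status` endpoint, the `X-API-Key` header and a `state` field that reads `idle` once the scan is done; check these against the REST documentation for your version. The base URL, API key and folder ID are placeholders.

```python
import json
import urllib.parse
import urllib.request


def fetch_folder_status(base_url, api_key, folder_id):
    """Fetch the status document for one folder from a running Syncthing.

    Assumption: the instance exposes /rest/db/status and accepts the API key
    in the X-API-Key header. Verify both for your Syncthing version.
    """
    query = urllib.parse.urlencode({"folder": folder_id})
    request = urllib.request.Request(
        f"{base_url}/rest/db/status?{query}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def scan_finished(status):
    """Return True when the folder reports that it is idle.

    Assumption: the status document carries a "state" field and an unshared
    folder returns to "idle" after its initial scan.
    """
    return status.get("state") == "idle"
```

For example, `scan_finished(fetch_folder_status("http://127.0.0.1:8384", "YOUR-API-KEY", "my-folder"))` could be called in a loop with a sleep in between; all three arguments here are hypothetical values.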

A concurrent double-sided update is essentially a conflict, which will generate a new file (up to 5 of them by default).

Local changes are not versioned (because it’s impossible to version them, as by the time we realize the file has been changed, it’s already changed and we have no clue what the old file looked like).

Only remote changes are versioned.

Thanks for your reply. Let me elaborate…

You can start Syncthing, create a folder and not share it with anyone. That should build up the index. Once the index is built (scanning finishes), you can then share the folder.

Can (or must) I do that on both replica sides? With the two replicas already populated, I would like to pre-seed Syncthing on both sides.

A concurrent double-sided update is essentially a conflict, which will generate a new file (up to 5 of them by default).

Local changes are not versioned (because it’s impossible to version them, as by the time we realize the file has been changed, it’s already changed and we have no clue what the old file looked like).

Only remote changes are versioned.

I perfectly understand that local changes cannot / should not be versioned. But what about a remote change to an unchanged local file? Let me explain:

a) suppose I have a perfectly replicated, in-sync file on both local and remote side. Now a change happens on the remote side. I would like the local file to be updated without versioning, as no concurrent edit happens.

b) again suppose I have a perfectly replicated, in-sync file on both local and remote side. During the rescan interval, the file is edited on both local and remote sides. I would like the file to be versioned, possibly on both sides (or at least on the side with the older file).

While it seems an artificially constructed scenario, please think about this case: a local file server is replicated to a branch office, and its main use is to store frequently updated Office files. While the files are updated quite often, they are very rarely updated concurrently (i.e. each user usually works only with his own files). For normal, non-concurrent updates, I would really like to avoid the space penalty imposed by versioning. On the other hand, if the same file is concurrently updated on both sides, versioning is key to avoiding data loss (from a user's standpoint).

Thank you for your time!

What you are talking about seems like a conflict, as the file is modified on both sides before either side knows about it.

By default we keep 5 conflict files around.

I don’t understand the bootstrapping problem you are trying to solve. If the data is the same, it doesn’t matter when you start syncing. It should just work, as it would recognise that the data is the same.

If you mean scanning on one machine and transfer the index to the other one: No, you shouldn’t do that :wink:

You should add the folder to Syncthing on both devices without sharing it. Syncthing will scan the folders and build its index. As soon as you share the folder, Syncthing will start syncing it, see that the files are all the same and transfer nothing between the devices (apart from the index exchange when connecting).

If you are sure that the files are identical on both sides already, you can scan on one side and copy the index. By doing this you are asserting that the files are identical, and any discrepancy from that will be interpreted as a recent change and synced to the other side. Don’t get your folder paths wrong. Don’t screw up. Not a supported operation.

You don’t need to do any of this - you can just configure and start syncthing on both sides and they’ll figure it out just fine once both initial scans are completed. This is the safe option.
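If you want to check the "files are identical on both sides" assumption yourself before sharing the folder, one way is to build a checksum manifest on each machine and compare the two. The following is a minimal sketch that is independent of Syncthing: it does not read or write Syncthing's index, and the function names are made up for illustration. Symlinks and file permissions are not handled specially.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash one file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = file_sha256(path)
    return manifest


def diff_manifests(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files missing on either side and files whose content differs."""
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "different": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Each side would run `build_manifest` on its own copy, and the resulting dictionaries (saved as JSON, for instance) can then be compared with `diff_manifests`. If all three lists come back empty, the contents match at the time of the check.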

Scanning 500GB takes a couple of hours. You’d be done by now if you’d done that directly, probably. :wink:

What you are proposing here would always lead to a conflict. You could also share your versioning folder with the other machine, which would basically lead to local changes being versioned. It might not be very efficient, though.

I’ve never done that, but I have indexed my versioning folder in some cases, so files that I/Syncthing deleted “accidentally” wouldn’t have to be transmitted again.

Sure, but as they are production machines, I need to do some tests first :smiley: Moreover, the situation is slightly more complex because one side has many symlinks that I want to follow during replication (I read the page about FollowSymlinks: apart from disabling it, which is an unsupported operation, is it safe to use this option?)

There is no followSymlinks. It was in a dev build, but never got merged.

Thank you all for your explanations and patience :slightly_smiling:

I’m still somewhat confused about versioning. Let me quote myself:

a) suppose I have a perfectly replicated, in-sync file on both local and remote side. Now a change happens on the remote side. I would like the local file to be updated without versioning, as no concurrent edit happens.

Is it possible to have this behavior with the current replication strategies? My current understanding is that Syncthing will use versioning in this case (saving the previous file data), right? Or am I missing something?

Ah! Thank you for pointing that out! So, I need to change the current dir/file structure before using Syncthing…

Any chance this feature will be merged in the near future?

Probably not, given it has very nasty edge cases, where toggling the option can cause data loss.

Versioning is a choice: if it’s off, it won’t version anything.

True, but completely disabling versioning will lead to data loss in case of concurrent (on both sides) changes.

In short, something similar to Unison’s “copyonconflict” would be great for scenarios where both sides can be concurrently updated, but normally they aren’t.

I know Syncthing can use an external script to manage conflicts, but built-in features are always welcome. Sure, I know patches are even more welcome :smiley:

It seems you are mixing stuff up quite heavily. In case of concurrent modifications you will get a conflict, no questions asked (unless you say keep 0 conflicts).

Syncthing does not use an external script for conflicts, it uses external scripts for versioning.
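To make the distinction concrete, here is a toy decision table for what happens when a remote change arrives, written only from the descriptions in this thread. It is an illustrative model, not Syncthing's implementation: the function name and return strings are made up, the default of 5 conflict copies is the figure quoted above, and how the winner of a conflict is chosen is deliberately left out.

```python
def plan_incoming_change(local_changed, versioning_enabled, max_conflicts=5):
    """Describe how a remote change is handled in this simplified model.

    local_changed: the local copy was also modified before either side
        learned about the other's change (a concurrent edit).
    versioning_enabled: file versioning is turned on for the folder.
    max_conflicts: conflict copies to keep; 0 means keep none. The default
        of 5 is taken from the thread, not checked against any release.
    """
    if local_changed:
        # Concurrent edits are a conflict, independent of versioning.
        if max_conflicts == 0:
            return "conflict: one version wins, no conflict copy kept"
        return "conflict: one version wins, the other is kept as a conflict copy"
    if versioning_enabled:
        # Only remote changes are versioned: the old local copy is archived.
        return "archive the old local copy, then apply the remote change"
    return "apply the remote change without keeping the old copy"


if __name__ == "__main__":
    for local_changed in (False, True):
        for versioning_enabled in (False, True):
            outcome = plan_incoming_change(local_changed, versioning_enabled)
            print(local_changed, versioning_enabled, "->", outcome)
```

In this model, the behaviour asked for earlier (no archived copy for single-sided updates, a kept copy for concurrent ones) corresponds to versioning off with a non-zero conflict limit.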

Bingo! Your last comment put me on the right track. It was also in the FAQ :expressionless:

Yes, I was confusing versioning and conflict resolution, simply because the other synchronization software I used handled conflicts as a special case of versioning. This was somewhat imprinted in my mind :smiley:

Thank you all for your patience!

“copyonconflict” looks like what we do by default.

Yes, now that the terminology is clear (in my mind :slightly_smiling:) I can see it should be basically the same thing.
