Bad/Good news: I wasn’t able to achieve the performance goals I personally had for this implementation. I fear it’s too much effort to get this properly running on S3. Even with restic, there are remaining issues: restic does support parallel backups running from different nodes, but it currently doesn’t allow any backup to run while garbage collection is in progress. The read performance with restic also wasn’t what I expected.
I might continue with the S3 implementation later, but for now I will stop in favor of a much easier-to-implement approach that should also fit better into Syncthing’s existing strategy.
My main reason for using S3 was that I planned to use “garage” ( https://garagehq.deuxfleurs.fr/ ) on a privately managed cluster of PCs/servers. I planned to use garage to achieve robust distributed storage with high availability. Now the idea is that one could achieve this by relying purely on Syncthing.
Idea: use a special new ignore pattern for sharding/partitioning of the data. This makes it possible to achieve a replication factor of e.g. 3, distributed across 3+ Syncthing-connected nodes. The idea is to use the hash of the filename to distribute the data evenly between the nodes. Each node gets its own partition/shard assigned by specifying an ignore range for the filename hashes. A rough sketch of the shard-selection logic is below.
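To make the idea concrete, here is a minimal sketch (not an actual Syncthing patch) of how a node could decide whether a file falls into one of its assigned shards. The hash function (FNV-1a), the shard count, and the function names are all just assumptions for illustration; the real implementation would hook into ignore-pattern matching instead.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardOf maps a relative file path to one of numShards shards by hashing
// the path. FNV-1a is only a placeholder; any stable hash would do.
func shardOf(relPath string, numShards uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(relPath))
	return h.Sum32() % numShards
}

// wantFile reports whether this node should keep the file, given the set of
// shards assigned to it. If every shard is claimed by k nodes in the cluster,
// the effective replication factor is k.
func wantFile(relPath string, numShards uint32, myShards map[uint32]bool) bool {
	return myShards[shardOf(relPath, numShards)]
}

func main() {
	// Hypothetical assignment: this node is responsible for shards 0 and 2 of 4.
	myShards := map[uint32]bool{0: true, 2: true}
	for _, f := range []string{"photos/2023/img_0001.jpg", "docs/notes.txt"} {
		fmt.Printf("%s -> shard %d, keep: %v\n",
			f, shardOf(f, 4), wantFile(f, 4, myShards))
	}
}
```

In practice the shard assignment would be expressed as the proposed ignore range over filename hashes, so that each node simply ignores everything outside its shards while the cluster as a whole still covers every shard the desired number of times.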
@calmh Do you know if anything like this was ever discussed before?
EDIT: I found these rather old discussions with a quick search; still, I think the idea of using hashed filenames is new. What do you think about this?
2014: sharding on multiple servers, guaranteeing a specified redundancy.
2016: sharding to allow a shared data backend (network filesystem) on multiple nodes: