Integrate S3 cloud storage

Hello, I’m the author of this PR about S3 integration into Syncthing:

I tried to orient the PR work around my personal goals and an already existing issue opened by a different author on GitHub:

During the discussion in the PR, it turned out that there are quite different opinions around about what the desired implementation should do.

Therefore I’m following the recommendation to open a topic here on the forum.

I would like to mention that I will also try to work on a concept document in parallel that everybody can review on this PR:

This helps me keep track of all the important aspects.

I would like to start with the discussion about one of the core questions: how to overcome the limitation of S3 that modifying an existing object is only possible by re-uploading the whole file - even the parts that were not modified. This is especially an issue with large files.
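
To make the limitation concrete, here is a minimal sketch of what a plain object update looks like with the AWS SDK for Go v2 (purely illustrative; the bucket and file names are placeholders). PutObject always replaces the complete object body, so even a one-byte change means streaming the whole file again:

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := s3.NewFromConfig(cfg)

	// Even if only a single block of the local file changed, the whole
	// file has to be streamed again: PutObject replaces the full object.
	f, err := os.Open("large-file.bin")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String("example-bucket"), // placeholder
		Key:    aws.String("large-file.bin"), // placeholder
		Body:   f,
	})
	if err != nil {
		log.Fatal(err)
	}
}
```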

@calmh what’s your opinion here?

I would love to have this feature. I have several workflows today that involve rclone that I would happily migrate.

  • Several projects (incl. Syncthing) produce build artifacts to the cloud, which then need to be rclone’d down somewhere for processing.
  • I have folders that live in Syncthing but that are also backed up to the cloud by first syncing to local disk and then rclone’ing to the cloud.

OK, so one requirement would be that tools other than Syncthing are able to fill/modify the S3 bucket. Syncthing needs to detect those changes and synchronize them to connected devices.

Is it important that the other tools communicate directly with the S3 bucket? Or would it be possible to use a FUSE mount of the S3 bucket (provided by the Syncthing framework)? The other tools would then write into the mounted filesystem rather than directly to the S3 bucket.

What do you think?

Just a collection of random thoughts on the matter:

  • before we settle on implementation details, we need to figure out what use cases we want to cover. The case for cloud backup and not being limited by local storage capacity is pretty strong. Exposing a filesystem-like view for other applications, not so much, IMHO
  • we should try to avoid S3 “scanning” if we don’t have to. Let’s keep it simple and enforce that Syncthing is in control of its objects
  • S3 isn’t a filesystem and we shouldn’t try to shoehorn it into a pseudo filesystem abstraction. We have our fair share of trouble on the receiving end with people running Syncthing on Samba/NFS shares.
  • we should try to keep the metadata in S3. If we keep it in the local database, the node might fail and you’re left with an unstructured pile of files or even blobs

A scan as in listing a bucket every now and then doesn’t seem too arduous to me, and at least one use case of mine is to have other things put blobs into cloud storage and have Syncthing pick them up from there.

As for metadata, I don’t think we need to go overboard. We can more or less treat it like we treat a FAT file system - permissions are lost and best ignored, unless they are retained in the database. Modification times may be similar - we already have handling for this in the database, if it’s not possible to set the time specifically in S3.

Actually, it’s possible to “update” a part of a file without re-uploading the rest, using the multipart upload feature, if the file size is more than 5 MB.
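
For context, here is a rough sketch of that trick with the AWS SDK for Go v2 (not Syncthing code; the helper name and sizes are illustrative, and the existing object is assumed to consist of two 5 MB parts). The unchanged range is copied server-side with UploadPartCopy, and only the changed part is actually uploaded; every part except the last must be at least 5 MB:

```go
import (
	"context"
	"fmt"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// patchSecondPart (hypothetical helper) rewrites an object by copying its
// unchanged first 5 MB server-side and uploading only a new second part.
func patchSecondPart(ctx context.Context, client *s3.Client, bucket, key string, newPart io.Reader) error {
	const partSize = 5 * 1024 * 1024

	mpu, err := client.CreateMultipartUpload(ctx, &s3.CreateMultipartUploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return err
	}

	// Part 1: server-side copy of the unchanged first 5 MB of the existing
	// object; nothing is downloaded or re-uploaded for this range.
	copyRes, err := client.UploadPartCopy(ctx, &s3.UploadPartCopyInput{
		Bucket:          aws.String(bucket),
		Key:             aws.String(key),
		UploadId:        mpu.UploadId,
		PartNumber:      aws.Int32(1),
		CopySource:      aws.String(bucket + "/" + key),
		CopySourceRange: aws.String(fmt.Sprintf("bytes=0-%d", partSize-1)),
	})
	if err != nil {
		return err
	}

	// Part 2: upload only the modified data.
	upRes, err := client.UploadPart(ctx, &s3.UploadPartInput{
		Bucket:     aws.String(bucket),
		Key:        aws.String(key),
		UploadId:   mpu.UploadId,
		PartNumber: aws.Int32(2),
		Body:       newPart,
	})
	if err != nil {
		return err
	}

	// Completing the upload atomically replaces the object with the
	// combination of the copied and the freshly uploaded parts.
	_, err = client.CompleteMultipartUpload(ctx, &s3.CompleteMultipartUploadInput{
		Bucket:   aws.String(bucket),
		Key:      aws.String(key),
		UploadId: mpu.UploadId,
		MultipartUpload: &types.CompletedMultipartUpload{
			Parts: []types.CompletedPart{
				{ETag: copyRes.CopyPartResult.ETag, PartNumber: aws.Int32(1)},
				{ETag: upRes.ETag, PartNumber: aws.Int32(2)},
			},
		},
	})
	return err
}
```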

OK, this is interesting. It should reduce the problem significantly. But one needs to keep in mind that the minimum part size is 5 MB. Smaller changes, like a single 128 KB block as we usually have in Syncthing, still require uploading 39 additional, unchanged 128 KB blocks.

@calmh have you ever tried to use s3fs (https://github.com/s3fs-fuse/s3fs-fuse) in combination with Syncthing? I think they probably already implement the multipart upload trick to only upload 5 MB at a time.

For me it’s unclear how a Syncthing integration would improve performance compared to an s3fs setup if we internally do just the same thing as s3fs.

I haven’t tried it, mostly because it seems operationally annoying to maintain and possibly fragile, integrating with Syncthing in k8s etc.

rclone can also be used for this. I tried it and it was also slow, but I didn’t do a direct comparison with s3fs. If we go in the direction of using S3 as a normal filesystem, one should first look at existing solutions.

For more convenience, one could do the necessary calls to either “s3fs” or “rclone” from within Syncthing. Then the only question is whether we ship the tool binaries together with Syncthing, or require the user to install them manually.

This would probably be quite a low-effort and low-maintenance solution.
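
As a rough sketch of that idea (purely illustrative, not part of any PR; the remote name and mount point are placeholders), Syncthing or a small wrapper could spawn rclone itself and then treat the mount point as a normal folder path:

```go
// Illustrative sketch: start "rclone mount" as a child process and then
// point a regular Syncthing folder at the mount point.
package main

import (
	"log"
	"os/exec"
)

func main() {
	// Assumes rclone is installed and a remote called "s3remote" is
	// configured; both names are placeholders.
	cmd := exec.Command("rclone", "mount",
		"s3remote:my-bucket", "/mnt/syncthing-s3",
		"--vfs-cache-mode", "writes", // cache writes locally before upload
	)
	if err := cmd.Start(); err != nil {
		log.Fatalf("starting rclone mount: %v", err)
	}
	log.Printf("rclone mount running with PID %d", cmd.Process.Pid)

	// A real integration would supervise the process, unmount on
	// shutdown, and surface errors in the UI.
	if err := cmd.Wait(); err != nil {
		log.Fatalf("rclone mount exited: %v", err)
	}
}
```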

But it only provides the performance that is already achievable with these tools; no improvement is to be expected. Also, parallel access to the data from different Syncthing instances is not fully supported - I’ve read about several issues there.

Being able to integrate all of rclone’s supported backends would be cool, even better if just by adapting and linking to the existing code.

Being able to integrate all of rclone’s supported backends would be cool

Sure it is, but what about the disadvantages?

  • very slow scanning for changes
  • no safe support for parallel access from multiple instances
  • re-upload of at least 5 MB (or the full file size if smaller) for every small change

The implementation in my PR also uses an already existing framework for cloud storage access: Blob · Go CDK. This framework supports more than just S3: Google Cloud Storage and Azure Blob Storage are also covered.

It also aims to support safe parallel access by multiple Syncthing instances. The format used to store the data natively supports that; this is not the case for rclone or s3fs.
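
For illustration, the Go CDK blob API looks roughly like this (a minimal sketch based on the gocloud.dev documentation, not the actual PR code; the bucket URL is a placeholder). Swapping the driver import and the URL scheme is what makes Google Cloud Storage or Azure Blob Storage work through the same code path:

```go
package main

import (
	"context"
	"log"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob" // registers the s3:// URL scheme
	// _ "gocloud.dev/blob/gcsblob"   // gs:// for Google Cloud Storage
	// _ "gocloud.dev/blob/azureblob" // azblob:// for Azure Blob Storage
)

func main() {
	ctx := context.Background()

	// Placeholder bucket URL; credentials come from the usual AWS env/config.
	bucket, err := blob.OpenBucket(ctx, "s3://example-bucket?region=eu-central-1")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	// Write and read a small blob through the provider-neutral API.
	if err := bucket.WriteAll(ctx, "hello.txt", []byte("hello"), nil); err != nil {
		log.Fatal(err)
	}
	data, err := bucket.ReadAll(ctx, "hello.txt")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read %d bytes", len(data))
}
```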

So, how can we figure out the importance of the different requirements? How can we come to a conclusion on whether the performance of existing solutions like s3fs or rclone is good enough?

So, generally I will optimize for functionality rather than performance. We can see some tradeoffs of storing files as blobs. The details can perhaps be discussed (why is listing blobs “very slow”, for example? Is this fundamental to the concept or an aspect of some specific implementation? etc) but let’s instead look at your suggestion, as I understand it.

We store individual blocks as blobs, named by their hash (a rough sketch follows after the lists below). There are advantages to this:

  • We get automatic deduplication of data (mostly), saving some storage space
  • Modifications can be more efficient as we just need to upload changed blocks

On the other hand,

  • It’s not useful for any of the use cases that deal with blobs as files, e.g.
    • syncing to cloud so that you can give someone a link to a file or for serving static sites / data
    • accepting input from other sources and syncing that to somewhere
    • sending data to other processes that expect data in s3, like using S3 Events
    • using other features built into S3 like life cycle management to keep files for a certain amount of time or serving files as BitTorrent
  • It’s not as useful as a backup, because it’s harder to restore from and depends on an external mapping source (the index database) and a specific tool (Syncthing)
  • It requires a separate garbage collection step to clean up deleted blocks, which is some additional complexity and also means a single instance must have exclusive access
  • It may in fact be less efficient for initial upload, creating up to 2000 blobs for one file instead of one large multipart upload
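
To make the naming scheme above concrete, here is a minimal sketch of the blocks-as-blobs idea as I understand it (illustrative only, not the PR code; the key layout and helper name are made up), again using the Go CDK blob API:

```go
import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"io"

	"gocloud.dev/blob"
)

// uploadBlocks (hypothetical helper): split a file into 128 KiB blocks and
// store each block as a blob keyed by its SHA-256 hash, so identical blocks
// are stored only once.
func uploadBlocks(ctx context.Context, bucket *blob.Bucket, r io.Reader) ([]string, error) {
	const blockSize = 128 * 1024
	var keys []string // ordered block list; would live in the index database
	buf := make([]byte, blockSize)

	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			sum := sha256.Sum256(buf[:n])
			key := "blocks/" + hex.EncodeToString(sum[:])

			// Deduplication: skip the upload if the block already exists.
			exists, existsErr := bucket.Exists(ctx, key)
			if existsErr != nil {
				return nil, existsErr
			}
			if !exists {
				if werr := bucket.WriteAll(ctx, key, buf[:n], nil); werr != nil {
					return nil, werr
				}
			}
			keys = append(keys, key)
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return keys, nil // end of file reached
		}
		if err != nil {
			return nil, err
		}
	}
}
```

Deleting a file then only removes index entries; actually removing unreferenced blocks is the separate garbage collection step mentioned above.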

Is that about right?

Everything you wrote makes sense.

What bothers me the most in our current discussion is that most of my questions do not get answered from your side. I would be glad to get more direct responses to them instead of general talk on a higher level.

E.g.: one of my biggest questions is why we are not satisfied with the solution of using external tools.

You can use “rclone mount” or “s3fs” (or others) - even though it needs minimally more administration and maintenance effort, these tools are able to fulfill all your requirements without changing a single line of code in Syncthing. Except for performance… but this seems to be of secondary priority, so no need to bother.

By the way, some thoughts on this are already in the initial description of the GitHub issue: Object Store (S3) backend · Issue #8113 · syncthing/syncthing · GitHub

To more efficiently integrate syncing in a hybrid cloud environment where dynamically cloud nodes can be added without unnecessarily duplicating the data within the cloud provider.

It mentions efficiency as the main point, which I understand as performance.

What do you think specifically about this topic?

Like I said, I personally have not tried to use the integration with other tools because it seemed unnecessarily complicated for what I was trying to do. I know of others who have used it successfully with, for them, adequate performance. I suspect we could do better with an internal implementation though, if not in performance then definitely in ease of configuration.

However this discussion came about from your specific implementation proposal, hence I guess the discussion becomes more about the merits and demerits of that specific approach.

If I’m dodging some other questions I apologise, I’m following a lot of topics and discussions and it sometimes becomes a little bit like the doctor who checks in on each patient for 10 seconds at a time :stuck_out_tongue:

I guess this sounds like a mostly send-only use case? Otherwise I’m not sure how you’d handle, for example, garbage collection of deleted blocks successfully, or how you’d conveniently spin up new nodes with a copy of the active database, etc. But anyway, given that, I think it would work equally well using blocks-as-blobs as with files-as-blobs or, for that matter, with mounted storage using rclone or whatever.

Your statement (in the PR) about not accepting the blocks-as-objects approach in general lifted the discussion to a different level. Now it’s more about use cases, requirements and limitations of different approaches.

And you are right, the files-as-objects strategy would have many advantages. But in my eyes the implementation for this is much more effort than the blocks-as-objects strategy. Except, of course, when we reuse existing implementations like s3fs or rclone mount.

But I personally was not satisfied with the performance of the combination with s3fs or rclone mount, and I think the issue is really inherent to the concept. So my implementation is for all those who prioritize performance higher than other aspects and accept certain limitations in turn. I was searching for an approach that allows for more performance in the S3 context compared to the existing solutions and is still easy to implement. The strategy I’ve chosen brings a large number of benefits, feature-, complexity- and performance-wise:

  • no need for scanning the folder for changes
  • no need for watching the file system for changes
  • just storing metadata as database-file(s)
  • no need to specifically support different OS metadata handling
  • no need for modification timestamp detection strategies
  • no need to deal with conflicts on concurrent file modifications
  • possibility for data corruption detection (bit-rot, …)
  • different versions of the same file (e.g. for snapshots) can exist in parallel (in the database-file(s)) without the need to duplicate data
  • sharing block data across multiple Syncthing folder shares, which is beneficial when moving data from one Syncthing folder into another
  • the mount functionality that I included makes it quite easy to implement a feature like OneDrive’s on-demand downloading of file content, caching just a part of the folder locally. I did a prototype implementation of this already.
  • spawning a new Syncthing node would be quite easy: the new node searches the S3 bucket for the latest full database snapshot from another existing node and copies it to use as a starting point (a rough sketch follows after this list). This involves copying metadata only; the data blocks themselves can be reused. Having the metadata stored separately for each node avoids scanning for changes and prevents any synchronization conflicts due to parallel access.
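
A minimal sketch of that bootstrap step (illustrative only; the snapshots/ prefix, the naming and the helper are assumptions, not necessarily what the PR does), again via the Go CDK blob API:

```go
import (
	"context"
	"errors"
	"io"
	"time"

	"gocloud.dev/blob"
)

// fetchLatestSnapshot (hypothetical helper): a new node bootstraps by
// downloading the most recent database snapshot that any existing node
// has uploaded under the "snapshots/" prefix.
func fetchLatestSnapshot(ctx context.Context, bucket *blob.Bucket) ([]byte, error) {
	var latestKey string
	var latestTime time.Time

	iter := bucket.List(&blob.ListOptions{Prefix: "snapshots/"})
	for {
		obj, err := iter.Next(ctx)
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		if obj.ModTime.After(latestTime) {
			latestTime = obj.ModTime
			latestKey = obj.Key
		}
	}
	if latestKey == "" {
		return nil, errors.New("no snapshot found")
	}

	// Only metadata is copied; the data blocks referenced by the snapshot
	// are shared with the existing nodes and reused as-is.
	return bucket.ReadAll(ctx, latestKey)
}
```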

I’m currently investigating “restic”. I noticed that the strategy I’m implementing is quite similar to the one restic uses, just that the restic implementation is much more sophisticated and tuned. Reusing it would avoid spending effort on things like proper garbage collection and the packing of multiple blocks into one object file, which improves performance even more. In my test, it generated objects with an average size of ~17 MB. The performance of writing to the cloud backend was significantly better compared to the one-object-per-128 KB-block approach.
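
As an illustration of the packing idea (a sketch of the general technique, not restic’s actual format or code; the pack size, types and key layout are made up), small blocks would be buffered in memory and flushed as one larger object, with an index remembering where each block ended up:

```go
import (
	"bytes"
	"context"
	"crypto/sha256"
	"encoding/hex"

	"gocloud.dev/blob"
)

// packEntry records where a block lives: which pack object, and at which
// offset/length inside it.
type packEntry struct {
	PackKey string
	Offset  int64
	Length  int64
}

// packer accumulates small blocks and uploads them as larger pack objects.
type packer struct {
	bucket  *blob.Bucket
	buf     bytes.Buffer
	pending map[string]packEntry // block hash -> entry in the pack being built
	index   map[string]packEntry // block hash -> entry in a finished pack
	target  int                  // flush threshold, e.g. 16 * 1024 * 1024
}

func (p *packer) add(ctx context.Context, blockHash string, block []byte) error {
	p.pending[blockHash] = packEntry{Offset: int64(p.buf.Len()), Length: int64(len(block))}
	p.buf.Write(block)
	if p.buf.Len() >= p.target {
		return p.flush(ctx)
	}
	return nil
}

func (p *packer) flush(ctx context.Context) error {
	if p.buf.Len() == 0 {
		return nil
	}
	// Name the pack by the hash of its contents so retried uploads are idempotent.
	sum := sha256.Sum256(p.buf.Bytes())
	packKey := "packs/" + hex.EncodeToString(sum[:])
	if err := p.bucket.WriteAll(ctx, packKey, p.buf.Bytes(), nil); err != nil {
		return err
	}
	for h, e := range p.pending {
		e.PackKey = packKey
		p.index[h] = e
	}
	p.pending = map[string]packEntry{}
	p.buf.Reset()
	return nil
}
```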

So, I see my implementation as a tradeoff between features and performance. One can’t cover all the use cases that one could with the files-as-objects strategy. But in turn, it performs much better for the use cases that some others and I prioritize, and it opens the door to some features that were so far not really possible.

For your information, I decided that I will continue to follow this path until I’m able to do proper performance tests, even if it is ultimately not accepted for merging into the main branch. I’m not taking it personally or emotionally if it dies in the end, so you can relax in case this was bugging you. I want to try the approach in my personal environment and deliver some numbers.

Rock on, it’s what open source is for. :+1: I’ll just reiterate that numbers aren’t going to be the deciding factor for whether it gets merged into Syncthing, that’ll be use cases vs complexity, more or less.

FWIW - restic’s average pack size is actually configurable, and restic itself allows up to 128 MB…

I’m fine with this, and I would like to continue the discussion about it; I’m just not sure how. Apparently, right now this topic is highly dependent on personal perspectives and priorities.

So apart from understanding each other’s viewpoint, there is not much to do if both of us keep the perspective we currently have.