Amazon AWS S3 back-end?

Andrei_Pozolotin · August 1, 2015, 5:52pm

I am curious what “the best” ways are to deploy Syncthing with an Amazon AWS S3 back-end. Please share your experience.

canton7 · August 1, 2015, 6:22pm

The only option I can think of is to mount S3 as a partition using something like s3fs-fuse. Last time I tried something like that (without Syncthing) it was quite unreliable.

Depending on your caching settings, every time Syncthing does a scan your S3 fuse will likely have to download a lot of data, so watch out!

Overall, not something I’d recommend.

AudriusButkevicius · August 1, 2015, 7:19pm

You could have something which listens to Syncthing’s events and pushes stuff to S3 as it changes.
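A minimal sketch of that idea, assuming Syncthing's REST event stream (`/rest/events` with an `X-API-Key` header) and its `ItemFinished` events. The names `changed_files`, `fetch_events`, `run` and the `push` callback are hypothetical; `push` stands in for whatever S3 client you would actually use. Note that `ItemFinished` covers items this device has finished pulling from other devices, so local edits would need a different event type.

```python
import json
import urllib.request
from typing import Callable, Iterable, List, Tuple


def changed_files(events: Iterable[dict], folder_id: str) -> List[Tuple[str, str]]:
    """Return (action, relative_path) for files synced without error in one folder."""
    changes = []
    for event in events:
        if event.get("type") != "ItemFinished":
            continue
        data = event.get("data", {})
        if data.get("folder") != folder_id or data.get("type") != "file":
            continue
        if data.get("error"):  # skip items that failed to sync
            continue
        changes.append((data["action"], data["item"]))
    return changes


def fetch_events(base_url: str, api_key: str, since: int, timeout: int = 60) -> list:
    """Long-poll the event API for ItemFinished events newer than `since`."""
    url = f"{base_url}/rest/events?events=ItemFinished&since={since}&timeout={timeout}"
    req = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(req, timeout=timeout + 10) as resp:
        return json.load(resp)


def run(base_url: str, api_key: str, folder_id: str,
        push: Callable[[str, str], None]) -> None:
    """Forward every change to `push` (hypothetical: upload or delete in S3)."""
    since = 0
    while True:
        events = fetch_events(base_url, api_key, since)
        for action, path in changed_files(events, folder_id):
            push(action, path)
        if events:
            since = events[-1]["id"]  # resume after the last event seen
```

Something like `run("http://127.0.0.1:8384", "<api key>", "<folder id>", push)` would then loop forever; a real version would also need retries and a way to catch up after downtime.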

NickPyz · August 1, 2015, 8:16pm

Amazon S3 is inert storage - you can’t run an instance of Syncthing on it. What you can do is synchronize an ST shared folder on your PC with S3 using third-party software like Cloudberry (Windows, not sure about other OSes):

http://www.cloudberrylab.com/blog/how-to-sync-local-folder-with-amazon-s3-bucket-with-cloudberry-s3-explorer/

Another AWS possibility is to run ST virtually using their EC2 hosting service. I have never tried this - maybe others can advise whether it would work similarly to running ST on a VPS (which does work perfectly).

canton7 · August 1, 2015, 9:01pm

This would work fine, except that EC2 can’t read/write directly to S3 any more than any other computer can. You could create an EBS volume, mount that as a partition, and occasionally create a snapshot in S3 of it - but that isn’t the same as storing your files directly in S3.

Andrei_Pozolotin · August 1, 2015, 9:19pm

They say Dropbox runs on Amazon S3 and seems happy doing so (?)

So I would imagine some form of “native” support for S3 (and other major S3-like cloud providers) could be on the ST road map?

Can the ST developers please comment on that?

canton7 · August 1, 2015, 10:21pm

There’s a slight difference: Dropbox is a centralised synchronization solution: there’s a central third-party server that’s always online, and always holds a copy of your files. This central server happens to use S3 as its backing storage, but that’s entirely transparent to you: they could be storing your files on tapes and you’d be none the wiser. The fact that Dropbox chooses to use S3 is not a feature, and it’s not something that you can access: it’s just an implementation detail.

Syncthing is a peer-to-peer synchronization solution, which means that there’s no central third-party server holding copies of all of your files. The aim is to synchronize your files directly between all of your machines which need access to those files. Therefore the value in having one of those machines write files that it receives directly to S3 is questionable: if they’re written straight to S3, then the machine that wrote them won’t have direct access to them. And if that machine doesn’t need direct access to them, why are you syncing to that machine?

If the answer is “backup” or some such, you probably want a separate tool which backs up your files to S3: there are plenty of them around.

Having a system that writes/reads files directly to/from S3 would likely be a fair amount of work, as it would break a lot of assumptions currently being made about file access, as well as requiring significant change to all parts of the application which currently touch the filesystem.

Andrei_Pozolotin · August 2, 2015, 1:52am

I think you are right, I had better start looking at it in a “backup” context.

NickPyz · August 2, 2015, 2:05am

Thanks for clarifying. Bottom line - S3 is not a host for executables such as Syncthing. It was designed for file backup.

If the original poster decided to create a virtual EC2 server with Syncthing, the synced files would be stored on Amazon EC2 - however, as you have stated, EC2 gets no more direct access to S3 than any other computer, so the files would still have to be copied over to S3 separately.

calmh · August 2, 2015, 6:06am

Theoretically though, we could support something like an S3 backend as the folder path. It’d be a bit special-special, but it doesn’t sound impossible.

canton7 · August 2, 2015, 10:49am

Could be useful for setups where Syncthing is being used in a centralised manner (with one central always-on host). It looks like fetching part of a file is supported too… Symlinks could be fun though!
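On fetching part of a file: S3 object reads accept a standard HTTP `Range` header, so a single block could in principle be requested on its own. A small sketch of the byte-range arithmetic, assuming a fixed 128 KiB block size (an assumption for illustration, not something stated in this thread); `block_range_header` is a hypothetical helper.

```python
BLOCK_SIZE = 128 * 1024  # assumed fixed block size, for illustration only


def block_range_header(block_index: int, file_size: int,
                       block_size: int = BLOCK_SIZE) -> str:
    """Build the HTTP Range header value covering one block of a file."""
    start = block_index * block_size
    if block_index < 0 or start >= file_size:
        raise ValueError("block lies outside the file")
    # HTTP byte ranges are inclusive at both ends; the last block may be short.
    end = min(start + block_size, file_size) - 1
    return f"bytes={start}-{end}"
```

For a 300,000-byte file, block 0 maps to `bytes=0-131071` and the short final block 2 maps to `bytes=262144-299999`.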

AudriusButkevicius · August 2, 2015, 10:54am

Screw S3, it’s only one of the fishes in the ocean. I’d rather people spent their effort on something more useful that doesn’t turn you into an AWS zombie. There is webdav, etc.

NickPyz · August 2, 2015, 3:46pm

As an Amazon S3 customer, I was tempted to give this a “like”.

However, this potentially opens a huge can of worms. If Syncthing supports S3, I believe it might open the floodgates for similar requests from those who use MS OneDrive, Google Drive, Apple iCloud Drive, box.com, Dropbox, etc. etc.

Does this project have the resources to support this? Consider the upfront development, plus the ongoing maintenance - as these different cloud services frequently modify their APIs.

calmh · August 2, 2015, 5:56pm

Note though that the S3 “protocol” has some following. There are some rather nice open source implementations of it for when you want to run your own object storage and have it used by whatever knows how to speak S3. So I could see some uses outside of AWS.

AudriusButkevicius · August 2, 2015, 6:51pm

The same way Swift (or whatever it’s called, from OpenStack) does. S3 is an implementation, not a standard.

tombh · March 5, 2019, 2:28pm

I’ve been using Syncthing on a folder mounted from a Google Cloud Storage bucket using gcsfuse. As @canton7 points out earlier, there are some issues with it. However, so far it seems usable.

I’m syncing about 3 GB of data in about 18,000 files. The UI does seem to have problems, though, with the default frequency of 10 mins for rescans. It would seem that inotify doesn’t detect changes, so the rescans are essential (only if changes need to be sent from the folder?). The problem I see during big rescans is that the UI doesn’t commit any changes I make to the config. I don’t need to propagate changes out from the mounted folder, so I’ve set the rescan time to 10 hours and that helps a lot.
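For anyone who would rather set that interval without going through the UI, here is a rough sketch. It assumes a Syncthing version that exposes the `/rest/config/folders/<id>` endpoint (newer than the one this post was written against) and the `rescanIntervalS` folder setting, which is in seconds. The function name, URL and folder ID are placeholders.

```python
import json
import urllib.request


def rescan_patch_request(base_url: str, api_key: str, folder_id: str,
                         hours: float) -> urllib.request.Request:
    """Build (but do not send) a PATCH that sets a folder's rescan interval."""
    body = json.dumps({"rescanIntervalS": int(hours * 3600)}).encode()
    return urllib.request.Request(
        f"{base_url}/rest/config/folders/{folder_id}",
        data=body,
        method="PATCH",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


# Hypothetical values; 10 hours becomes 36000 seconds.
req = rescan_patch_request("http://127.0.0.1:8384", "my-api-key", "bucket", 10)
# urllib.request.urlopen(req) would apply the change to a running instance.
```

Building the request separately from sending it keeps the sketch easy to inspect before anything touches a live config.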

I’m surprised there aren’t more people wanting to use cloud buckets with Syncthing. They’re so cheap, decoupled from VM instances, and come with all kinds of CDN/caching/versioning features.

BTW, just to add to the S3 discussion - as far as I know all the major cloud storage providers support S3-compatible API calls.

bt90 · March 5, 2019, 7:00pm

I can only speak for myself, but the sole reason I’m using Syncthing is being able to ditch those kinds of services.