Evaluating Idea: Syncthing for Serverless

Idea: Using Syncthing as a serverless architecture. There are some 10,000 devices that are sending logs/data to one or more servers. They also might need to communicate with each other.

Question: Is Syncthing a valid option for this scenario?

Assumptions: Syncthing will be run as a separate process and managed via APIs.

I am not sure I understand the question.

What does this have to do with serverless, if you have devices that run processes and act as servers?

Serverless (AFAIK) is not ever truly serverless! It is true that some nodes might act (conceptually) as servers, but they are not real servers with public IPs, APIs or RPCs, etc, etc.

The whole swarm uses Syncthing without any servers (TCP/HTTP/REST/etc).

The point of serverless, also AFAIK, is to run functions in the cloud and not worry about virtual machines, storage, etc. (Clearly there is actually a server involved at some point.)

Syncthing doesn’t let you not worry about your virtual machine and your storage, so I think it will not run in a “serverless” way.

You are right.

If we drop the serverless phrase, does this scenario make sense?

A scenario of devices syncing files among themselves? This is just the normal Syncthing scenario. Still not sure I understand the question.

The question is about the number of devices: between 10,000 and 40,000.

Also, some nodes have to have all the shared directories (between 10,000 and 40,000 directories). And if I understand correctly, there is a cap of 1 million on the number of files.

I don’t think there is any cap. There is overhead per device, so you might want a tiered architecture and a custom UI, as rendering the UI with 10k devices is hard, but these are all solvable problems.

There is no UI, neither on devices nor on server/data warehousing nodes. All will be managed via Syncthing REST APIs.

Have you seen data.syncthing.net? The most connections reported there are 2.2k. That already seems like a lot. However, if you do set up a network topology adapted to your use case as Audrius suggested, such that not every node is connected directly to all others, I don’t see a problem with 40k devices.

With logs there might be another problem: If you want to sync “live” logs, they might change too fast. I.e. if in between Syncthing noticing a change and other devices requesting the changed log, the same log already changed again, there will never be a successful transfer. However I assume you’d have some kind of regular job to copy live logs to synced logs, which would mitigate that problem.

Thanks!

Those 40,000 nodes would not be interconnected. But all of them will be connected to (say) three supervisor nodes.

As for logs, those files are updated every two minutes (or even less often). So I was planning to use fs-watcher with a delay of some seconds.


That’s exactly the thing: To fully use the “power of p2p” connecting all nodes to central devices isn’t optimal. Maybe that works with beefy servers (I know nothing about that), but some structure would probably work better with much less hardware/network requirements. That can be a tree structure with your three supervisors at the top or breaking up the 40000 nodes into smaller (partially) interconnected groups, in which only a few are connected to the supervisors. Ideally such a structure would somehow come up naturally from already existing network or functional topology of your devices. Otherwise you can optimize it for lots of parameters (node failure stability, propagation speed (path length), simplicity …).
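
To make the tree idea concrete, here is a small sketch of one possible assignment, assuming devices are simply numbered from 0 and the first few indices are the supervisors. The functions `parentOf` and `depthOf` and the fan-out of 50 are illustrative choices, not anything Syncthing provides. Each non-supervisor device gets exactly one parent to connect to, and no device has more than `fanout` children.

```go
package main

import "fmt"

// parentOf returns the index of the device that node i should connect to,
// or -1 if i is one of the supervisors at the top of the tree.
// Devices 0..supervisors-1 are roots; every other device is attached so
// that each parent has at most fanout children.
func parentOf(i, supervisors, fanout int) int {
	if i < supervisors {
		return -1
	}
	return (i - supervisors) / fanout
}

// depthOf counts the hops from node i up to its supervisor, which is the
// path length a change has to travel in this layout.
func depthOf(i, supervisors, fanout int) int {
	depth := 0
	for p := parentOf(i, supervisors, fanout); p != -1; p = parentOf(p, supervisors, fanout) {
		depth++
		i = p
	}
	return depth
}

func main() {
	const (
		devices     = 40000
		supervisors = 3
		fanout      = 50 // hypothetical limit on connections per parent
	)

	maxDepth := 0
	for i := 0; i < devices; i++ {
		if d := depthOf(i, supervisors, fanout); d > maxDepth {
			maxDepth = d
		}
	}
	fmt.Println("parent of the last device:", parentOf(devices-1, supervisors, fanout))
	fmt.Println("longest path to a supervisor:", maxDepth)
}
```

A purely numeric assignment like this ignores network locality and node failures; if the devices already have a natural grouping, deriving the parents from that is likely the better choice.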


I have no idea what you are talking about. Syncthing has a UI, and it will struggle to render with 10k devices and 10k folders.

From what I've read so far, it seems like you are shoehorning a p2p sync solution onto a centralized sync problem. Why don’t you just run rsync via cron every 2 minutes…


You can disable the web UI?

Running Syncthing on a server (like Ubuntu Server) does not need and cannot have a GUI. Of course, it is possible to connect to it if the port is exposed to the outside.

There is documentation on how to use the REST API: https://docs.syncthing.net/dev/rest.html.

Yes, it is possible to do so using the configuration file.

How are you going to manage it in that case?

On each node, a program written in Go is running that simply configures and runs Syncthing as an external process. It then communicates with it via the APIs, or simply works with files inside the shared directory.

Currently I am studying IPFS, which seems to be a better fit.

But I like Syncthing much better!

Take Syncthing, I wanna know about a 40k cluster :)
