Evaluating Idea: Syncthing for Serverless

dc0d · May 29, 2018, 6:48am

Idea: Using Syncthing as a serverless architecture. There are some 10,000 devices that are sending logs/data to one or more servers. They also might need to communicate with each other.

Question: Is Syncthing a valid option for this scenario?

Assumptions: Syncthing will be run as a separate process and managed via APIs.

AudriusButkevicius · May 29, 2018, 6:53am

I am not sure I understand the question.

What does this have to do with serverless, if you have devices that run processes and act as servers?

dc0d · May 29, 2018, 6:56am

Serverless (AFAIK) is not ever truly serverless! It is true that some nodes might act (conceptually) as servers, but they are not real servers with public IPs, APIs or RPCs, etc, etc.

The whole swarm uses Syncthing without any servers (TCP/HTTP/REST/etc).

calmh · May 29, 2018, 7:03am

The point of serverless, also AFAIK, is to run functions in the cloud and not worry about virtual machines, storage, etc. (Clearly there is actually a server involved at some point.)

Syncthing doesn’t let you not worry about your virtual machine and your storage, so I think it will not run in a “serverless” way.

dc0d · May 29, 2018, 7:07am

You are right.

If we drop the serverless phrase, does this scenario make sense?

AudriusButkevicius · May 29, 2018, 7:18am

A scenario of devices syncing files among themselves? This is just a normal Syncthing scenario. Still not sure I understand the question.

dc0d · May 29, 2018, 7:22am

The question is about the number of devices: between 10,000 and 40,000.

Also, some nodes have to have all the shared directories (between 10,000 and 40,000 directories). And if I understand correctly, there is a cap of 1 million on the number of files.

AudriusButkevicius · May 29, 2018, 7:49am

I don’t think there is any cap. There is overhead per device, so you might want a tiered architecture and a custom UI, as rendering the UI with 10k devices is hard, but these are all solvable things.

dc0d · May 29, 2018, 7:53am

There is no UI, neither on devices nor on server/data warehousing nodes. All will be managed via Syncthing REST APIs.

imsodin · May 29, 2018, 8:01am

Have you seen data.syncthing.net? The most connections reported is 2.2k. That already seems like a lot. However, if you do set up a network topology adapted to your use case as Audrius suggested, such that not every node is connected directly to all others, I don’t see a problem with 40k devices.

With logs there might be another problem: If you want to sync “live” logs, they might change too fast. I.e. if in between Syncthing noticing a change and other devices requesting the changed log, the same log already changed again, there will never be a successful transfer. However I assume you’d have some kind of regular job to copy live logs to synced logs, which would mitigate that problem.

dc0d · May 29, 2018, 8:06am

Thanks!

Those 40,000 nodes would not be interconnected. But all of them will be connected to (say) three supervisor nodes.

As for logs, those files are updated every two minutes (or even less often). So I was planning to use the fs-watcher with a delay of some seconds.

imsodin · May 29, 2018, 8:15am

That’s exactly the thing: To fully use the “power of p2p” connecting all nodes to central devices isn’t optimal. Maybe that works with beefy servers (I know nothing about that), but some structure would probably work better with much less hardware/network requirements. That can be a tree structure with your three supervisors at the top or breaking up the 40000 nodes into smaller (partially) interconnected groups, in which only a few are connected to the supervisors. Ideally such a structure would somehow come up naturally from already existing network or functional topology of your devices. Otherwise you can optimize it for lots of parameters (node failure stability, propagation speed (path length), simplicity …).
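To make the grouping idea concrete, here is a minimal Go sketch that splits node IDs into fixed-size groups and picks one relay per group as the only member connected to the supervisors. The `Group` type, the `partition` function, and the group size of 200 are hypothetical choices for illustration, not anything Syncthing provides; a real layout would more likely follow the existing network or functional topology.

```go
package main

import "fmt"

// Group is one cluster of nodes. Only Relay would be connected to the
// supervisors; Members would connect to the relay (and possibly each other).
type Group struct {
	Relay   int
	Members []int
}

// partition splits node IDs 0..nodes-1 into consecutive groups of at most
// groupSize nodes and uses the first node of each group as its relay.
func partition(nodes, groupSize int) []Group {
	if nodes <= 0 || groupSize <= 0 {
		return nil
	}
	var groups []Group
	for start := 0; start < nodes; start += groupSize {
		end := start + groupSize
		if end > nodes {
			end = nodes
		}
		g := Group{Relay: start}
		for id := start + 1; id < end; id++ {
			g.Members = append(g.Members, id)
		}
		groups = append(groups, g)
	}
	return groups
}

func main() {
	const supervisors = 3 // the "(say) three supervisor nodes" from the thread
	groups := partition(40000, 200)

	// Each supervisor talks to one relay per group instead of every node.
	fmt.Println("connections per supervisor:", len(groups))
	// Each relay talks to its own members plus the supervisors.
	fmt.Println("connections per relay:", len(groups[0].Members)+supervisors)
}
```

The trade-off is the one described above: fewer connections per supervisor in exchange for an extra hop, and a relay failure cutting off its group unless more than one relay per group is used.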

AudriusButkevicius · May 29, 2018, 8:28am

I have no idea what you are talking about. Syncthing has a ui and that will struggle to render with 10k devices and 10k folders.

Reading so far, it seems like you are shoehorning a p2p sync solution into a centralized sync problem. Why don’t you just run rsync via cron every 2 minutes…

imsodin · May 29, 2018, 9:28am

You can disable the web UI?

dc0d · May 29, 2018, 10:35am

Running Syncthing on a server (like Ubuntu Server) does not need and cannot have a GUI. Of course, it is possible to connect to it if the port is exposed to the outside.

There is documentation on how to use the REST API: https://docs.syncthing.net/dev/rest.html.

dc0d · May 29, 2018, 10:37am

Yes, it is possible to do so using the configuration file.

AudriusButkevicius · May 29, 2018, 11:43am

How are you going to manage it in that case?

dc0d · May 29, 2018, 1:43pm

On each node a program is running which is written in Go that simply configs and runs Syncthing as an external process. And then communicates with it via APIs, or simply works with files inside the shared directory.

dc0d · May 29, 2018, 1:44pm

Currently I am studying ipfs which seems to be more fitting.

But I like Syncthing much better!

imsodin · May 29, 2018, 1:53pm

Take syncthing, I wanna know about a 40k cluster