So we had an outage affecting a couple of things. They all live on a server of mine hosted at a single facility, and there was a fiber cut to that facility. It could just as well have been a failure of the server hardware, or of the firewall that protects it, with the same effect. Most of the things living on that server are fairly non-essential:
- The forum. I’m sad when it’s down because I enjoy it, but in the end no one gets hurt and files are still synchronized all over the world.
- The website. We make a less-than-optimal impression when it's down, but again it's a minor inconvenience.
- One of the three discovery servers. The discovery service is redundant for exactly this reason, so no worries.
- The build server. This hampers development and prevents us from doing a proper release while it's unreachable. We get by.
- The usage reporting server. We get a blip in the usage reports while it’s down but that’s all.
However, a couple of things hurt a bit more:
- The relay registry. Without it, Syncthing clients can't get the current list of relays, so they run without relaying.
- The APT repository. Debian users can’t upgrade or do new installs using apt while this is down.
- The docs site. Users should always be able to access the documentation.
The reason this is all hosted on the same box at the same place is pure convenience, laziness and economy: I don't pay anything extra to spin up more VMs on my own hardware. However, this obviously won't cut it going forward, so here's what I think we should do:
- The relay registry should be integrated with the discovery service and enjoy the same resiliency. If this moves to a DHT or something in the future, we get that for free for the relays as well.
- The website and APT repository should be moved to a real cloud provider.
I'll probably continue running the forum and, especially, the build server for a while, as these are happiest as fairly beefy VMs that would be expensive (for us) to host externally. In the long run this should change too, both to remove the dependency on my hardware and hosting and to remove the dependency on me personally.