BEPository: a Syncthing snapshotting+archival sidecar

I finally took the time to publish BEPository (MIT + Apache 2.0). It's a Syncthing snapshotting+archival sidecar I've been toying with for a few weeks. I did use LLMs, but I maintain it isn't a vibe-coded monstrosity (maybe still a monstrosity).

Technically, it's a BEP node that stores its data in an LSM tree (SlateDB) on top of object storage (e.g. S3/GCS/SFTP/…). It exposes snapshots of the data over WebDAV. Multiple instances can share the same storage, but only one is active at a time (this is enforced by a lock, with priorities). It does not monitor the filesystem and requires a "real" Syncthing on the same machine.
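To make the "only one active at a time, with a lock and priorities" idea concrete, here is a minimal sketch of how such a lock could work on shared storage. It is not BEPository's actual code: the `Lease` record, the in-memory `FakeObjectStore` with its `put_if_version` compare-and-swap, the TTL, and the rule that a strictly higher priority preempts the current holder are all hypothetical, chosen for illustration. It assumes the storage offers some atomic conditional write.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Lease:
    # Hypothetical lock record stored at a single well-known key.
    holder: str
    priority: int
    expires_at: float


class FakeObjectStore:
    """In-memory stand-in for a bucket with conditional writes (hypothetical)."""

    def __init__(self) -> None:
        self._value: Optional[Lease] = None
        self._version = 0

    def get(self) -> Tuple[Optional[Lease], int]:
        return self._value, self._version

    def put_if_version(self, lease: Lease, expected_version: int) -> bool:
        # Compare-and-swap: only write if nobody else wrote since our read.
        if self._version != expected_version:
            return False
        self._value = lease
        self._version += 1
        return True


def try_acquire(store: FakeObjectStore, name: str, priority: int,
                ttl: float, now: float) -> bool:
    """Acquire or renew the lease; returns True if `name` is now the active instance."""
    current, version = store.get()
    can_take = (
        current is None
        or current.expires_at <= now     # previous holder stopped renewing
        or current.holder == name        # renewing our own lease
        or priority > current.priority   # assumed rule: higher priority preempts
    )
    if not can_take:
        return False
    # If another instance wrote in between, the CAS fails and we stay passive.
    return store.put_if_version(Lease(name, priority, now + ttl), version)
```

Each instance would call `try_acquire` periodically: a preempted holder learns it lost the lock on its next renewal, and a crashed holder is replaced once its lease expires.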

My aim was to provide snapshots and an "always-on" virtual repository of the data that Syncthing manages. Compared with using an entirely separate backup system (e.g. Borg), this avoids the need to designate one host to drive backups and simplifies conflict resolution.

Feedback welcome! To be fully transparent, current limitations include: the use of LLMs (as mentioned above), an on-disk format that is not stable yet, an INSTALL.md that may be broken (except for NixOS), and no encryption yet.
The further from storage, the lower my confidence. If you read the BEP implementation, please do not throw eggs.

– unbrice


Interesting!

One thing is a little unclear to me. The README says:

bepository sits next to a regular Syncthing instance.

… right under a picture showing bepository running on the same laptop as a Syncthing instance.

Further down, the README says this:

  • Pick a storage backend (S3, GCS, SFTP, …) and configure credentials.
  • Install the daemon (Systemd Quadlet, NixOS flake, Podman Compose, or from source).
  • Pair it with your Syncthing instance

Question: Do I have to run bepository on the same device as a Syncthing instance? I guess it is not required, but I think the documentation could be made clearer.

I had the same thought. My guess is it uses the sync connection for metadata, but snarfs file data from disk?

I inadvertently deleted my previous reply :-/ As I was saying in the now-deleted post, I didn't implement discovery or relaying. So you could technically run it on a different machine, but it would need a direct network path (a clear line of sight) to its Syncthing peer. The recommended setup is instead to run multiple local instances sharing the same remote storage.

I added a FAQ and tweaked the README a bit to clarify this.