stdiscosrv + strelaysrv systemd services not restarting on upgrade

Nummer378 · October 7, 2021, 4:52am

Posting this here for dicussion before opening up an issue, because I haven’t thoroughly checked this yet.

The recent 1.18.3 syncthing release also contained 1.18.3 releases for stdiscosrv + strelaysrv in apt.s.n. A bit unexpected, but welcome .

However, both of my systemd services did not restart after the upgrade, even though they were enabled and running prior to the restart. Systemd logs show:

 systemctl status strelaysrv.service
● strelaysrv.service - Syncthing Relay Server
     Loaded: loaded (/lib/systemd/system/strelaysrv.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Thu 2021-10-07 06:20:22 CEST; 19min ago
       Docs: man:strelaysrv(1)
   Main PID: 933 (code=killed, signal=HUP)
        CPU: 2min 9.075s

Okt 02 00:55:35 <hostname> systemd[1]: Started Syncthing Relay Server.
Okt 02 00:55:40 <hostname> strelaysrv[933]: 2021/10/02 00:55:40 main.go:140: strelaysrv v1.15.0 "Fermium Flea" (go1.16.3 linux-amd64) deb@build.syncthing.net 2021-04-06 06:03:>
Okt 02 00:55:40 <hostname> strelaysrv[933]: 2021/10/02 00:55:40 main.go:146: Connection limit 419430
Okt 02 00:55:40 <hostname> strelaysrv[933]: 2021/10/02 00:55:40 main.go:239: URI: relay://0.0.0.0:22067/?id=<my-relay-id>&pin>
Okt 07 06:20:22 <hostname> systemd[1]: strelaysrv.service: Succeeded.
Okt 07 06:20:22 <hostname> systemd[1]: strelaysrv.service: Consumed 2min 9.075s CPU time.

I’m guessing that this may be related to the new way of handling systemd stuff. The exit code also indicates that the service was terminated due to HUP signal (generated by the upgrade script?!), but this didn’t trigger a restart of the service, only a non-restarting exit. I think the main syncthing binary restarts when getting that signal, so it works there, but not on the other binaries.

A simple exec of systemctl start <service> got things back up to running state, but I expect services to do this themselves, at least if they were running/enabled previously.

The main syncthing service did restart as expected, just discosrv and relaysrv were affected.

calmh · October 7, 2021, 6:01am

I released the extra relay packages yesterday because the ones previously out there had a systemd restart issue that should have been fixed. (The disco packages came along for the ride / symmetry.)

If I remember correctly the upgrade script tries some sort of systemd detection and then does either systemctl restart or HUP. It sounds like it decided to HUP anyway on your systems.

The systemd units for relay and disco don’t have the same restart directives as the main syncthing systemd unit. Maybe that’s something they should have.

Someone who uses systemd could experiment with fixing this.

neckcen · October 7, 2021, 8:19pm

I cannot speak for the discovery server because it is currently disabled on my system. But it is highly likely that my relay server was also SIGHUP’d during the upgrade process. It was stopped after the upgrade, I chalked it up to my various experiments with the services and proxies over the past couple days. Upon checking the logs though, it seems to have been running right up until I applied the upgrade.

So whilst the package does not outright fail to upgrade with a scripting error (which is what prompted the new release), it seem like there is still a bug somewhere .

Nummer378 · October 7, 2021, 9:05pm

I think I figured out what’s going wrong.

This script https://github.com/syncthing/syncthing/blob/main/script/deb-post-inst.template checks if

systemd is available
There is a running service with the expected name

If so, restart via systemd. Otherwise try a HUP.

So far so good, however this is only a template. Let’s have a look at what fpm actually generates in the final .deb file.

(I downloaded the current 1.18.3 stdiscosrv release from apt, and extracted the deb file)

# cat postinst

#!/bin/sh


after_upgrade() {
    :
#!/bin/sh

command -v deb-systemd-invoke > /dev/null
has_systemd=$?

if [ $has_systemd -eq 0 ]; then
    systemctl daemon-reload
fi

if [ $has_systemd -eq 0 ] && [ -n "$(systemctl list-units --plain --no-legend cmd/stdiscosrv/etc/linux-systemd/stdiscosrv.service)" ]; then
    deb-systemd-invoke try-restart "cmd/stdiscosrv/etc/linux-systemd/stdiscosrv.service"
else
    pkill -HUP -x stdiscosrv || :
fi


}

after_install() {
    :

}

if [ "${1}" = "configure" -a -z "${2}" ] || \
   [ "${1}" = "abort-remove" ]
then
    # "after install" here
    # "abort-remove" happens when the pre-removal script failed.
    #   In that case, this script, which should be idemptoent, is run
    #   to ensure a clean roll-back of the removal.
    after_install
elif [ "${1}" = "configure" -a -n "${2}" ]
then
    upgradeFromVersion="${2}"
    # "after upgrade" here
    # NOTE: This slot is also used when deb packages are removed,
    # but their config files aren't, but a newer version of the
    # package is installed later, called "Config-Files" state.
    # basically, that still looks a _lot_ like an upgrade to me.
    after_upgrade "${2}"
elif echo "${1}" | grep -E -q "(abort|fail)"
then
    echo "Failed to install before the post-installation script was run." >&2
    exit 1
fi

The line [ -n "$(systemctl list-units --plain --no-legend cmd/stdiscosrv/etc/linux-systemd/stdiscosrv.service)" ]; is the problem, because this should actually read just stdiscosrv.service. The template/fpm however inserts the full path from the source repository, which causes systemctl list-units to print nothing (because it matches exactly the given string). The script then thinks that there is no currently running systemd service.

(My knowledge with these template scripts is limited, so I don’t know what should be used instead of {{.Service}} to fix this)

Nummer378 · October 7, 2021, 9:15pm

For the main syncthing application however, it looks as expected. There it looks like this:

 "$(systemctl list-units --plain --no-legend syncthing@*.service)"

Which correctly prints running syncthing units.

Edit: I figured out where the strings come from. You can see it directly in the diff:

github.com/syncthing/syncthing

build: Do not use --deb-systemd with fpm (fixes #7548)

syncthing:main ← imsodin:build/fix-fpm

opened 01:20PM - 11 Apr 21 UTC

imsodin

+48 -22

Reuses the bits from https://github.com/syncthing/syncthing/pull/7552 that aren'…t related to nfpm (which requires go1.16), to fix #7548. Fixing as in not using `--deb-systemd` that was broken upstream, and doesn't get any attention, but restarting services in a script of our own. This is also a behaviour change, as `--deb-systemd` enables the service on install. I personally like that change: I dislike programs that enable their daemons automatically (thought that seems to be standard practice). No testing yet: I was too lazy to install ruby or build/download the docker - will test after CI built something.

For syncthing, systemdService: "syncthing@*.service" is set. For the other binaries,

systemdService: "cmd/stdiscosrv/etc/linux-systemd/stdiscosrv.service",

and

systemdService: "cmd/strelaysrv/etc/linux-systemd/strelaysrv.service",

are set in build.go. That’s where the values come from.

AudriusButkevicius · October 8, 2021, 8:11am

Suggest you open an issue with the explanation (or even better, a pr fixing it)