`rest/db/status` returns 404 on startup

canton7 · January 12, 2020, 5:21pm

Is it expected that rest/db/status for a folder returns a 404 with the body “folder is not running” on startup?

The behaviour used to be that the API wouldn’t start at all until all of the folders had started. However I’m now getting reports of a 404 when trying to fetch rest/db/status after startup.

If this is an intentional change, is there a good way of telling when Syncthing has started, before polling rest/db/status for each folder? Or should I sit there polling all folders until they all return non-404? Does a 404 from this endpoint always mean “temporarily unavailable”, or are there cases where a 404 here is a genuine, unrecoverable error?
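For illustration, the "poll until non-404" option could look roughly like the sketch below. It is not SyncTrayzor's or Syncthing's code: `wait_for_folder_status` and its parameters are hypothetical names, and `fetch` is an assumed callable that performs the actual `rest/db/status` request and returns the HTTP status code and body. The sketch treats every 404 as temporary, which is exactly the assumption the question above asks about.

```python
import time
from typing import Callable, Optional, Tuple


def wait_for_folder_status(
    fetch: Callable[[], Tuple[int, str]],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll until fetch() stops answering 404, or give up at the deadline.

    Returns the response body on HTTP 200, or None if the endpoint is
    still answering 404 when the deadline passes. Any other status code
    is treated as a real error and raised.
    """
    deadline = clock() + timeout_s
    while True:
        status, body = fetch()
        if status == 200:
            return body
        if status != 404:
            raise RuntimeError(f"unexpected HTTP {status}: {body}")
        # Assumption: a 404 here means "not ready yet", so retry.
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

The deadline keeps a genuinely missing folder from being polled forever; `sleep` and `clock` are injectable only so the loop can be exercised without real waiting.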

imsodin · January 13, 2020, 8:21am

Do you see this on v1.3.3 too? I am not sure whether I have the release history right, but I did a refactor that inadvertently changed the behaviour as you describe, and then fixed that a bit later, and I think the change was in v1.3.2 and the fix in v1.3.3.

Folder summaries (aka /rest/db/status) only make sense for running folders, so I think a 404 can mean any of not present at all, not running or such (didn’t check). In any case I wouldn’t poll on that but listen to folder summary or state changed events to determine whether a folder is running.
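A minimal sketch of the event-based approach, assuming events are fetched from `/rest/events` (for example with `since=<last id>` and `events=StateChanged,FolderSummary`) and already decoded from JSON into dictionaries. The helper names `folders_seen_running` and `next_since` are hypothetical, and the HTTP request itself is left out.

```python
from typing import Iterable, List, Set

# Event types that are only emitted for folders that are up and running.
FOLDER_EVENT_TYPES = ("StateChanged", "FolderSummary")


def folders_seen_running(events: Iterable[dict]) -> Set[str]:
    """Collect the IDs of folders that produced a state or summary event."""
    running: Set[str] = set()
    for event in events:
        if event.get("type") in FOLDER_EVENT_TYPES:
            folder = event.get("data", {}).get("folder")
            if folder:
                running.add(folder)
    return running


def next_since(events: List[dict], since: int) -> int:
    """Return the value to pass as `since` on the next events request."""
    return max([since] + [event.get("id", since) for event in events])
```

A consumer would call these on each batch of events and only query `rest/db/status` for folders that have shown up in the returned set.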

Nummer378 · January 13, 2020, 12:21pm

Just for context, this is what this is about: https://github.com/canton7/SyncTrayzor/issues/547 (I opened that issue)

I just did some testing today and I figured out the following:

The mentioned behavior (API returning 404 when SyncTrayzor calls) is highly racy. Sometimes it does, sometimes it doesn’t.
I can reproduce this about once every few days under normal conditions, on at least two machines. Using `set STRECHECKDBEVERY=0` I can trigger the error on a machine that seems prone to the race with a “success” rate of almost 100% (on Syncthing v1.3.3).
I can only reproduce this on v1.3.3. I’ve tried v1.3.2 on the racy machine with the same settings and it never returns 404 (by the way, it also never crashes on either version).

imsodin · January 13, 2020, 12:56pm

Thanks for the info.

That makes sense. That env var delays starting folders (it first needs to recalculate db stats).

I have now checked: v1.3.2 was pre-refactor, and v1.3.3 contains the refactor and the fix - ~~I don’t understand yet how what you describe can happen but I can investigate with the info I have.~~ Never mind, I reproduced and then understood the problem (the fix I mentioned above was a misconception of mine - it isn’t related).

@canton7:
At least in my understanding, relying on all folders having started by the time the API has started is not ideal. I wasn’t even aware this was by design, and I just didn’t (intend to) change it because I tend not to change things when I have a choice. E.g. as metadata recalculation can take a long time, it would be a totally sensible improvement to add a “Starting” folder state that applies then. I would have made that change without a second thought about this “time of API start vs. all folders up and running” problem. I’d use the events API or handle “folder not running”, as the same can also happen when unpausing a folder (in which case it’s expected to be running too, but might not be for a long time).

canton7 · January 13, 2020, 1:11pm

This behaviour was in flux a few years ago (and the consequences of accessing the API “too early” were a lot worse than a 404, although I don’t recall exactly what they were), but after discussion it settled down, and the API only started once Syncthing was ready to be accessed over the API.

rest/db/status does return the state of the folder, in the state field. It’s absolutely fine for this to contain a state which indicates that the folder isn’t running. What’s not expected (from a consumer point of view) is for this endpoint to return a 404 (which indicates either “API endpoint doesn’t exist” or “folder does not exist”, depending on how generous you’re feeling) at points that are unpredictable to the consumer.

Note that the events API doesn’t have an event for “folder not running” either.

As I said in the OP, I can interpret a 404 on this endpoint as “folder not running”, or I can parse the body text, if that’s what the API requires, but both options smell a lot. I wanted to make sure that this was the designed API before implementing this, rather than my implementing something unnecessarily fragile.

Note that 404 indicates that there is a problem with the request, i.e. the consumer did something wrong, by requesting an endpoint which doesn’t exist. 5xx errors are server-side errors.

imsodin · January 13, 2020, 1:28pm

Thanks again for all the info. The change in behaviour in v1.3.3 that API starts before all folders are up and running was not intended and I see now why it happens - I’ll fix that.

The state in rest/db/status is confusing indeed: That’s about the state of the running folder, i.e. whether it’s idle, pulling, scanning, errored, … A paused folder is not running, i.e. doesn’t have such a state. I think whether the folder is paused or not is currently only visible in the config. Maybe that should be augmented.

calmh · January 13, 2020, 1:29pm

I think this endpoint should return a valid response for any existing folder, running or not. Ideally we should know which folders exist before starting the API.

“Paused” would seem to be a reasonable folder state to have.

(I think restarting a folder involves removing it and then adding it again. Perhaps this introduces a window of error in between those times…)

imsodin · January 13, 2020, 1:53pm

General question: If we are to return a paused state, almost all of the other fields will carry no information (no fileset access). Can we expect consumers to be able to deal with missing entries or do we need to add them (with ~~null~~ zero/default values), aka would SyncTrayzor be able to handle that (without changes)?

canton7 · January 13, 2020, 1:56pm

SyncTrayzor will handle them, because it deserializes into a strongly-typed model and the deserializer will substitute a suitable default if something’s missing. I can’t say for any other API clients. I think providing a default yourself is safer.

Just to clarify, using an actual null would be a break for the fields which are e.g. integers, as it’s a change of data type. Using e.g. 0 for globalBytes would be preferable IMO.
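To illustrate the difference between a missing field, a `null`, and a zero default, here is a rough sketch of a typed model in Python. SyncTrayzor itself is not written in Python and this is not its code: `FolderStatus` and `parse_folder_status` are hypothetical names, and the fields shown are only a small subset of what `rest/db/status` returns.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class FolderStatus:
    # Every field has a default, so a sparse response still parses.
    state: str = ""
    globalBytes: int = 0
    globalFiles: int = 0
    localBytes: int = 0
    needBytes: int = 0


def parse_folder_status(body: str) -> FolderStatus:
    """Build a FolderStatus, falling back to defaults for absent fields.

    Unknown keys are ignored. An explicit null is treated like a missing
    field here; a stricter client expecting an integer might reject it
    instead, which is why a null would be a breaking change.
    """
    raw = json.loads(body)
    known = {f.name for f in fields(FolderStatus)}
    values = {k: v for k, v in raw.items() if k in known and v is not None}
    return FolderStatus(**values)
```

With a model like this, an almost empty summary for a folder that is not running deserializes to zeros rather than failing.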

imsodin · January 16, 2020, 10:08pm

I left the API startup for now, but the following PR treats known, unpaused but not yet started folders like paused ones, preventing the 404 you experienced: https://github.com/syncthing/syncthing/pull/6272

canton7 · January 17, 2020, 1:52pm

Thanks! That’s much appreciated.

imsodin · January 17, 2020, 2:06pm

You’re welcome, so is your feedback and generally all your work on the amazing SyncTrayzor!
Just to confirm: You are generally fine with the API starting before all folders are up and running (barring any other misbehaving endpoints of course, which would need fixing too)? Then I’ll leave that part as it is.

canton7 · January 17, 2020, 2:27pm

I think so? Although I’ll have to test to be sure. What does the rest/db/status endpoint return before the folder has started?

imsodin · January 17, 2020, 2:40pm

The same thing as for a paused folder, an essentially (entirely?) empty folder summary.

canton7 · January 17, 2020, 2:41pm

Should be fine, then. I think paused folders are handled correctly (at least, I haven’t seen any complaints there).

pthubbard · February 8, 2020, 7:51pm

I’m still experiencing this issue on one of two machines. I set both to include the RC version but one machine still gives me this notification on startup more often than not. Everything else seems to work fine. Should I expect any difference with 1.4.0?

Nummer378 · February 9, 2020, 12:56pm

The proposed fix/workaround for this was PR 6272 (see post #10 in this thread).

That commit has not yet landed in any release or RC version, but is scheduled for 1.4.0. The first RC for 1.4.0 will probably appear soon™ (usual date would be next Tuesday)

Nummer378 · February 11, 2020, 5:09pm

I can still reproduce this issue with 1.4.0-rc.2. SyncTrayzor still produces an error message on slow/delayed startups due to an unexpected API response.

However, with 1.4.0 the error message changed slightly:

Previously it was a 404 response with body “folder is not running”.
Now, I’m seeing 404 with body “no such folder”.

imsodin · February 11, 2020, 5:25pm

Thanks! https://github.com/syncthing/syncthing/pull/6326

Nummer378 · February 12, 2020, 8:23pm

This seems to be fixed in 1.4.0-rc.3.