Is it expected that rest/db/status for a folder returns a 404 with the body “folder is not running” on startup?
The behaviour used to be that the API wouldn’t start at all until all of the folders had started. However, I’m now getting reports of a 404 when trying to fetch rest/db/status after startup.
If this is an intentional change, is there a good way of telling when Syncthing has started, before polling rest/db/status for each folder? Or should I sit there polling all folders until they all return non-404? Does a 404 from this endpoint always mean “temporarily unavailable”, or are there cases where a 404 here is a genuine, unrecoverable error?
Do you see this on v1.3.3 too? I am not sure whether I have the release history right, but I did a refactor that inadvertently changed the behaviour as you describe, and then fixed that a bit later. I think the change was in v1.3.2 and the fix in v1.3.3.
Folder summaries (aka /rest/db/status) only make sense for running folders, so I think a 404 can mean that the folder is not present at all, is not running, or something similar (I didn’t check). In any case, I wouldn’t poll on that, but rather listen to folder summary or state changed events to determine whether a folder is running.
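A minimal sketch of the event-based approach, assuming the events have already been fetched from the events API and decoded from JSON. The event type names `StateChanged` and `FolderSummary` are Syncthing's; the exact shape of the `data` payload used here (`folder`, `to`, `summary.state`) is an assumption for illustration, and `apply_events` is a hypothetical helper, not part of any client library.

```python
from typing import Any, Dict, Iterable, Mapping, Optional


def apply_events(
    events: Iterable[Mapping[str, Any]],
    states: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Fold a batch of decoded events into a {folder_id: state} map.

    Folders that never appear in the map have not reported any state yet,
    so a consumer can treat them as "not (yet) running".
    """
    states = dict(states or {})
    for event in events:
        data = event.get("data") or {}
        folder = data.get("folder")
        if not folder:
            continue  # not a folder-related event
        if event.get("type") == "StateChanged":
            # Assumed payload: {"folder": ..., "from": ..., "to": ...}
            states[folder] = data.get("to", "unknown")
        elif event.get("type") == "FolderSummary":
            # Assumed payload: {"folder": ..., "summary": {"state": ...}}
            summary = data.get("summary") or {}
            states[folder] = summary.get("state", states.get(folder, "unknown"))
    return states
```

A consumer would call this after each long-poll of the events endpoint, passing in the previous map, and only query rest/db/status for folders that already have an entry.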
I just did some testing today and I figured out the following:
The mentioned behavior (API returning 404 when SyncTrayzor calls) is highly racy. Sometimes it does, sometimes it doesn’t.
I can reproduce this about once every few days under normal conditions - on multiple machines, at least two. Using set STRECHECKDBEVERY=0 I can trigger the error on a machine that seems prone to the race with a “success” rate of almost 100% (on Syncthing v1.3.3).
I can only reproduce this on v1.3.3. I’ve tried v1.3.2 on the racy machine with the same settings and it never returns 404 (by the way, it also never crashes on either version).
That makes sense. That env var delays starting folders (it first needs to recalculate db stats).
I have now checked: v1.3.2 was pre-refactor, and v1.3.3 contains the refactor and the fix - I don’t understand yet how what you describe can happen, but I can investigate with the info I have. Never mind, I reproduced it and then understood the problem (the fix I mention above is a misconception of mine - it isn’t related).
At least in my understanding, relying on all folders having started by the time the API has started is not ideal. I wasn’t even aware this was by design, and just didn’t (intend to) change it because I tend not to change things when I have a choice. E.g. as metadata recalculation can take a long time, it would be a totally sensible improvement to add a “Starting” folder state that applies then. I would have made that change without a second thought towards this “time of API start vs all folders up and running” problem. I’d utilize the event API or handle “folder not running”, as the same can also happen when unpausing a folder (in which case it’s expected to be running too, but might not be for a long time).
This behaviour was in flux a few years ago (and the consequences of accessing the API “too early” were a lot worse than a 404, although I don’t recall exactly what they were), but after discussion it settled down, and the API only started once Syncthing was ready to be accessed over the API.
rest/db/status does return the state of the folder, in the state field. It’s absolutely fine for this to contain a state which indicates that the folder isn’t running. What’s not expected (from a consumer point of view) is for this endpoint to return a 404 (which indicates either “API endpoint doesn’t exist” or “folder does not exist”, depending on how generous you’re feeling) at points that are unpredictable to the consumer.
Note that the events API doesn’t have an event for “folder not running” either.
As I said in the OP, I can interpret a 404 on this endpoint as “folder not running”, or I can parse the body text, if that’s what the API requires, but both options smell a lot. I wanted to make sure that this was the designed API before implementing this, rather than my implementing something unnecessarily fragile.
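For illustration, the “interpret a 404 as folder not running” option could look roughly like the sketch below. It is deliberately independent of any HTTP library: `fetch` is a hypothetical callable supplied by the caller that performs the GET on rest/db/status for one folder and returns `(status_code, body_text)`. The retry count and delay are arbitrary assumptions, and nothing here is confirmed as the intended API contract.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple


class FolderNotRunning(Exception):
    """Raised when the folder kept returning 404 for every attempt."""


def wait_for_status(
    fetch: Callable[[str], Tuple[int, str]],
    folder: str,
    attempts: int = 20,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the folder's status is available, treating 404 as 'not yet'."""
    for _ in range(attempts):
        code, body = fetch(folder)
        if code == 200:
            return json.loads(body)
        if code != 404:
            # Anything else is not the "folder is not running" case.
            raise RuntimeError(f"unexpected HTTP {code} for folder {folder!r}")
        sleep(delay)  # 404: assume temporarily unavailable and retry
    raise FolderNotRunning(f"folder {folder!r} still 404 after {attempts} attempts")
```

The fragility is visible in the code: a folder that genuinely does not exist is indistinguishable from one that has not started yet until the attempts run out.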
Note that 404 indicates that there is a problem with the request, i.e. the consumer did something wrong, by requesting an endpoint which doesn’t exist. 5xx errors are server-side errors.
Thanks again for all the info. The change in behaviour in v1.3.3 that API starts before all folders are up and running was not intended and I see now why it happens - I’ll fix that.
The state in rest/db/status is confusing indeed: that’s about the state of the running folder, i.e. whether it’s idle, pulling, scanning, errored, … A paused folder is not running, i.e. doesn’t have such a state. I think whether the folder is paused or not is currently only visible in the config. Maybe that should be augmented.
General question: If we are to return a paused state, no information on almost all of the other fields will be present (no fileset access). Can we expect consumers to be able to deal with missing entries, or do we need to add them (with null/zero/default values)? In other words, would SyncTrayzor be able to handle that (without changes)?
SyncTrayzor will handle them, because it deserializes into a strongly-typed model and the deserializer will substitute a suitable default if something’s missing. I can’t say for any other API clients. I think providing a default yourself is safer.
Just to clarify, using an actual null would be a break for the fields which are e.g. integers, as it’s a change of data type. Using e.g. 0 for globalBytes would be preferable IMO.
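To make the missing-versus-null distinction concrete, here is a rough sketch of a tolerant client-side model. It is not SyncTrayzor's actual code (which is not shown in this thread); `FolderStatus` and `parse_folder_status` are hypothetical names, and only a handful of the rest/db/status fields are included.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass
class FolderStatus:
    # Defaults stand in for fields the server omits.
    state: str = ""
    globalBytes: int = 0
    globalFiles: int = 0
    localBytes: int = 0
    needBytes: int = 0


def parse_folder_status(payload: Mapping[str, Any]) -> FolderStatus:
    """Build a FolderStatus, falling back to defaults for missing or null fields."""
    known = {f.name for f in fields(FolderStatus)}
    present = {
        key: value
        for key, value in payload.items()
        if key in known and value is not None  # treat JSON null like "missing"
    }
    return FolderStatus(**present)
```

A client written like this copes with both omitted fields and nulls, but one that reads `globalBytes` directly as an integer would not survive a null, which is why sending 0 is the safer choice on the server side.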
You’re welcome, and so is your feedback and generally all your work on the amazing SyncTrayzor!
Just to confirm: You are generally fine with the API starting before all folders are up and running (barring any other misbehaving endpoints of course, which would need fixing too)? Then I’ll leave that part as it is.
I’m still experiencing this issue on one of two machines. I set both to include the RC version, but one machine still gives me this notification on startup more often than not. Everything else seems to work fine. Should I expect any difference with 1.40?