I’m developing a system in Java that controls Syncthing via its REST interface.
I currently have 3 servers in my test environment:
syncmaster ( all folders on here are set to readonly )
sync1 ( only connected to master )
sync2 ( only connected to master )
I have the logic down to sync files to the master and then on to the slave machines. However, I am now building the “add folder” part and am seeing weird behaviour.
My current logic is:
1) make sure all syncthings are connected to the master
2) get config on each node
3) add new folder to each config
4) post this config back to each node
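For reference, steps 2 to 4 for a single node could be sketched roughly like this. It assumes Java 11+ (java.net.http), the /rest/system/config endpoint for both GET and POST, and the X-API-Key header for authentication. The class name, the baseUrl/apiKey parameters, and the addFolder callback are made up for illustration; the JSON edit itself is left to whatever JSON library you already use.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of any Syncthing client library.
final class AddFolderSketch {
    private final HttpClient http = HttpClient.newHttpClient();

    // Starts a request against one node. baseUrl and apiKey are placeholders
    // for that node's GUI address and API key.
    static HttpRequest.Builder request(String baseUrl, String apiKey, String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("X-API-Key", apiKey);
    }

    // Fetches the node's config, lets the caller add the folder to the JSON,
    // and posts the edited config back. The calls run one after another.
    String updateConfig(String baseUrl, String apiKey, UnaryOperator<String> addFolder)
            throws IOException, InterruptedException {
        HttpResponse<String> current = http.send(
                request(baseUrl, apiKey, "/rest/system/config").GET().build(),
                HttpResponse.BodyHandlers.ofString());

        String edited = addFolder.apply(current.body());

        HttpResponse<String> posted = http.send(
                request(baseUrl, apiKey, "/rest/system/config")
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(edited))
                        .build(),
                HttpResponse.BodyHandlers.ofString());

        if (posted.statusCode() / 100 != 2) {
            throw new IOException("config POST failed with HTTP " + posted.statusCode());
        }
        return edited;
    }
}
```

Calling updateConfig for the master, then sync1, then sync2 in sequence keeps the config posts from overlapping with each other.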
I’m guessing this is a timing issue somewhere, as sometimes this works completely fine. Other times the REST interface on either sync1 or sync2 stops responding indefinitely. The web interface still loads, but everything stays in the “unknown” state.
I looked at the stack for the PID and it’s sitting in futex_wait.
I have tried versions:
0.12.20
0.12.22
0.12.23
0.13.0-beta4 ( latest master as of today )
I enabled STTRACE=all, but it doesn’t give me much information: when it happens, I see the node disconnect from the master and then no more log entries appear.
What can I do to debug this and figure out why it is happening?
Is there a set procedure I have to follow to add a folder, i.e. do I need to add it on each node and wait for it to connect again?
Are you restarting Syncthing after you change the config? If not, changes might not be applied. Otherwise, it’s hard to say without seeing the code.
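A rough sketch of how that check could look from Java, assuming the 0.12/0.13 REST API where GET /rest/system/config/insync returns a small JSON object with a configInSync flag and POST /rest/system/restart restarts the node. The class and method names are made up, and the regex is only a stand-in for a proper JSON parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper: decides from the /rest/system/config/insync response
// body whether the node should be restarted via POST /rest/system/restart.
final class RestartCheckSketch {
    private static final Pattern IN_SYNC =
            Pattern.compile("\"configInSync\"\\s*:\\s*true");

    // Returns true unless the body reports configInSync as true.
    static boolean needsRestart(String insyncResponseBody) {
        return !IN_SYNC.matcher(insyncResponseBody).find();
    }
}
```

The idea would be to call this after posting the config and only issue the restart when it returns true.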
You can also check out the code from syncthing-android here. You won’t be able to use it directly because it needs some Android classes, but it should be pretty easy to rewrite for plain Java.
Sounds like a bug. If you can reproduce it, hit the “innermost” Syncthing process with a SIGQUIT to produce a full stack trace that we can use to diagnose it. That process has the higher PID of the two. That is:
$ ps aux | grep syncthing
# get either one PID, or two. If two, take the highest
$ kill -QUIT $pidOfSyncthing
You want 31f6418, current master. I think you’re suffering from the thing fixed yesterday. At least it looks similar from the trace, and the line numbers don’t line up with current.
Edit: Sorry, I misread. Your commit looks fine. But do make sure you are actually running what you’re building, because I don’t think you are.
For what it’s worth, the deadlock is in calling /rest/system/connections in parallel with posting the config. Not doing that, and not having the GUI up, should work around the deadlock.
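On the client side, one way to avoid overlapping those two calls is to funnel every REST call to a given node through one lock, so a /rest/system/connections poll can never run while the config POST is in flight. This is only a sketch of that idea: NodeCallGate and its call method are made-up names, and it only serializes your own client's calls (an open GUI polls on its own).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical helper: allows at most one in-flight REST call per node.
final class NodeCallGate {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs restCall while holding the lock for the given node name.
    // Calls to different nodes do not block each other. Checked exceptions
    // from the HTTP client need to be wrapped by the caller.
    <T> T call(String node, Supplier<T> restCall) {
        ReentrantLock lock = locks.computeIfAbsent(node, n -> new ReentrantLock());
        lock.lock();
        try {
            return restCall.get();
        } finally {
            lock.unlock();
        }
    }
}
```

Both the connection polling and the config post would then go through gate.call("sync1", ...), gate.call("sync2", ...) and so on.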