Hanging panics in integration tests


(Simon) #1

I wanted to add some “assert-style” panics to the code and run integration-tests in the hope of getting something tangible regarding https://github.com/syncthing/syncthing/issues/5340. The I came across this very weird issue:

diff --git a/lib/db/meta.go b/lib/db/meta.go
index dffc3f3ed..9ed9cfd52 100644
--- a/lib/db/meta.go
+++ b/lib/db/meta.go
@@ -8,6 +8,7 @@ package db
 
 import (
        "bytes"
+       "fmt"
        "math/bits"
        "time"
 
@@ -112,6 +113,8 @@ func (m *metadataTracker) addFile(dev protocol.DeviceID, f FileIntf) {
                return
        }
 
+       fmt.Println("panic outside lock")
+       panic("panic outside lock")
        m.mut.Lock()
        m.dirty = true

Build and run the SyncCluster test results in the expected two lines and the trace afterwards:

panic outside lock
panic: panic outside lock

Inserting the panic after the lock like this

diff --git a/lib/db/meta.go b/lib/db/meta.go
index dffc3f3ed..c14cf43df 100644
--- a/lib/db/meta.go
+++ b/lib/db/meta.go
@@ -8,6 +8,7 @@ package db
 
 import (
        "bytes"
+       "fmt"
        "math/bits"
        "time"
 
@@ -114,6 +115,8 @@ func (m *metadataTracker) addFile(dev protocol.DeviceID, f FileIntf) {
 
        m.mut.Lock()
        m.dirty = true
+       fmt.Println("panic inside lock")
+       panic("panic inside lock")
 
        if flags := f.FileLocalFlags(); flags == 0 {
                // Account regular files in the zero-flags bucket.

Results in the fmt line to be printed, but no panic - the process even seems to partially carry on before hanging:

panic inside lock
panic inside lock
[I6KAH] 15:19:35 INFO: Detected 3 NAT services

I can’t find anything helpful in the trace when the test finally times out: test.out (8.1 KB)

Any ideas what the reason might be?


(Jakob Borg) #2

Are we missing something with the suture-should-not-restart-panics in that code path?

If the code path is from the HTTP server it would swallow the panic (though also print it) and we’d hang on the lock next time through. addFile though shouldn’t get called that way, afaik.


(Simon) #3

The panics works if it’s in the exact same function above the lock - so it can’t be a missed suture-passthrough-panic-thing.


(Jakob Borg) #4

I admit I have no better idea…


(Simon) #5

Pity, I was hoping this was some not so obvious consequence of an obscure language spec that you knew and not a weird problem that should be impossible and might also be specific to my system/syncthing integration tests/moon phase/…


(Jakob Borg) #6

Well, it’s not just you - I see the same thing. Panic before the lock panics as it should. Panic inside the lock is swallowed. By what, I don’t know, but it’s bad and it shouldn’t happen.


Edit: Oh, I know. addFiles is called from (s *FileSet) Update(...) and it panics somewhere in the middle. But defers run on panic, and we have a defer s.meta.toDB(...) in there which tries to acquire the same lock while unwinding from the panic. So the panic never completes.

If addFiles used a defer unlock instead (and we don’t panic precisely between the lock and the defer-unlock, which should be impossible) it works.

Guideline: prefer defer unlocks…


(Simon) #7

Thanks a lot for finding the cause!
Apparently Go’s panic isn’t panicked enough if it lets itself be restrained by something as feeble as a simple lock :stuck_out_tongue:

I found something completely puzzling concerning metadataTracker, which means it’s either broken or I am too tired already. To rule out the second I’ll go to sleep and have a look another time. I have been sleeping :slight_smile:


(Jakob Borg) #8

I think this is a double whammy of not using deferred unlock and doing stuff that takes locks in a defer. Both of those are probably suboptimal practice.