Reason for panic maybe too little RAM?

maelcum · August 11, 2017, 7:40pm

Hi,

I’m sorry this will be a long post. I try to include as much information as possible, some probably unnecessary. Bear with me, please.

I’m a long time user of Syncthing. Since the very beginning a (small) QNAP server is one of the clients hosting Synching and albeit not being a very powerful machine has performed admirably. I’ve added more and more folders, and although it sometimes takes hours (and in two cases: days) to build the index, once it is done, everything else runs smoothly. Until this week.

I’ve been bitten by the 14.35-problem (of folders paths being resettet because they don’t have a label) and have tried to recover with 14.36. Unfortunately things just don’t work out. The current problem is, that the index can be built, but as soon as I share that folder with another machine (just one is enough) the ram consumption climbs and climbs, the machine stalls, Syncthing GUI is not responsive anymore and the console log shows messages along these lines:

    2017/08/11 21:11:14 http: panic serving [<IPv6 address>]:53867: runtime error: invalid memory address or nil pointer dereference
goroutine 841 [running]:
net/http.(*conn).serve.func1(0x114e0f60)
    /usr/local/go/src/net/http/server.go:1721 +0x9c
panic(0x5ac1e8, 0xad9a10)
    /usr/local/go/src/runtime/panic.go:489 +0x288
github.com/syncthing/syncthing/lib/ignore.(*Matcher).Lines(0x0, 0x0, 0x0, 0x0)
    /tmp/syncthing-e02d0de3/src/github.com/syncthing/syncthing/lib/ignore/ignore.go:228 +0x28
github.com/syncthing/syncthing/lib/model.(*Model).GetIgnores(0x10f9c370, 0x12c52f4b, 0xa, 0x13329360, 0x0, 0xd11ff701, 0xe, 0x955f709, 0xae27d0, 0x0, ...)
    /tmp/syncthing-e02d0de3/src/github.com/syncthing/syncthing/lib/model/model.go:1259 +0x11c
main.folderSummary(0xa98278, 0x10f5e000, 0xa990d8, 0x10f9c370, 0x12c52f4b, 0xa, 0x152)
    /tmp/syncthing-e02d0de3/src/github.com/syncthing/syncthing/cmd/syncthing/gui.go:675 +0xc60
main.(*apiService).getDBStatus(0x11c874a0, 0xa94ff8, 0x112eaa20, 0x13dfca00)
    /tmp/syncthing-e02d0de3/src/github.com/syncthing/syncthing/cmd/syncthing/gui.go:644 +0x64
main.(*apiService).(main.getDBStatus)-fm(0xa94ff8, 0x112eaa20, 0x13dfca00)
    /tmp/syncthing-e02d0de3/src/github.com/syncthing/syncthing/cmd/syncthing/gui.go:250 +0x34
net/http.HandlerFunc.ServeHTTP(0x11cb4268, 0xa94ff8, 0x112eaa20, 0x13dfca00)
    /usr/local/go/src/net/http/server.go:1942 +0x34
net/http.(*ServeMux).ServeHTTP(0x1193dfe0, 0xa94ff8, 0x112eaa20, 0x13dfca00)
    /usr/local/go/src/net/http/server.go:2238 +0x108
main.getPostHandler.func1(0xa94ff8, 0x112eaa20, 0x13dfca00)
    /tmp/syncthing-e02d0de3/src/github.com/syncthing/syncthing/cmd/syncthing/gui.go:427 +0x104
net/http.HandlerFunc.ServeHTTP(0x10f8e0a0, 0xa94ff8, 0x112eaa20, 0x13dfca00)
    /usr/local/go/src/net/http/server.go:1942 +0x34
main.metricsMiddleware.func1(0xa94ff8, 0x112eaa20, 0x13dfca00)

Some additional info:
The machine usually syncs about 12 folders, with a total of 124055 files, 10355 folders and ~1940G.
Since the indexing will take days, I’ve adopted the following routine:

all folders are paused
no folder is shared
all sync partners are paused as well

One folder is unpaused. I wait until the indexing is finished and the folder is Unshared. Then I unpause the appropriate sync partner and share the folder with this partner.
If that has finished, I pause the sync partner, pause the folder and continue with the next.
That way there is always just one folder and one sync partner at the max.
In the past, this has worked. This week, it fails.

When the QNAP starts indexing a folder with 591 files, 50 folders and 553GB content, it uses up to 380MB RAM. Since the small QNAP has only 512MB, it is on its limit.
While this happens, the sync partner outputs this on the console log every few minutes:

21:04:17 INFO: Established secure connection to XXXXX-YYYYY-... at 192.168.254.21:22000-192.168.254.4:43522 (tcp-server) (TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305)
21:04:17 INFO: Device XXXXX-YYYYY-... client is "syncthing v0.14.36" named "DE71364-SV-U4"
21:11:03 INFO: Connection to XXXXX-YYYYY-... closed: reading length: EOF

There is another machine which has the duplicate of this machine (but different hardware platform). While running with all folders active and 6 sync partners it consumes approx. 70MB RAM. Could the reason for my problem be too little RAM?

I’m by no means a programmer, so please excuse if that is plain wrong. I’m just guessing here. Maybe somebody with more knowledge could help me out and explain what I’m seeing here. Is there something I can do to avoid this?

Thanks for reading this far! Much appreciated.

Cheers

mael

imsodin · August 11, 2017, 8:46pm

Thanks for the thorough report. The backtrace points to this being a bug, that probably was triggered by the heavy resource usage and thus slow execution.

I am still puzzled by this, because the only way I can see this happen is an “ultra-race”:
The nil pointer deref happens when trying to lock the ignores mutex. We got there through a point in GetIgnores() at which the folder config already existed in the model. So the config exists, but its ignores does not. Meaning the call to getDBStatus must evaluate while control is in between lines 309 and 317 in model.go.
Does that make any sense?

Also in ignore.go both Patterns and Match have a weird m == nil guard at the beginning (Load, Parse, Hash, ShouldIgnore, … don’t have it). However a Matcher is created via a constructor New, so there is no chance of having a nil Matcher. Am I missing something?

EDIT: Just to be clear: This is a bug, introduced by https://github.com/syncthing/syncthing/pull/3996 - author: me *badummtisch*

EDIT2: I wanted to write the follow-up question: Did this just happen once. However you write you have this message repeatedly in your logs!? That’s weird because it makes my “ultra-race” less likely to be the cause.

maelcum · August 11, 2017, 9:31pm

Hi Simon.

a) Thanks for the equally verbose answer. I had to laugh a couple of times, as I can hardly follow the technical details, but well, I’ve asked for it, didn’t I?

b) It has happened twice. Yesterday and Today. Unfortunately the folder I’m currently at is taking about 25 hours to rebuild the index. I’ve just startet another run - which will take its time.

It doesn’t happen for all folders. Some are working fine.
One Folder has 64073 files, 5569 folders and ~362GB content and works great. Those 64thsd files are around 5MB, small by comparison.
The folder that repeatedly failes has just a few files (591) but around 1GB size.

c) From what I understand from your answer it points to something filtered by .stignore.
Funny though, there is no .stignore (anymore).
There should be one, with just one entry (#include .stglobal). There has certainly been one, since the .stglobal file which is referenced is still there (and has the correct entries). The “other sides” (sync partners) also have the .stignore. Just on this machine is has mysteriously vanished.

Could that be the reason for the race condition?
But then I would have expected the second run to work since if the trigger was a .stignore file vanishing during the sync, it would only vanish once.

Hm. I am interrupting the indexing now and will reinstate the correct .stignore. And then restart the indexing and wait… wait… wait…

d) One more question: If “a Matcher is created via a constructor New” is true, how can it ever be “nil”?

imsodin · August 11, 2017, 9:39pm

a) I should have mentioned that the technical mumbling was primarily directed at Syncthing contributors.

b) Ok, that doesn’t absolutely make it non-racy, so I kind of can hope the race I described (and can/will fix) is the source of your problem.

c) For the race I described it doesn’t matter whether there is a .stignore file or not.

d) I query a map containing matchers without checking whether there is actually anything in the map for this key - hence the nil result.

maelcum · August 11, 2017, 9:47pm

c) Okay. From what I understood from pull/3996 you (have introduced to) read the .stignore at folder creation time if it exists. I’m always providing a .stignore and a .stglobal before adding a folder to Syncthing. Is that a bad idea?

Grüße in die Schweiz.

imsodin · August 11, 2017, 9:53pm

No not at all, that’s fine (I do the same). In general you can expect a fix soon.

Grüsse zurück nach Deutschland oder Österreich (oder Südtirol oder Elsass oder … )

AudriusButkevicius · August 11, 2017, 9:54pm

This is genuinely a race, as you are accessing two model fields that are protected by the same lock in two separate locking sessions. The race is more prominent on slow machines I guess.

imsodin · August 11, 2017, 10:53pm

It feels slightly spammy

system · September 10, 2017, 10:58pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.