OpenBSD test debugging

imsodin · February 5, 2019, 2:06pm

And found the cause too - again no “real bug”, “just” a racy test. However it pointed me to a potential improvement in the real code

Now I need to model this into nicely palatable PRs

vext01 · February 5, 2019, 6:07pm

Great! Is it all in your branch? I can test it for you later if you like.

imsodin · February 5, 2019, 7:46pm

Yep all there.

vext01 · February 5, 2019, 9:43pm

Alright then! I ran your branch on a65f73df9 and the only failures I have are:

TestIssue5063
TestRequestRemoteRenameChanged

Sadly TestRequestRemoteRenameChanged is back.

The log file for running all tests is massive, so I’ll now run those two tests in a loop and upload small log files.

vext01 · February 5, 2019, 9:59pm

Here’s a log file containing many failures for TestRequestRemoteRenameChanged: http://theunixzoo.co.uk/random/TestRequestRemoteRenameChanged.out.bz2

I ran it 11 times in a loop and it failed in all cases.

As for TestIssue5063, this one is much harder to trigger in isolation, but here’s a section of a full test suite run showing that test failing: http://theunixzoo.co.uk/random/TestIssue5063.out.bz2

I’ve got model and fs logging on for both logs.

imsodin · February 5, 2019, 11:04pm

Embarrassingly I was too messy on my debugging branch and when merging in master, I committed silent merge conflicts. I fixed that, which should take care of TestRequestRemoteRenameChanged.

The log of 5063 doesn’t show any filesystem activity interfering with dir removal, but it shows that filesystem watchers are started. That test works with auto-added folders, where those are enabled by default. So it’s not possible to disable that in tests. If it only happens very seldom, I’d just ignore it.

vext01 · February 6, 2019, 6:36pm

Out of 100 runs on commit 4d986830c, we have (times-failed, test-name):

  10 TestAutoAcceptNewFolderPremutationsNoPanic
   1 TestIssue5063

That test works with auto-added folders, where those are enabled by default. So it’s not possible to disable that in tests. If it only happens very seldom, I’d just ignore it.

Or is there a different way to test that functionality that would sidestep that problem? At the risk of sounding like a perfectionist, I don’t see the point of a test which sometimes fails when things are actually OK.

Cheers

vext01 · February 6, 2019, 6:37pm

I was going to post the log, but I forgot to set STTRACE.

I’ll run again.

AudriusButkevicius · February 6, 2019, 6:52pm

Interesting that both tests relate to new folder addition via cluster config message.

imsodin · February 6, 2019, 7:28pm

Given that the remaining two test failures relate to filesystem and have watcher enabled, and other failures of the same kind were fixed by disabling the watcher, I am inclined to believe that this is the source of the failure. Even though I can’t imagine how.

imsodin · February 6, 2019, 8:15pm

That might actually be a reasonable point: Tests that are run in temporary directories don’t really need to fail if their cleanup fails. We did that because failed cleanup can mess with other tests if testing happens in the same place. And it’s annoying if random directories exist somewhere. However I tend to agree that flaky tests are more annoying.

Edit: Implicit in my statement: I don’t think there is another way of testing the same thing short of implementing Editable default values for folders, devices · Issue #4224 · syncthing/syncthing · GitHub

AudriusButkevicius · February 6, 2019, 8:30pm

Well, they are only flaky because of OpenBSD particularities with file watches, so be as perfectionist as you want, the problem is somewhere else.

I say we disable these on openbsd and call it a day. OR, have a global variable/hook DefaultFolderWatchesEnabled which can be disabled during tests.

AudriusButkevicius · February 6, 2019, 9:55pm

To be honest, the premutations test is stupid and we should just delete it. I wrote it when trying to repro and issue, not sure why I even comitted it.

vext01 · February 7, 2019, 11:07am

@imsodin, didn’t you say you had managed to repro one of the bugs we’ve covered in this thread on linux? Isn’t it just that OpenBSD is more likely to expose the non-determinism of the tests?

Anyway, I’ll leave that decision up to the syncthing devs. I’m not qualified

As promised, here’s a re-run of @imsodin’s branch (4d98683) with logs (100 runs with STTRACE=“model fs”).

   5 TestAutoAcceptNewFolderPremutationsNoPanic
   1 TestIssue5063

http://theunixzoo.co.uk/random/STLOG-20190206.bz2

AudriusButkevicius · February 7, 2019, 11:46am

I’d have to look at the logs, but phone does not support bz

vext01 · February 7, 2019, 12:20pm

It’s a massive log file.

Let me extract the relevant parts:

gist.github.com

https://gist.github.com/vext01/92d8d76b762ef054b63c29fdcae6c8b2

1

=== RUN   TestAutoAcceptNewFolderPremutationsNoPanic
2019/02/06 18:47:59.760596 progressemitter.go:197: DEBUG: progress emitter: updated interval 1s
2019/02/06 18:47:59.760646 set.go:81: INFO: No stored folder metadata for "3bKE4L7r": recalculating
2019/02/06 18:47:59.760706 logfs.go:79: DEBUG: ignore.go:320 basic /home/edd/go/src/github.com/imsodin/syncthing/lib/model/3bKE4L7r Open .stignore <nil> open /home/edd/go/src/github.com/imsodin/syncthing/lib/model/3bKE4L7r/.stignore: no such file or directory
2019/02/06 18:47:59.760747 logfs.go:115: DEBUG: folderconfiguration.go:193 basic /home/edd/go/src/github.com/imsodin/syncthing/lib/model/3bKE4L7r Stat . <nil> stat /home/edd/go/src/github.com/imsodin/syncthing/lib/model/3bKE4L7r: no such file or directory
2019/02/06 18:47:59.761015 logfs.go:73: DEBUG: folderconfiguration.go:194 basic /home/edd/go/src/github.com/imsodin/syncthing/lib/model/3bKE4L7r MkdirAll . -rwxrwxrwx <nil>
2019/02/06 18:47:59.761046 logfs.go:115: DEBUG: folderconfiguration.go:152 basic /home/edd/go/src/github.com/imsodin/syncthing/lib/model/3bKE4L7r Stat . {0xc00591ccc0} <nil>
2019/02/06 18:47:59.761064 logfs.go:115: DEBUG: folderconfiguration.go:170 basic /home/edd/go/src/github.com/imsodin/syncthing/lib/model/3bKE4L7r Stat .stfolder <nil> stat /home/edd/go/src/github.com/imsodin/syncthing/lib/model/3bKE4L7r/.stfolder: no such file or directory
2019/02/06 18:47:59.761314 logfs.go:67: DEBUG: folderconfiguration.go:136 basic /home/edd/go/src/github.com/imsodin/syncthing/lib/model/3bKE4L7r Mkdir .stfolder -rwxrwxrwx <nil>
2019/02/06 18:47:59.761339 logfs.go:79: DEBUG: folderconfiguration.go:140 basic /home/edd/go/src/github.com/imsodin/syncthing/lib/model/3bKE4L7r Open . {0xc00015c040 .} <nil>

This file has been truncated. show original

2

vext01 · February 7, 2019, 1:54pm

By the way. I’m not sure if this is related, but when I run stcli folders list, before each folder entry, there’s a line like this:

13:52:28 INFO: bug: uncached filesystem call (should only happen in tests)

imsodin · February 7, 2019, 3:24pm

I was able to repro one of the bugs that was not a directory not empty one (RemoteRenameChanged I think) - that was a legitimately racy test that is fixed now. There has never been a “removeall: directory not empty” type failure outside of openbsd. And as I stated multiple times, the failure originates when cleaning up the test, i.e. the actually tested scenario has already finished successfully. As both test cases aren’t integral (in one case Audrius calls it useless, in the other case nobody know what exactly the failure mechanism is) I think disabling on openbsd is an appropriate response (6% failure rate is annoying).

AudriusButkevicius · February 7, 2019, 3:26pm

I do know what the other test is testing, and it should pass on openbsd and all platforms as it’s testing for a race updating and corrupting the config.

The useless test is useless, but I don’t think it has a reason to fail.

imsodin · February 7, 2019, 3:30pm

Uh sorry about that

It doesn’t “really” fail. I have looked at tons of logs to now, and it’s always the removal of the directory, that was the folder root in the test, that fails - nothing else. The same thing happened in other tests before I disable filesystem watcher in tests. I can’t disable filesystem watcher for auto-accepted folders, so these tests still fail on openbsd.