OpenBSD test debugging

And found the cause too - again no “real bug”, “just” a racy test. However it pointed me to a potential improvement in the real code :slight_smile:

Now I need to model this into nicely palatable PRs :expressionless:

Great! Is it all in your branch? I can test it for you later if you like.

Yep all there.

Alright then! I ran your branch on a65f73df9 and the only failures I have are:

TestIssue5063
TestRequestRemoteRenameChanged

Sadly TestRequestRemoteRenameChanged is back.

The log file for running all tests is massive, so I’ll now run those two tests in a loop and upload small log files.

Here’s a log file containing many failures for TestRequestRemoteRenameChanged: http://theunixzoo.co.uk/random/TestRequestRemoteRenameChanged.out.bz2

I ran it 11 times in a loop and it failed in all cases.

As for TestIssue5063, this one is much harder to trigger in isolation, but here’s a section of a full test suite run showing that test failing: http://theunixzoo.co.uk/random/TestIssue5063.out.bz2

I’ve got model and fs logging on for both logs.

Embarrassingly I was too messy on my debugging branch and when merging in master, I committed silent merge conflicts. I fixed that, which should take care of TestRequestRemoteRenameChanged.

The log of 5063 doesn’t show any filesystem activity interfering with dir removal, but it shows that filesystem watchers are started. That test works with auto-added folders, where those are enabled by default. So it’s not possible to disable that in tests. If it only happens very seldom, I’d just ignore it.

Out of 100 runs on commit 4d986830c, we have (times-failed, test-name):

  10 TestAutoAcceptNewFolderPremutationsNoPanic
   1 TestIssue5063

That test works with auto-added folders, where those are enabled by default. So it’s not possible to disable that in tests. If it only happens very seldom, I’d just ignore it.

Or is there a different way to test that functionality that would sidestep that problem? At the risk of sounding like a perfectionist, I don’t see the point of a test which sometimes fails when things are actually OK.

Cheers :slight_smile:

I was going to post the log, but I forgot to set STTRACE.

I’ll run again.

Interesting that both tests relate to new folder addition via cluster config message.

Given that the remaining two test failures relate to filesystem and have watcher enabled, and other failures of the same kind were fixed by disabling the watcher, I am inclined to believe that this is the source of the failure. Even though I can’t imagine how.

That might actually be a reasonable point: Tests that are run in temporary directories don’t really need to fail if their cleanup fails. We did that because failed cleanup can mess with other tests if testing happens in the same place. And it’s annoying if random directories exist somewhere. However I tend to agree that flaky tests are more annoying.

Edit: Implicit in my statement: I don’t think there is another way of testing the same thing short of implementing https://github.com/syncthing/syncthing/issues/4224

Well, they are only flaky because of OpenBSD particularities with file watches, so be as perfectionist as you want, the problem is somewhere else.

I say we disable these on openbsd and call it a day. OR, have a global variable/hook DefaultFolderWatchesEnabled which can be disabled during tests.

To be honest, the premutations test is stupid and we should just delete it. I wrote it when trying to repro and issue, not sure why I even comitted it.

@imsodin, didn’t you say you had managed to repro one of the bugs we’ve covered in this thread on linux? Isn’t it just that OpenBSD is more likely to expose the non-determinism of the tests?

Anyway, I’ll leave that decision up to the syncthing devs. I’m not qualified :slight_smile:

As promised, here’s a re-run of @imsodin’s branch (4d98683) with logs (100 runs with STTRACE=“model fs”).

   5 TestAutoAcceptNewFolderPremutationsNoPanic
   1 TestIssue5063

http://theunixzoo.co.uk/random/STLOG-20190206.bz2

I’d have to look at the logs, but phone does not support bz :frowning:

It’s a massive log file.

Let me extract the relevant parts:

By the way. I’m not sure if this is related, but when I run stcli folders list, before each folder entry, there’s a line like this:

13:52:28 INFO: bug: uncached filesystem call (should only happen in tests)

I was able to repro one of the bugs that was not a directory not empty one (RemoteRenameChanged I think) - that was a legitimately racy test that is fixed now. There has never been a “removeall: directory not empty” type failure outside of openbsd. And as I stated multiple times, the failure originates when cleaning up the test, i.e. the actually tested scenario has already finished successfully. As both test cases aren’t integral (in one case Audrius calls it useless, in the other case nobody know what exactly the failure mechanism is) I think disabling on openbsd is an appropriate response (6% failure rate is annoying).

I do know what the other test is testing, and it should pass on openbsd and all platforms as it’s testing for a race updating and corrupting the config.

The useless test is useless, but I don’t think it has a reason to fail.

Uh sorry about that :wink:

It doesn’t “really” fail. I have looked at tons of logs to now, and it’s always the removal of the directory, that was the folder root in the test, that fails - nothing else. The same thing happened in other tests before I disable filesystem watcher in tests. I can’t disable filesystem watcher for auto-accepted folders, so these tests still fail on openbsd.