OpenBSD test debugging

imsodin · January 31, 2019, 2:42pm

This is a continuation of the various debugging and fixing of tests with @vext01 that fail on OpenBSD.

Please just post all logs from failing tests with https://github.com/imsodin/syncthing/commit/8edd19b05a2a1dc6e5d1d0c8b3b98c331976217e

vext01 · January 31, 2019, 2:48pm

Hi Simon,

Thanks again for looking in to these failures!

Is there a command that I can use to run all tests but have the verbose output from setting STTRACE?

Thanks

imsodin · January 31, 2019, 2:58pm

I think if anything fails, the output is printed. To force it, you can run tests “manually” with -v (potentially without -race if it’s not supported on openbsd):

go test -v -race -tags purego -timeout 120s github.com/syncthing/syncthing/lib/... github.com/syncthing/syncthing/cmd/...

Or as failing tests are probably in lib/model, just do go test -v -race -tags purego -timeout 120s . in lib/model.

vext01 · January 31, 2019, 3:28pm

OK, so I ran:

 for i in {1..10}; do go test -v -tags purego -timeout 120s github.com/syncthing/syncthing/lib/... github.com/syncthing/syncthing/cmd/...; done  2>&1 | tee LOG

And here’a a paste of the results:

gist.github.com

https://gist.github.com/vext01/ed3b2927c418d51dcf0054d35a72e3ca

gistfile1.txt

=== RUN   TestAssets
--- PASS: TestAssets (0.01s)
PASS
ok  	github.com/syncthing/syncthing/lib/auto	(cached)
=== RUN   TestBroadcastAddr
--- PASS: TestBroadcastAddr (0.00s)
PASS
ok  	github.com/syncthing/syncthing/lib/beacon	(cached)
=== RUN   TestReplaceCommit
--- SKIP: TestReplaceCommit (0.00s)

This file has been truncated. show original

imsodin · January 31, 2019, 3:47pm

There is only TestInternalScan which is still failing and that’s the test that I forgot to “port”. With this additional change that should be fixed too: https://github.com/imsodin/syncthing/commit/2e320cc1323c0520bb1417cc71cb96430d13eb0b

(All changes are in the https://github.com/imsodin/syncthing/tree/modelTestRef branch).

vext01 · January 31, 2019, 4:09pm

I applied your second diff. Did you build test it?

FAIL github.com/syncthing/syncthing/lib/model [build failed]

It doesn’t give us any more than that.

imsodin · January 31, 2019, 4:12pm

Apparently I did not, sorry. Updated (and tested): forgot TestInternalScan · imsodin/syncthing@2e320cc · GitHub

vext01 · January 31, 2019, 10:03pm

No worries!

So, now I’ve tested on your branch imsodin/syncthing at commit 2e320cc1323c0520bb1417cc71cb96430d13eb0b.

for i in {1..100}; do go test -v -tags purego -timeout 120s github.com/syncthing/syncthing/lib/... github.com/syncthing/syncthing/cmd/...; done  2>&1 | tee LOG

Here’s the result: http://theunixzoo.co.uk/random/LOG-imsodin-syncthing-20190131.bz2

In summary, we have these failures:

$ grep -- '--- FAIL' LOG | sort | awk '{print $3}' | uniq 
TestAutoAcceptNewFolderPremutationsNoPanic
TestIssue4475
TestPullInvalidIgnoredSR
TestRequestRemoteRenameChanged

I guess that’s not what you wanted to hear :\

I wish we could reproduce this on a linux machine…

imsodin · January 31, 2019, 11:26pm

The 4475 is another test I didn’t catch yet. I found and fixed some other minor stuff that could blow up and pushed the changes again: https://github.com/syncthing/syncthing/commit/66d765c8e858766993380afadadc4554fa51a1ea

I don’t expect any other findings except that hopefully 4475 will now not fail anymore. The rest of the failures are mysterious to me. Maybe one thing: Can you also reproduce the failures if you run just one of those tests in a loop?

vext01 · February 1, 2019, 11:55am

On your branch plus that one commit (which applied with skew), the following tests can fail:

TestAutoAcceptMultipleFolders
TestAutoAcceptNewAndExistingFolder
TestAutoAcceptNewFolderPremutationsNoPanic
TestIssue4475
TestPullInvalidIgnoredSO
TestPullInvalidIgnoredSR
TestRequestDeleteChanged

vext01 · February 1, 2019, 11:59am

Here’s the log: http://theunixzoo.co.uk/random/LOG-20190201.bz2

calmh · February 1, 2019, 12:24pm

Not to be damper on you guys if this is all for fun and personal betterment, but keep in mind you’re hunting corner cases for the 0.1 % here. I’d have made the notify tests //+build !openbsd once and for all and called it best effort.

If you’re doing fixes in the code to actually make things work that’s cool. If you’re just working around OS differences for the test’s sake, I don’t know…

vext01 · February 1, 2019, 12:35pm

I thought we had skipped the notify tests, but maybe I’m wrong.

Assuming these are not inotify related, it’d be nice if the tests passed.

The reason I’ve been so persistent is that I maintain the Syncthing package for OpenBSD and I feel that I owe it to our users to check nothing bad will happen to their data. And since I don’t know how Syncthing works inside, the tests are all I have.

I understand that it’s frustrating for you guys since OpenBSD isn’t all that widely used compared to say, Linux.

calmh · February 1, 2019, 12:43pm

Ah, fair enough, I didn’t actually look at the test list. And also interesting, at first glance those should not be OS dependent. I’ll see what the diff looks like in the end.

I fully sympathize. The impression I got from the notify tests is that notify just didn’t work like we expected on OpenBSD and we made the tests more lenient as a result. That still means notify doesn’t work as we expect though, so it just means we miss notifications without the tests failing. That is, no positive change for users but the problem, such as it is, is swept under the rug. @imsodin will correct me if my impression is incorrect.

vext01 · February 4, 2019, 2:53pm

@imsodin was the last log of any use at all?

imsodin · February 4, 2019, 6:16pm

Jakob found me out

Actually this entire thing gives me a pretense to do some changes to our model tests that I wanted to do many times before, but never did, because they are tangential. Most of your test failures do not signify any dysfunction in Syncthing, just the tests. And none of them are reproducible. So I think it’s worthwhile to fix it, but no reason to be worried for users.

I just had other stuff going on. The log was partly useful. Most of the directory not empty failures are completely mysterious to me now. Some missed a m.Stop() call and I fixed some other stuff, that might be related. If you want you can have a go again at my branch (same one again): GitHub - imsodin/syncthing at modelTestRef

However I probably won’t dig deeper regarding the directory not empty stuff (I am tempted to hypothesize this isn’t even “our problem”, i.e. something related to golang on openbsd). And anyway I should probably focus on PRs at some point, that’s not going to be so easy to sell

vext01 · February 4, 2019, 9:24pm

Actually this entire thing gives me a pretense to do some changes to our model tests that I wanted to do many times before

Cool. I’d be happy to test.

Most of your test failures do not signify any dysfunction in Syncthing

The problem is, I (and other packagers probably too) don’t have the insider knowledge to distinguish a boring test suite bug from a real bug. It could be a racy clean up, or something which would delete user data if ignored.

I ran your branch on constant testing for about an hour and a half. Here’s what now fails:

TestAutoAcceptNewFolderPremutationsNoPanic
TestIssue5063
TestRequestRemoteRenameChanged

I probably won’t dig deeper regarding the directory not empty stuff

That’s a shame. It looks like this is the source of the failures.

Here’s the log: http://theunixzoo.co.uk/random/ST-LOG-20190204.bz2

I am tempted to hypothesize this isn’t even “our problem”, i.e. something related to golang on openbsd)

Well, if that’s the case, then we have the power to fix go on OpenBSD with a local patch which we could then upstream, but we’d need to figure out what exactly is wrong.

Didn’t you say in an earlier PR that something was returning an error code that should be impossible? Do you still believe this to be the case?

Cheers, and sorry again for the persistence (I got another email from a user today asking for syncthing to be updated in the pacakages collection).

imsodin · February 5, 2019, 11:20am

Understandable, there could be a notice on “meta test failures”. The implicit notice usually is, that real test failures have a meaningful message (however I won’t vouch for that ).

I pushed one more change which might have fixed 5063. The auto accept one is a directory not empty failure (more on that below). And RemoteRenameChanged is “real”, I am trying to reproduce with running in a loop locally, no luck so far.

Well, actually you do the brunt of the work, so why not. You just have to be fine with me not giving any promises.

To get more info on what’s going on with the directory not empty failures, I need logs with fs debug facility enabled. For the RequestRemoteRenameChanged with model (for good measure, just always enable model). To reduce test times and log sizes I’d do the test loops similar to this to only run the relevant test and only save the latest output: i=0; while [ $i -lt 1000 ]; do i=$((i+1)); echo "Try $i"; STTRACE=model go test -short -race -tags purego -run TestRequestRemoteRenameChanged 2>&1 > test.out || break; done

I don’t remember what exactly I said, but I am pretty certain I used should for a reason xD
Probably it was about RemoveAll returning directory not empty. At some point I thought golangs os implementation was “atomic”, but then I found a bug report that said that it is expected and possible that another goroutine can interfere with the execution of RemoveAll and thus cause this error.

No worries about the persistence.

Are you delaying updating Syncthing due to this? If so really just don’t please. There are known bugs fixed, while our test coverage isn’t nearly 100%. And even if it was, they might not cover all possible scenarios where something can in some way go wrong. If you want it more concrete: Out of all the failures discussed here, only RequestRemoteR… might still be a real bug (I doubt it) - all the rest are not. On the other hand Incorrectly out of sync, sequence number oddness. · Issue #5340 · syncthing/syncthing · GitHub is certainly a problem, and it’s fixed in v1.0.1 and existed for a long time.

imsodin · February 5, 2019, 11:53am

@vext01 Don’t focus on TestRequestRemoteRenameChanged, I was able to reproduce that after 487 tries

vext01 · February 5, 2019, 12:00pm

I was yeah, but with the above blessing I can probably put it in

Don’t focus on TestRequestRemoteRenameChanged , I was able to reproduce that after 487 tries

Great news!

To get more info on what’s going on with the directory not empty failures, I need logs with fs debug facility enabled.

Ack, watch this space.