Go 1.13

I intend to build our 1.2.3-rc releases next Tuesday with the latest Go 1.13 RC (1.13rc2 as of right now). Maybe this will flush out some bugs ahead of the 1.13 release, and I understand Android needs to move to it sooner rather than later anyhow.

This will also gain us a couple of new architectures, illumos-amd64 and openbsd-arm64 I think, without checking too closely.

For right now this will only be for the “GitHub binaries” - not the Debian and Snap archives, for reasons. If it works well and 1.13 is released, it will of course be used for everything from that point onwards.

(And I’m trying to avoid the screwup from last time where we had some x.y.z-rc built with 1.11 and then the release x.y.z built with 1.12 getting new bugs…)


I’ve already done the building with 1.13rc2 and so far I haven’t encountered any problems :-).

GitHub user olifre also tried the TLS 1.3 stuff:

Very nice! My first time trying the more “neko” fork - and I’m very pleased! Since with this fork Syncthing speaks TLS 1.3 out of the box (hooray!), I temporarily tried GODEBUG=tls13=0, which was both accepted fine by the GUI in release 1.2.2.3 and effectively turned off TLS 1.3 as expected.

So all is well
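For anyone else wondering about that GODEBUG knob: as far as I understand the Go release notes, TLS 1.3 was opt-in in Go 1.12 and is on by default in Go 1.13, so the switch flips direction between the two. A quick reference (invoking the syncthing binary directly here is just illustrative):

GODEBUG=tls13=1 syncthing   # Go 1.12 binaries: opt in to TLS 1.3
GODEBUG=tls13=0 syncthing   # Go 1.13 binaries: opt out of TLS 1.3 (what was tested above)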


OK, make that Go 1.13 for everything then, as it’s now released.


Weirdly, since deploying Docker with the Go 1.13 image, builds take more than 10 min instead of a bit over 2 min (https://build.syncthing.net/buildConfiguration/SyncthingAndroid_Build), and deploying a release fails due to OOM or something obscure on the last try: https://build.syncthing.net/buildConfiguration/Syncthing_Release_ReleaseSyncthingAndroid#all-projects . It looks like it’s looking for a few more dependencies, but not so many more that it would explain the time difference. Otherwise I don’t see anything out of the ordinary in the new logs. Ideas?

Are you doing an in-GOPATH build? The behavior changed for that in 1.13, so now you get a module build regardless… Or, maybe combined with that, the place where the build cache is stored might have changed ($GOPATH/pkg or somewhere under ~/.cache now?) and might get blown away between builds if it’s in the temporary Docker image somewhere?
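Something like this, run inside the build container, should show where things actually point (just the standard go env queries):

go env GOCACHE        # build cache location; ~/.cache/go-build on Linux by default
go env GO111MODULE    # empty or "auto" means 1.13 still uses modules whenever a go.mod is present
go env GOPATH         # the module download cache lives under $GOPATH/pkg/mod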

It was already a module build before. The Go docs for both 1.12 and 1.13 say the cache is in “a subdirectory named go-build in the standard user cache directory for the current operating system”, which is ~/.cache/go-build on Linux - no change there.

However, there is a new module proxy that’s enabled by default; setting it to direct brings the build times back down to 5 min (still more than twice as long as before, but less than half of what it is without that). Didn’t you set up a local proxy instance a while back? How is that configured (i.e. how does the go tool on the build server know to talk to it)?

EDIT: Actually I see that the release job already has GOPROXY set to a custom value, while the “normal build job” does not. So that might have been a red herring.
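For reference, this is roughly what I was comparing (the go mod download is just an example command; the default value is the one documented for Go 1.13):

go env GOPROXY                    # Go 1.13 default: https://proxy.golang.org,direct
GOPROXY=direct go mod download    # bypass any proxy and fetch modules straight from the upstream VCS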

Hmm. There is a GOPROXY setting on the top-level root project on the build server, so it should be inherited all the way down. It points to the local Athens instance, so it should be fast. There may be sumdb lookups happening now that didn’t happen before, I guess. I suspect ~/.cache maps to somewhere inside the temporary container and might get blown away between builds, but that should have been the case before too.
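If the sumdb lookups turn out to be the slow part, 1.13 has knobs for that too. A sketch - the Athens URL is made up, the environment variables are the real Go 1.13 ones:

export GOPROXY=http://athens.internal:3000   # point the go tool at the local Athens instance
export GOSUMDB=off                           # skip sum.golang.org lookups entirely
export GONOSUMDB=github.com/syncthing        # or only skip them for selected module path prefixes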

So, barring a mistake on my part: if the Go cache was previously discarded, it shouldn’t be anymore with this change: https://github.com/syncthing/syncthing-android/commit/345452a38ce7bbb6b20aca8c077af727bde7edd0
However, builds still take ages, so either I have made a mistake, or something else changed in Go 1.13, or Go 1.13 wasn’t the culprit at all and the problem came with deploying a new image and lies somewhere in Docker/CI/…

I am currently at my wits’ end.

Which specific part takes time?

It’s finding the packages. So it should be fixable by persisting $GOPATH/pkg/mod, but I still don’t understand why it takes so much more time than with the previous build image…
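In case it helps, a sketch of what persisting both caches across builds could look like. Image name, host paths and the Gradle task are made up; the container paths assume HOME=/root and GOPATH=/go - the linked commit does the real thing:

# Go build cache lives in ~/.cache/go-build, the module cache in $GOPATH/pkg/mod;
# mount both as volumes so they survive the throwaway container:
docker run --rm \
  -v /var/lib/buildcache/go-build:/root/.cache/go-build \
  -v /var/lib/buildcache/go-mod:/go/pkg/mod \
  syncthing-android-builder ./gradlew assembleDebug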

That didn’t help either. But I did now notice a crucial difference between the previous fast builds and the current slow ones:

In both cases it finds the versioned dependencies very quickly, e.g.

go: finding github.com/AudriusButkevicius/go-nat-pmp v0.0.0-20160522074932-452c97607362

In the fast builds it was done at that point. In the slow builds it also resolves the latest version, and that takes a long time:

go: finding github.com/AudriusButkevicius/go-nat-pmp latest

The only thing in the release notes that might be related is the part about version validation (in https://golang.org/doc/go1.13#modules), but I fail to understand how that would result in go pulling the latest version of the module.
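For comparison, the two kinds of queries behave quite differently: a version pinned in go.mod is a single cheap lookup, while anything that asks for “latest” has to enumerate versions upstream. The module path below is just the one from the log:

go mod download github.com/AudriusButkevicius/go-nat-pmp   # uses the version pinned in go.mod
go get github.com/AudriusButkevicius/go-nat-pmp            # no version given, resolves @latest over the network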

Running go get or something as part of the build?


Jackpot - there was a go mod download hidden somewhere in there. Back to 2 min build times.

Still no clue whatsoever why this became a problem now.
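A crude way to hunt for such stray invocations in the future, in case anyone else hits this (the file patterns are just a guess at where build commands might hide):

grep -rn --include='*.sh' --include='*.gradle' -e 'go get' -e 'go mod download' .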

Now the CI is reporting OOMs again, so that seems to be an unrelated problem: https://build.syncthing.net/viewLog.html?buildId=45875&buildTypeId=Syncthing_Release_ReleaseSyncthingAndroid&tab=buildLog&_focus=667

The build agent containers are limited to 8 GiB, but I’m not sure that carries over to your other image as it’s also a Docker container “on the side”… The server itself has 128 GiB of RAM, and about 100 GiB of that is free. Maybe there’s some classic old -Xmx512m or something on the JVM?

That’s TC admin stuff, to which I have no access (apparently memory usage statistics are also in there). I set -Xmx explicitly in the job options and they did get picked up:

Picked up JAVA_TOOL_OPTIONS:
-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap
-Xmx2g

However that might be overruled by a parent restriction. The cgroup limit would be another option, but I doubt that’s so low.
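If someone has a shell in the build container, the effective limit is easy to check. This assumes cgroup v1, which is what I’d expect on that host at this point:

cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # effective memory limit for this container
free -m                                           # reports host memory from /proc/meminfo, not the cgroup limit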

It really would be interesting to see how much memory is used during such a build, i.e. whether it is more or less reasonable (in which case there’s some weird restriction in CI) or whether there’s really runaway memory usage.

I administratorified your user, and added the perfmon thing to the android build. But I don’t know how that works or what it measures. It’s not enormously detailed:

https://build.syncthing.net/viewLog.html?buildId=45880&buildTypeId=SyncthingAndroid_Build&tab=perfmon

The build agents run as Docker containers with the (Java) build agent inside, plus a local Go setup and some other stuff. Those containers are limited to 8 GiB each. But the Android build spins up another container directly on the host, which it can do because the agent container is marked as “privileged”. I don’t think that container gets any limits set (there are none in the command line TeamCity uses), and I don’t see how it could inherit them from the other container it runs beside…
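If we want the side container capped too, the limit would have to be passed explicitly when it’s started, something along these lines (image name and command made up; the flags are standard Docker ones):

docker run --rm --memory=8g --memory-swap=8g syncthing-android-builder ./gradlew assembleDebug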

Maybe two gigs for the Java stuff just isn’t enough. I mean, the build server uses about 2.5 gigs, and the agents (which do essentially nothing except launch commands and pipe stdout) use about 1 - 1.5 gigs each… :coffee:

Just as a side note: I’m using a build machine with 16 gigs of RAM and allocate 4 gigs to the JVM.
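For the record, that allocation would normally live in gradle.properties, roughly like this (the value just mirrors my local setup, not a recommendation):

# gradle.properties
org.gradle.jvmargs=-Xmx4g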

There are hints that this is due to orphaned Gradle processes on the slaves. @calmh the slaves don’t seem to be reachable from the internet, and I don’t seem to have SSH access to the build master either :frowning:

It’s all on the same physical box, but you don’t have an account yet as it’s new. I’ll set you up; expect a PM when I’m at a computer in a short while.

I doubt there are orphaned processes, as the container running Gradle doesn’t survive beyond the build job.