Occasional containerd crashes on CI #4154

Closed

Description

I started seeing this while migrating the integration tests to GitHub Actions (1), but then noticed that it also happens quite often on Travis (1, 2, 3, 4).

time="2020-04-03T20:10:14.618255755Z" level=debug msg="(*service).Write started" ref=c1-commiterror-state
time="2020-04-03T20:10:14.624627962Z" level=debug msg="(*service).Write started" ref=c1-commiterror-state
time="2020-04-03T20:10:15.182502071Z" level=error msg="(*service).Write failed" error="rpc error: code = Unavailable desc = ref ContentClient-n1/1/c1-commiterror-state locked: unavailable" ref=c1-commiterror-state
time="2020-04-03T20:10:15.186321757Z" level=error msg="(*service).Write failed" error="rpc error: code = FailedPrecondition desc = unexpected commit digest sha256:af9f4d33a30760b1d8a0297164ac7bcc2d1a8b1bdaf7880e2ce2fdfb96edec65, expected sha256:f7358a28d3925ed49329484588a910278afbfe778319abb24db99b793a671202: failed precondition" ref=c1-commiterror-state

...

time="2020-04-03T15:26:32.535416150Z" level=debug msg="(*service).Write started" expected="sha256:208de85e6cfa2a86b205fd3e1fa66763ad3df81588b5a1b47a0b77e78a67e3cd" ref="manifest-sha256:208de85e6cfa2a86b205fd3e1fa66763ad3df81588b5a1b47a0b77e78a67e3cd" total=528
time="2020-04-03T15:26:32.535612486Z" level=debug msg="(*service).Write started" expected="sha256:05e348797cb2a37dc91ba34fffbfcc64192097c113a1fc49b0b6bf047d96c81f" ref="manifest-sha256:05e348797cb2a37dc91ba34fffbfcc64192097c113a1fc49b0b6bf047d96c81f" total=527
runtime: nelems=11 nalloc=2 previous allocCount=1 nfreed=65535
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x556b83439af9, 0x20)
	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/panic.go:774 +0x74
runtime.(*mspan).sweep(0x7fc116738460, 0x7fc116738400, 0x556b823c5f00)
	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/mgcsweep.go:328 +0x8cc
runtime.(*mcentral).uncacheSpan(0x556b84b21c38, 0x7fc116738460)
	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/mcentral.go:197 +0x7b
runtime.(*mcache).releaseAll(0x7fc11676d008)
...

After the crash, many tests fail with errors like:

--- FAIL: TestContainerPTY (0.00s)
    container_linux_test.go:468: failed to dial "/run/containerd-test/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd-test/containerd.sock: connect: connection refused"
--- FAIL: TestContainerAttach (0.00s)
    container_linux_test.go:545: failed to dial "/run/containerd-test/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd-test/containerd.sock: connect: connection refused"
--- FAIL: TestShimInCgroup (0.00s)
    container_linux_test.go:134: failed to dial "/run/containerd-test/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd-test/containerd.sock: connect: connection refused"
--- FAIL: TestTaskUpdate (0.00s)
    container_linux_test.go:56: failed to dial "/run/containerd-test/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd-test/containerd.sock: connect: connection refused"
...

This might be related to the unexpected commit digest problems in #3974.
