Skip to content

Conversation

@cheddar
Copy link
Contributor

@cheddar cheddar commented Jul 29, 2014

In local development, I noticed that tide-whisperer was hanging on shutdown. I killed it again and got the following panic:

panic: runtime error: send on closed channel

goroutine 34 [running]:
runtime.panic(0x34ee00, 0x56e89e)
    /usr/local/go/src/pkg/runtime/panic.c:279 +0xf5
github.com/tidepool-org/go-common.(*Server).Close(0xc2080018b0, 0x0, 0x0)
    /Users/cheddar/github/tidepool-org/tide-whisperer/src/github.com/tidepool-org/go-common/server.go:25 +0x75
main.func·006()
    /Users/cheddar/github/tidepool-org/tide-whisperer/tide-whisperer.go:178 +0x1a8
created by main.main
    /Users/cheddar/github/tidepool-org/tide-whisperer/tide-whisperer.go:182 +0x10f7

goroutine 16 [chan send, 14546 minutes]:
github.com/tidepool-org/go-common/clients/hakken.(*coordinatorManager).Close(0xc20807c108, 0x0, 0x0)
    /Users/cheddar/github/tidepool-org/tide-whisperer/src/github.com/tidepool-org/go-common/clients/hakken/coordinator.go:137 +0xfe
github.com/tidepool-org/go-common/clients/hakken.(*HakkenClient).Close(0xc20807c0e0, 0x0, 0x0)
    /Users/cheddar/github/tidepool-org/tide-whisperer/src/github.com/tidepool-org/go-common/clients/hakken/hakken.go:99 +0x45
main.main()
    /Users/cheddar/github/tidepool-org/tide-whisperer/tide-whisperer.go:185 +0x1120

goroutine 19 [finalizer wait, 14549 minutes]:
runtime.park(0x175e0, 0x573190, 0x571c89)
    /usr/local/go/src/pkg/runtime/proc.c:1369 +0x89
runtime.parkunlock(0x573190, 0x571c89)
    /usr/local/go/src/pkg/runtime/proc.c:1385 +0x3b
runfinq()
    /usr/local/go/src/pkg/runtime/mgc0.c:2644 +0xcf
runtime.goexit()
    /usr/local/go/src/pkg/runtime/proc.c:1445

goroutine 20 [syscall]:
os/signal.loop()
    /usr/local/go/src/pkg/os/signal/signal_unix.go:21 +0x1e
created by os/signal.init·1
    /usr/local/go/src/pkg/os/signal/signal_unix.go:27 +0x32

goroutine 25 [select]:
github.com/tidepool-org/go-common/clients/shoreline.func·001()
    /Users/cheddar/github/tidepool-org/tide-whisperer/src/github.com/tidepool-org/go-common/clients/shoreline/shoreline.go:127 +0x20f
created by github.com/tidepool-org/go-common/clients/shoreline.(*ShorelineClient).Start
    /Users/cheddar/github/tidepool-org/tide-whisperer/src/github.com/tidepool-org/go-common/clients/shoreline/shoreline.go:137 +0xaa

goroutine 17 [syscall, 14546 minutes]:
runtime.goexit()
    /usr/local/go/src/pkg/runtime/proc.c:1445

goroutine 33 [semacquire, 14546 minutes]:
sync.runtime_Semacquire(0xc20807c18c)
    /private/var/folders/00/0sdwh000h01000cxqpysvccm0035qk/T/makerelease530016500/go/src/pkg/runtime/sema.goc:199 +0x30
sync.(*Mutex).Lock(0xc20807c188)
    /usr/local/go/src/pkg/sync/mutex.go:66 +0xd6
github.com/tidepool-org/go-common/clients/hakken.(*coordinatorManager).getClients(0xc20807c108, 0x0, 0x0, 0x0)
    /Users/cheddar/github/tidepool-org/tide-whisperer/src/github.com/tidepool-org/go-common/clients/hakken/coordinator.go:59 +0x55
github.com/tidepool-org/go-common/clients/hakken.func·003()
    /Users/cheddar/github/tidepool-org/tide-whisperer/src/github.com/tidepool-org/go-common/clients/hakken/hakken.go:161 +0xf6
created by github.com/tidepool-org/go-common/clients/hakken.(*HakkenClient).Publish
    /Users/cheddar/github/tidepool-org/tide-whisperer/src/github.com/tidepool-org/go-common/clients/hakken/hakken.go:167 +0x12b

Which seems to indicate that there was a deadlock between coordinatorManager.Close() and coordinatorManager.getClients().

It appears that coordinatorManager.Close() was waiting on a receive on the close channel, but nothing was on the other end of it. So this PR introduces a timeout to that wait and returns an error if it wasn't able to successfully close.

@cheddar
Copy link
Contributor Author

cheddar commented Jul 29, 2014

@kentquirk @jh-bate Pls look at this.

@jh-bate
Copy link
Contributor

jh-bate commented Jul 29, 2014

LGTM - tests even :)

cheddar added a commit that referenced this pull request Jul 29, 2014
Ensure coordinatorManager.Close() will eventually complete
@cheddar cheddar merged commit 54070dd into master Jul 29, 2014
@cheddar cheddar deleted the cheddar/close-hakken branch July 29, 2014 21:53
ewollesen added a commit that referenced this pull request Mar 27, 2024
I believe there was confusion about how this is called or should run. These
changes provide equivalent functionality, while being easier to read and
understand.

There are instructions from the library authors here:
https://github.com/IBM/sarama/blob/v1.38.1/consumer_group.go#L19

I think there was confusion around the steps detailed in the link above. So
let me clarify my understanding:

- Canceling the context passed to Consume is how the implementor of the
  sarama.ConsumerGroupHandler tells Consume to return. In other words, when
  we're shutting down, we cancel the context to get Consume to return. I base
  this off of #4 in the link.

- No goroutine needs to be launched on our side, the sarama library will
  launch the goroutines and those goroutines will call our implementation of
  ConsumeClaim(). This comes from #3.

- The warnings about thread-safety apply to our implementation of
  ConsumeClaim(). In other words, we need to be careful that s.consumer isn't
  nil, but since nothing in our code modifies s.consumer after instantiation,
  we're fine. This comes from #3.

- Since our own events.EventConsumer has to be able to respond to Stop()
  calls, we need to be able to break the for-loop in Start(), which is done by
  canceling the context that we pass to Consume(). The context.CancelFunc is
  guarded with a mutex for for extra thread-safety, so that if Stop() is called
  from multiple goroutines, we don't have any race conditions to worry about.

We can reason that the goroutine being previously launched necessary, as there
was only a single goroutine being launched, and same function that launched
it then blocked, waiting for it to return before continuing.

Without the goroutine, the WaitGroup and stop channel and stopOnce are no
longer required.
ewollesen added a commit that referenced this pull request Mar 28, 2024
I believe there was confusion about how this is called or should run. These
changes provide equivalent functionality, while being easier to read and
understand.

There are instructions from the library authors here:
https://github.com/IBM/sarama/blob/v1.38.1/consumer_group.go#L19

I think there was confusion around the steps detailed in the link above. So
let me clarify my understanding:

- Canceling the context passed to Consume is how the implementor of the
  sarama.ConsumerGroupHandler tells Consume to return. In other words, when
  we're shutting down, we cancel the context to get Consume to return. I base
  this off of #4 in the link.

- No goroutine needs to be launched on our side, the sarama library will
  launch the goroutines and those goroutines will call our implementation of
  ConsumeClaim(). This comes from #3.

- The warnings about thread-safety apply to our implementation of
  ConsumeClaim(). In other words, we need to be careful that s.consumer isn't
  nil, but since nothing in our code modifies s.consumer after instantiation,
  we're fine. This comes from #3.

- Since our own events.EventConsumer has to be able to respond to Stop()
  calls, we need to be able to break the for-loop in Start(), which is done by
  canceling the context that we pass to Consume(). The context.CancelFunc is
  guarded with a mutex for for extra thread-safety, so that if Stop() is called
  from multiple goroutines, we don't have any race conditions to worry about.

We can reason that the goroutine being previously launched was not necessary,
because there was only a single goroutine being launched, and the same
function that launched the goroutine then blocked, waiting for it to exit
before continuing.

Without the goroutine, the WaitGroup, stop channel and stopOnce are no longer
necessary.
ewollesen added a commit that referenced this pull request Apr 15, 2024
I believe there was confusion about how this is called or should run. These
changes provide equivalent functionality, while being easier to read and
understand.

There are instructions from the library authors here:
https://github.com/IBM/sarama/blob/v1.38.1/consumer_group.go#L19

I think there was confusion around the steps detailed in the link above. So
let me clarify my understanding:

- Canceling the context passed to Consume is how the implementor of the
  sarama.ConsumerGroupHandler tells Consume to return. In other words, when
  we're shutting down, we cancel the context to get Consume to return. I base
  this off of #4 in the link.

- No goroutine needs to be launched on our side, the sarama library will
  launch the goroutines and those goroutines will call our implementation of
  ConsumeClaim(). This comes from #3.

- The warnings about thread-safety apply to our implementation of
  ConsumeClaim(). In other words, we need to be careful that s.consumer isn't
  nil, but since nothing in our code modifies s.consumer after instantiation,
  we're fine. This comes from #3.

- Since our own events.EventConsumer has to be able to respond to Stop()
  calls, we need to be able to break the for-loop in Start(), which is done by
  canceling the context that we pass to Consume(). The context.CancelFunc is
  guarded with a mutex for for extra thread-safety, so that if Stop() is called
  from multiple goroutines, we don't have any race conditions to worry about.

We can reason that the goroutine being previously launched was not necessary,
because there was only a single goroutine being launched, and the same
function that launched the goroutine then blocked, waiting for it to exit
before continuing.

Without the goroutine, the WaitGroup, stop channel and stopOnce are no longer
necessary.
ewollesen added a commit that referenced this pull request Feb 3, 2025
I believe there was confusion about how this is called or should run. These
changes provide equivalent functionality, while being easier to read and
understand.

There are instructions from the library authors here:
https://github.com/IBM/sarama/blob/v1.38.1/consumer_group.go#L19

I think there was confusion around the steps detailed in the link above. So
let me clarify my understanding:

- Canceling the context passed to Consume is how the implementor of the
  sarama.ConsumerGroupHandler tells Consume to return. In other words, when
  we're shutting down, we cancel the context to get Consume to return. I base
  this off of #4 in the link.

- No goroutine needs to be launched on our side, the sarama library will
  launch the goroutines and those goroutines will call our implementation of
  ConsumeClaim(). This comes from #3.

- The warnings about thread-safety apply to our implementation of
  ConsumeClaim(). In other words, we need to be careful that s.consumer isn't
  nil, but since nothing in our code modifies s.consumer after instantiation,
  we're fine. This comes from #3.

- Since our own events.EventConsumer has to be able to respond to Stop()
  calls, we need to be able to break the for-loop in Start(), which is done by
  canceling the context that we pass to Consume(). The context.CancelFunc is
  guarded with a mutex for for extra thread-safety, so that if Stop() is called
  from multiple goroutines, we don't have any race conditions to worry about.

We can reason that the goroutine being previously launched was not necessary,
because there was only a single goroutine being launched, and the same
function that launched the goroutine then blocked, waiting for it to exit
before continuing.

Without the goroutine, the WaitGroup, stop channel and stopOnce are no longer
necessary.
ewollesen added a commit that referenced this pull request Mar 3, 2025
I believe there was confusion about how this is called or should run. These
changes provide equivalent functionality, while being easier to read and
understand.

There are instructions from the library authors here:
https://github.com/IBM/sarama/blob/v1.38.1/consumer_group.go#L19

I think there was confusion around the steps detailed in the link above. So
let me clarify my understanding:

- Canceling the context passed to Consume is how the implementor of the
  sarama.ConsumerGroupHandler tells Consume to return. In other words, when
  we're shutting down, we cancel the context to get Consume to return. I base
  this off of #4 in the link.

- No goroutine needs to be launched on our side, the sarama library will
  launch the goroutines and those goroutines will call our implementation of
  ConsumeClaim(). This comes from #3.

- The warnings about thread-safety apply to our implementation of
  ConsumeClaim(). In other words, we need to be careful that s.consumer isn't
  nil, but since nothing in our code modifies s.consumer after instantiation,
  we're fine. This comes from #3.

- Since our own events.EventConsumer has to be able to respond to Stop()
  calls, we need to be able to break the for-loop in Start(), which is done by
  canceling the context that we pass to Consume(). The context.CancelFunc is
  guarded with a mutex for for extra thread-safety, so that if Stop() is called
  from multiple goroutines, we don't have any race conditions to worry about.

We can reason that the goroutine being previously launched was not necessary,
because there was only a single goroutine being launched, and the same
function that launched the goroutine then blocked, waiting for it to exit
before continuing.

Without the goroutine, the WaitGroup, stop channel and stopOnce are no longer
necessary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants