Fix data race between Agent.Run() and Agent.Stop() (#1625)
* Fix data race between Agent.Run() and Agent.Stop()

We had a data race that caused Agent.Stop() to fail if it was called
immediately after Agent.Run() returned.

The root cause of the race was that Stop() called WaitGroup.Wait()
before Run() called WaitGroup.Add() for the first time which is
prohibited: https://golang.org/pkg/sync/#WaitGroup.Add

This change moves WaitGroup.Add() to an earlier stage, which guarantees
that WaitGroup.Wait() is called after it.
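The safe ordering can be sketched in a few self-contained lines of Go. This is a minimal illustration of the pattern described above, not Jaeger's actual code — the `worker`/`newWorker` names are invented for the sketch. Registering every goroutine with Add() inside the constructor, before the constructor returns, guarantees that any later Wait() observes the full count:

```go
package main

import (
	"fmt"
	"sync"
)

// worker mimics a processor whose goroutines are tracked by a WaitGroup.
type worker struct {
	processing sync.WaitGroup
	jobs       chan int
}

// newWorker calls Add() for all goroutines up front, before any of them
// start. Because Add() completes before the constructor returns, no
// caller can reach Wait() before Add() has run.
func newWorker(n int) *worker {
	w := &worker{jobs: make(chan int)}
	w.processing.Add(n) // Add() happens-before any possible Wait()
	for i := 0; i < n; i++ {
		go func() {
			defer w.processing.Done()
			for range w.jobs { // drain jobs until the channel is closed
			}
		}()
	}
	return w
}

// stop closes the job channel and waits for every goroutine to exit.
func (w *worker) stop() {
	close(w.jobs)
	w.processing.Wait()
}

func main() {
	w := newWorker(4)
	w.stop() // safe even immediately after construction
	fmt.Println("stopped cleanly")
}
```

Running the sketch under `go run -race` keeps the race detector quiet; in the pre-fix ordering, where Add() only ran once serving was already underway, Wait() could race with Add() — exactly the condition the sync package documents as prohibited.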

Github issue: #1624

Testing done:

Added an automated test which was failing before the bug was fixed
and no longer fails after the fix.

Also verified that `make test` passes.

Signed-off-by: Tigran Najaryan <tigran@najaryan.net>

* Fix based on PR comments

Signed-off-by: Tigran Najaryan <tigran@najaryan.net>
tigrannajaryan authored and pavolloffay committed Jun 26, 2019
1 parent 1ef1a6e commit 2adc99c
Showing 2 changed files with 49 additions and 7 deletions.
41 changes: 41 additions & 0 deletions cmd/agent/app/agent_test.go
@@ -147,6 +147,47 @@ func withRunningAgent(t *testing.T, testcase func(string, chan error)) {
 	t.Fatal("Expecting server exit log")
 }
 
+func TestStartStopRace(t *testing.T) {
+	resetDefaultPrometheusRegistry()
+	cfg := Builder{
+		Processors: []ProcessorConfiguration{
+			{
+				Model:    jaegerModel,
+				Protocol: compactProtocol,
+				Server: ServerConfiguration{
+					HostPort: "127.0.0.1:0",
+				},
+			},
+		},
+	}
+	logger, logBuf := testutils.NewLogger()
+	mBldr := &jmetrics.Builder{HTTPRoute: "/metrics", Backend: "prometheus"}
+	mFactory, err := mBldr.CreateMetricsFactory("jaeger")
+	require.NoError(t, err)
+	agent, err := cfg.CreateAgent(fakeCollectorProxy{}, logger, mFactory)
+	require.NoError(t, err)
+
+	// This test attempts to hit the data race bug when Stop() is called
+	// immediately after Run(). We had a bug like that which is now fixed:
+	// https://github.com/jaegertracing/jaeger/issues/1624
+	// Before the bug was fixed this test was failing as expected when
+	// run with -race flag.
+
+	if err := agent.Run(); err != nil {
+		t.Errorf("error from agent.Run(): %s", err)
+	}
+
+	agent.Stop()
+
+	for i := 0; i < 1000; i++ {
+		if strings.Contains(logBuf.String(), "agent's http server exiting") {
+			return
+		}
+		time.Sleep(time.Millisecond)
+	}
+	t.Fatal("Expecting server exit log")
+}
+
 func resetDefaultPrometheusRegistry() {
 	// Create and assign a new Prometheus Registerer/Gatherer for each test
 	registry := prometheus.NewRegistry()
15 changes: 8 additions & 7 deletions cmd/agent/app/processors/thrift_processor.go
@@ -76,16 +76,18 @@ func NewThriftProcessor(
 		numProcessors: numProcessors,
 	}
 	metrics.Init(&res.metrics, mFactory, nil)
+	res.processing.Add(res.numProcessors)
+	for i := 0; i < res.numProcessors; i++ {
+		go func() {
+			res.processBuffer()
+			res.processing.Done()
+		}()
+	}
 	return res, nil
 }
 
-// Serve initiates the readers and starts serving traffic
+// Serve starts serving traffic
 func (s *ThriftProcessor) Serve() {
-	s.processing.Add(s.numProcessors)
-	for i := 0; i < s.numProcessors; i++ {
-		go s.processBuffer()
-	}
-
 	s.server.Serve()
 }

@@ -119,5 +121,4 @@ func (s *ThriftProcessor) processBuffer() {
 		s.protocolPool.Put(protocol)
 		s.server.DataRecd(readBuf) // acknowledge receipt and release the buffer
 	}
-	s.processing.Done()
 }
