Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data race when MatchesCSR fails #3017

Merged
merged 1 commit into from
Aug 29, 2017
Merged

Fix data race when MatchesCSR fails #3017

merged 1 commit into from
Aug 29, 2017

Conversation

jsha
Copy link
Contributor

@jsha jsha commented Aug 29, 2017

I can reliably reproduce a data race by doing the following things:

  • Changing cmd/boulder-ra/main.go to use a cmd.Clock() so I can fake dates in the future
  • Editing docker-compose.yml to set a FAKECLOCK in the future
  • Editing start.py to set race_detection=True
  • docker-compose up
  • certbot_test certonly --standalone -d nuzbar.com

The race detector output is below. However, I'm confused about the output; it claims that there's a write on a line that only has a return statement on it. I poked at the code a little bit, and this diff appears to solve it: forcing a copy of cert.DER before forking off a goroutine. However, I'm confused about why this would be an issue. My understanding of how Go handles function-scoped variables is that they'll be retains as long as they need to be based on reference counting. What are your thoughts? I'd like to understand this better!

WARNING: DATA RACE
Write at 0x00c4201e05d8 by goroutine 133:
  github.com/letsencrypt/boulder/ra.(*RegistrationAuthorityImpl).NewCertificate()
      /go/src/github.com/letsencrypt/boulder/ra/ra.go:872 +0x16b2
  github.com/letsencrypt/boulder/grpc.(*RegistrationAuthorityServerWrapper).NewCertificate()
      /go/src/github.com/letsencrypt/boulder/grpc/ra-wrappers.go:243 +0x263
  github.com/letsencrypt/boulder/ra/proto._RegistrationAuthority_NewCertificate_Handler.func1()
      /go/src/github.com/letsencrypt/boulder/ra/proto/ra.pb.go:445 +0xa1
  github.com/letsencrypt/boulder/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor()
      /go/src/github.com/letsencrypt/boulder/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29 +0xc9
  github.com/letsencrypt/boulder/grpc.(*serverInterceptor).intercept()
      /go/src/github.com/letsencrypt/boulder/grpc/interceptors.go:23 +0x9a
  github.com/letsencrypt/boulder/grpc.(*serverInterceptor).(github.com/letsencrypt/boulder/grpc.intercept)-fm()
      /go/src/github.com/letsencrypt/boulder/grpc/server.go:51 +0x99
  github.com/letsencrypt/boulder/ra/proto._RegistrationAuthority_NewCertificate_Handler()
      /go/src/github.com/letsencrypt/boulder/ra/proto/ra.pb.go:447 +0x1fb
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:748 +0xf67
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).handleStream()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:942 +0x122a
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:507 +0xb6

Previous read at 0x00c4201e05d8 by goroutine 121:
  github.com/letsencrypt/boulder/ra.(*RegistrationAuthorityImpl).NewCertificate.func2()
      /go/src/github.com/letsencrypt/boulder/ra/ra.go:856 +0x9e

Goroutine 133 (running) created at:
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).serveStreams.func1()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:508 +0xb8
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/transport.(*http2Server).operateHeaders()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/transport/http2_server.go:306 +0x1362
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/transport/http2_server.go:370 +0xacd
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).serveStreams()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:515 +0x1e7
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).serveHTTP2Transport()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:496 +0x628
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).handleRawConn()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:466 +0x5f3

Goroutine 121 (running) created at:
  github.com/letsencrypt/boulder/ra.(*RegistrationAuthorityImpl).NewCertificate()
      /go/src/github.com/letsencrypt/boulder/ra/ra.go:857 +0x1a4c
  github.com/letsencrypt/boulder/grpc.(*RegistrationAuthorityServerWrapper).NewCertificate()
      /go/src/github.com/letsencrypt/boulder/grpc/ra-wrappers.go:243 +0x263
  github.com/letsencrypt/boulder/ra/proto._RegistrationAuthority_NewCertificate_Handler.func1()
      /go/src/github.com/letsencrypt/boulder/ra/proto/ra.pb.go:445 +0xa1
  github.com/letsencrypt/boulder/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor()
      /go/src/github.com/letsencrypt/boulder/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29 +0xc9
  github.com/letsencrypt/boulder/grpc.(*serverInterceptor).intercept()
      /go/src/github.com/letsencrypt/boulder/grpc/interceptors.go:23 +0x9a
  github.com/letsencrypt/boulder/grpc.(*serverInterceptor).(github.com/letsencrypt/boulder/grpc.intercept)-fm()
      /go/src/github.com/letsencrypt/boulder/grpc/server.go:51 +0x99
  github.com/letsencrypt/boulder/ra/proto._RegistrationAuthority_NewCertificate_Handler()
      /go/src/github.com/letsencrypt/boulder/ra/proto/ra.pb.go:447 +0x1fb
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:748 +0xf67
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).handleStream()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:942 +0x122a
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:507 +0xWARNING: DATA RACE
Write at 0x00c4201e05d8 by goroutine 133:
  github.com/letsencrypt/boulder/ra.(*RegistrationAuthorityImpl).NewCertificate()
      /go/src/github.com/letsencrypt/boulder/ra/ra.go:872 +0x16b2
  github.com/letsencrypt/boulder/grpc.(*RegistrationAuthorityServerWrapper).NewCertificate()
      /go/src/github.com/letsencrypt/boulder/grpc/ra-wrappers.go:243 +0x263
  github.com/letsencrypt/boulder/ra/proto._RegistrationAuthority_NewCertificate_Handler.func1()
      /go/src/github.com/letsencrypt/boulder/ra/proto/ra.pb.go:445 +0xa1
  github.com/letsencrypt/boulder/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor()
      /go/src/github.com/letsencrypt/boulder/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29 +0xc9
  github.com/letsencrypt/boulder/grpc.(*serverInterceptor).intercept()
      /go/src/github.com/letsencrypt/boulder/grpc/interceptors.go:23 +0x9a
  github.com/letsencrypt/boulder/grpc.(*serverInterceptor).(github.com/letsencrypt/boulder/grpc.intercept)-fm()
      /go/src/github.com/letsencrypt/boulder/grpc/server.go:51 +0x99
  github.com/letsencrypt/boulder/ra/proto._RegistrationAuthority_NewCertificate_Handler()
      /go/src/github.com/letsencrypt/boulder/ra/proto/ra.pb.go:447 +0x1fb
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:748 +0xf67
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).handleStream()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:942 +0x122a
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:507 +0xb6

Previous read at 0x00c4201e05d8 by goroutine 121:
  github.com/letsencrypt/boulder/ra.(*RegistrationAuthorityImpl).NewCertificate.func2()
      /go/src/github.com/letsencrypt/boulder/ra/ra.go:856 +0x9e

Goroutine 133 (running) created at:
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).serveStreams.func1()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:508 +0xb8
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/transport.(*http2Server).operateHeaders()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/transport/http2_server.go:306 +0x1362
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/transport/http2_server.go:370 +0xacd
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).serveStreams()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:515 +0x1e7
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).serveHTTP2Transport()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:496 +0x628
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).handleRawConn()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:466 +0x5f3

Goroutine 121 (running) created at:
  github.com/letsencrypt/boulder/ra.(*RegistrationAuthorityImpl).NewCertificate()
      /go/src/github.com/letsencrypt/boulder/ra/ra.go:857 +0x1a4c
  github.com/letsencrypt/boulder/grpc.(*RegistrationAuthorityServerWrapper).NewCertificate()
      /go/src/github.com/letsencrypt/boulder/grpc/ra-wrappers.go:243 +0x263
  github.com/letsencrypt/boulder/ra/proto._RegistrationAuthority_NewCertificate_Handler.func1()
      /go/src/github.com/letsencrypt/boulder/ra/proto/ra.pb.go:445 +0xa1
  github.com/letsencrypt/boulder/vendor/github.com/grpc-ecosystem/go-grpc-prometheus.UnaryServerInterceptor()
      /go/src/github.com/letsencrypt/boulder/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/server.go:29 +0xc9
  github.com/letsencrypt/boulder/grpc.(*serverInterceptor).intercept()
      /go/src/github.com/letsencrypt/boulder/grpc/interceptors.go:23 +0x9a
  github.com/letsencrypt/boulder/grpc.(*serverInterceptor).(github.com/letsencrypt/boulder/grpc.intercept)-fm()
      /go/src/github.com/letsencrypt/boulder/grpc/server.go:51 +0x99
  github.com/letsencrypt/boulder/ra/proto._RegistrationAuthority_NewCertificate_Handler()
      /go/src/github.com/letsencrypt/boulder/ra/proto/ra.pb.go:447 +0x1fb
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:748 +0xf67
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).handleStream()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:942 +0x122a
  github.com/letsencrypt/boulder/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /go/src/github.com/letsencrypt/boulder/vendor/google.golang.org/grpc/server.go:507 +0xb6

@rolandshoemaker
Copy link
Contributor

This is an interesting race!

The issue is that we are using named return variables in ra.NewCertificate, cert is initialized when the function is called, we then read it in the function which inherits the parent scope so it is reading from the same point in memory, then when we do a return emptyCert, err we are overwriting the return variable, cert, with the contents of emptyCert.

Super simple example that triggers the same race (and shows what the problem is... kind of):

package main

import (
	"fmt"
	"time"
)

func ex() (h string) {
	h = "hello"
	go func() {
		time.Sleep(time.Second)
		fmt.Println("h:", h)
	}()
	return ""
}

func main() {
	ex()
	time.Sleep(time.Second * 2)
	fmt.Println("done")
}

I suspect this has never been a problem for us (although I'd want to check the logs to be sure) because ra.publisher.SubmitToCT is a gRPC method so most likely a copy of the slice is made before it gets a chance to be zeroed out by the return unless there was some significant lag between the two things happening (thinking about it this might actually be an issue when i.e. the gRPC connection is down and it is waiting to send the actual RPC... not really sure how the gRPC internals work though, it may have already marshaled the message in those situations).

Copy link
Contributor

@cpu cpu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@cpu cpu merged commit 4ec662e into master Aug 29, 2017
@cpu cpu deleted the maybe-fix-race branch August 29, 2017 14:05
@briansmith
Copy link
Contributor

First, thanks for fixing this bug, @jsha, and thanks @rolandshoemaker for the explanation. I believe I introduced this and I had no idea this was something to watch for!

As far as the fix is concerned: Just as I overlooked it the first time, I think 1 year from now I'd likely revert this change as part of cleaning up code. So I think something else or additional should be done to prevent this from happening. e.g. Stop using named return values in this function and/or add a comment saying that the goroutine is given a complete copy of the cert DER to avoid a data race or something.

@cpu
Copy link
Contributor

cpu commented Aug 29, 2017

So I think something else or additional should be done to prevent this from happening. e.g. Stop using named return values in this function and/or add a comment saying that the goroutine is given a complete copy of the cert DER to avoid a data race or something.

@briansmith Thanks for bringing that up! I agree that there should at least be a comment here. I wish I thought of that during review :-) I think we've put a hold on any new code using named return values so its probably worthwhile to clean up the existing occurrences.

@jsha
Copy link
Contributor Author

jsha commented Aug 29, 2017

Yep, I agree we should clean this up more thoroughly by avoiding named return values: #3019. Excellent sleuthing work @rolandshoemaker, thanks for writing it up!

jsha added a commit that referenced this pull request Aug 30, 2017
This is a followup from #3017, in
which we identified a data race caused by the use of named returns. This also
reverts the change from that PR, which was only a surface level fix.

Fixes #3019.
rolandshoemaker pushed a commit that referenced this pull request Aug 30, 2017
This is a followup from #3017, in
which we identified a data race caused by the use of named returns. This also
reverts the change from that PR, which was only a surface level fix.

Fixes #3019.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants