-
Notifications
You must be signed in to change notification settings - Fork 379
fix(joiner): resolve race condition #5281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ee3009b to
fc9e950
Compare
| var eg errgroup.Group | ||
| j.readAtOffset(buffer, j.rootData, 0, j.span, off, 0, readLen, &bytesRead, j.rootParity, &eg) | ||
| eg, ectx := errgroup.WithContext(j.ctx) | ||
| j.readAtOffset(buffer, j.rootData, 0, j.span, off, 0, readLen, &bytesRead, j.rootParity, eg, ectx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ectx==ctx and so therefore it is not needed to add it to the signature, since readAtOffset can just select on the struct field
| return ectx.Err() | ||
| default: | ||
| } | ||
| ch, err := g.Get(ectx, addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the Get method already gets passed the context, shouldn't/wouldn't it better that it just checks whether the context expired before+after doing anything? this way you get better coverage of the whole codebase instead of having to dot every usage of the method elsewhere.
| recovery := func() storage.Getter { | ||
| g.config.Logger.Debug("lazy-creating recovery decoder after fetch failed", "key", key) | ||
|
|
||
| if d, ok := g.getFromCache(key); ok && d != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since getter.New should return almost immediately, wouldn't it be better to just lock once and do both the get and the set, instead of locking twice, checking twice and potentially calling (and allocating) in vain to getter.New?
Bear in mind also that getter.New has side-effects as it starts a whole set of goroutines with go d.prefetch(). It would be good to avoid it if it's not necessary.
| } | ||
|
|
||
| key := fingerprint(addrs) | ||
| g.mu.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my 2-cents: since one mutex lock-unlock cycle turns into multiple, the chances of having race conditions increases. Since multiple callers may have now access to the same critical section at the same time, this might cause other problems elsewhere down the line (once some of the interleaving logic changes).
Checklist
Description
Fixes a race condition in pkg/file/joiner that caused CI test timeouts. Refactored decoderCache.GetOrCreate to prevent deadlocks by correctly managing mutex locks around getter.New calls.
Makefile: Filter linker warnings and disable test caching for test-ci-race
Open API Spec Version Changes (if applicable)
Motivation and Context (Optional)
Related Issue (Optional)
Screenshots (if appropriate):