This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Conversation

@vdavid
Contributor

@vdavid vdavid commented Feb 13, 2023

My first implementation of outbound request logging blocked the request for the duration of the logging.
This didn't seem to be a problem at first, because writing to Redis is usually fast. However, that implementation didn't account for the case when Redis is slow.
This PR modifies the logging behavior to make it non-blocking.

Test plan

I've run the code, and it logs requests and reads the logs as intended. I've also updated the tests.

@mrnugget
Contributor

Hmmmmm, wouldn't it maybe be easier to do the logging inside the middleware? I don't think you necessarily need to log everything that went wrong in the middleware next to the request; a separate error message actually seems useful. What do you think? It would probably simplify things a lot.

@vdavid
Contributor Author

vdavid commented Feb 13, 2023

@mrnugget the thing is, to the best of my knowledge, the Redis logger middleware doesn't have access to a logger. This was a bit of a pain when implementing the middleware in the first place.

@mrnugget
Contributor

Ah, I see. I think in this case it might be fine to instantiate a new logger, just for error logging. @keegancsmith and @bobheadxi or @Strum355 might have different ideas? I do think that passing channels around on a context "just" to log errors from a middleware that didn't ultimately make the request fail is not worth it.

*req = *req.WithContext(context.WithValue(req.Context(),
redisLoggingMiddlewareErrorKey, middlewareErrors))

go func() {
Member

@bobheadxi bobheadxi Feb 13, 2023


This creates a new goroutine for each request, and it seems that this can block for some time. Should there be a separate type that handles queueing events for logging, and/or even batching how we push them into Redis? This would be a good logical home for relevant logging as well.

Contributor


This creates a new goroutine for each request, and it seems that this can block for some time.

Ideally, it should be instant; only if we messed something up should it block indefinitely. We're also discussing, in the other issue that @keegancsmith opened, using a context around the FIFOList abstraction so we can return on deadline.

I think the added complexity might not be worth it, but it's not a blocker.

Member


+1 on having context deadlines in FIFOList - something to make sure this backlog doesn't grow indefinitely if writing to Redis is blocked somehow, since Sourcegraph can make lots of requests and this middleware is, IIRC, applied to every external-service HTTP client.

}

// Wait for two seconds to make sure the async logging gets done
time.Sleep(2 * time.Second)
Contributor Author


Not sure if this is good enough. It doesn't seem elegant, but:

  • With the current implementation, I could think of no way to wait for the Redis saver goroutine to finish.
  • I didn't want to introduce more complexity to the implementation code for the sake of the test.
  • I didn't want to remove the test.

I used 2 seconds rather than 1 second to lower the probability of this test becoming flaky.


Contributor


The other option is to use a loop and exit early once the assertions hold:

after := time.After(5 * time.Second)
for {
  // assertions
  if myAssertionsHoldTrue {
    break
  }

  select {
  case <-after:
     t.Fatal("timeout reached!")
  default:
     time.Sleep(500 * time.Millisecond)
  }
}

Contributor Author


I've tried both, thank you! assert#Eventually was more elegant, so I went with that.

@vdavid vdavid marked this pull request as ready for review February 16, 2023 11:38
@vdavid
Contributor Author

vdavid commented Feb 16, 2023

@keegancsmith @mrnugget @bobheadxi I've incorporated the feedback I got from you all and kept the PR small. Please take another look.
Also, if you have any concerns about merging this before tomorrow's branch cut, please speak up. I consider this a fix, so I'd prefer to include it.



@mrnugget mrnugget left a comment


Agree on getting this in. But after it's merged, you should monitor when it goes out to dotcom (sg live dotcom), and once it's live, enable the feature on dotcom and make sure it works, so we can get a fix in before the release if needed.

@vdavid vdavid enabled auto-merge (squash) February 16, 2023 13:47
@vdavid vdavid disabled auto-merge February 16, 2023 13:47
@vdavid vdavid enabled auto-merge (squash) February 16, 2023 13:47
@vdavid vdavid merged commit 5ceab5b into main Feb 16, 2023
@vdavid vdavid deleted the dv/outbound-requests-middleware-goroutine branch February 16, 2023 13:56
@vdavid
Contributor Author

vdavid commented Feb 17, 2023

@mrnugget I've deployed it to dotcom and tested it. It works, but it's not nice: there is a lot of flickering, and sometimes the React component just stops updating for no apparent reason, even though the requests keep arriving from the backend every 5 seconds.
This is not a regression (it looks like it always behaved badly with this many requests 😬), but I think the UX is rather poor this way. One quick way I thought of to improve it significantly was to add a pause/resume button. PR: https://github.com/sourcegraph/sourcegraph/pull/47812
Maybe the underlying useQuery hook is buggy as well, because sometimes it fetches data 5-6 times at once (I saw it in the Network tab of DevTools). I think it'll function well with the "pause" button, but this feature could use some more front-end love later.
