Add log export benchmark to measure the cost paid for async abstraction #2027
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##            main   #2027     +/- ##
=======================================
- Coverage   76.7%   76.7%    -0.1%
=======================================
  Files        122     122
  Lines      20828   20828
=======================================
- Hits       15993   15992       -1
- Misses      4835    4836       +1

View full report in Codecov by Sentry.
impl LogProcessor for ExportingProcessorWithFuture {
    fn emit(&self, data: &mut LogData) {
        let mut exporter = self.exporter.lock().expect("lock error");
        futures_executor::block_on(exporter.export(vec![data.clone()]));
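The pattern in the snippet above — blocking on the exporter's future from a synchronous `emit` — can be sketched without any external executor crate. The following is a std-only illustration (`noop_export` and `block_on_ready` are hypothetical stand-ins, not the crate's API); it only handles futures that complete on their first poll, which is exactly the no-op exporter case being benchmarked:

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A waker that does nothing: sufficient for futures that are ready on first poll.
fn noop_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

// Minimal "block_on" for immediately-ready futures, like a no-op exporter's.
fn block_on_ready<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(out) => out,
        Poll::Pending => panic!("future did not complete on first poll"),
    }
}

// Hypothetical stand-in for the exporter's async export method.
async fn noop_export(batch: Vec<u32>) -> usize {
    batch.len()
}

fn main() {
    let exported = block_on_ready(noop_export(vec![1, 2, 3]));
    assert_eq!(exported, 3);
}
```

A real executor like `futures_executor::block_on` additionally parks the thread and wakes it when the future becomes ready; for a future that is ready immediately, that machinery is pure overhead, which is what the benchmark tries to isolate.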
Do we block on the runtime in our LogProcessor? It's generally not really performant.
We could try https://docs.rs/tokio/latest/tokio/runtime/struct.Runtime.html#method.block_on and see if there is a difference. Maybe spin up the runtime as part of the Processor so we can reuse it.
+1. This is not testing the async runtime. Better to create the async runtime outside of the timing loop, and then block using tokio runtime.
Modified to use Tokio runtime's block_on, and perf drastically dropped (280 ns to 350 ns).
#[derive(Debug)]
struct ExportingProcessorWithFuture {
    exporter: Mutex<NoOpExporterWithFuture>,
    rt: Runtime,
}

impl ExportingProcessorWithFuture {
    fn new(exporter: NoOpExporterWithFuture) -> Self {
        let rt = Runtime::new().unwrap();
        ExportingProcessorWithFuture {
            exporter: Mutex::new(exporter),
            rt,
        }
    }
}

impl LogProcessor for ExportingProcessorWithFuture {
    fn emit(&self, data: &mut LogData) {
        let mut exporter = self.exporter.lock().expect("lock error");
        // futures_executor::block_on(exporter.export(vec![data.clone()]));
        self.rt.block_on(exporter.export(vec![data.clone()]));
    }

    fn force_flush(&self) -> LogResult<()> {
        Ok(())
    }

    fn shutdown(&self) -> LogResult<()> {
        Ok(())
    }
}
So this is measuring the case when the user doesn't have an async runtime?
It's immaterial whether the user has an async runtime or not. The exporter (e.g. exporters to systems like Windows ETW, Linux user_events) does not need an async runtime, as it executes synchronously.
OK, so it's testing the case where the exporter doesn't need an async runtime. Does it cover the case where the exporter requires an async runtime, e.g. an HTTP or gRPC exporter based on an async HTTP or gRPC crate?
likely related to creating/updating the async state machines etc.
The state machine should be created at compile time, so we shouldn't pay any cost for that at runtime.
Running the state machine does have runtime overhead, but given that the async function doesn't do anything, the state machine probably only has one state. I think the overhead is mostly coming from the runtime setup here rather than from the state machine itself.
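On the compile-time point above: calling an `async fn` does no work by itself; it only constructs the compiler-generated state machine, whose layout is fixed at compile time. A small std-only sketch (the `noop_export` function is a hypothetical stand-in for the exporter, not the crate's API):

```rust
use std::mem::size_of_val;

// Hypothetical stand-in for an exporter whose async body has no .await points.
async fn noop_export(batch: Vec<u64>) -> usize {
    batch.len()
}

fn main() {
    // Calling the async fn only builds the state machine; nothing runs yet.
    let fut = noop_export(vec![1, 2, 3]);
    // The state machine's size is known statically: roughly the captured
    // argument (a Vec here) plus a state discriminant.
    println!("state machine size: {} bytes", size_of_val(&fut));
    drop(fut); // never polled, so the body never executed
}
```

Constructing the state machine is cheap; the measurable cost comes from polling it through an executor, which is what the two benchmark variants differ in.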
It doesn't.
Not sure I understand. There is no runtime here, so the overhead has to be coming from the usage of async. (The benchmarks are both doing the exact same thing, only differing in the use of async for export.)
It doesn't change the fact that async tasks have overhead because of the need for an executor. I probably went too deep into the details here.
@TommyCpp @lalitb The discussion about the runtime etc. is not very relevant (apologies, my knowledge of async runtimes is very limited). I have some workarounds in mind, but the first step was to validate the assumption that introducing async where it is not required causes an unnecessary perf hit.
Yep. Sorry I kind of hijacked the discussion 😞
Would it make sense to offer two versions (one for sync, one for async)? It's not uncommon in the Rust ecosystem. One example would be
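A sketch of what such a split could look like (all names here — `LogExporter`, `AsyncLogExporter`, `EtwLikeExporter` — are hypothetical illustrations, not the crate's actual API): synchronous back-ends implement a plain trait and never touch a future, while async back-ends get an `async fn` in a separate trait (stable since Rust 1.75):

```rust
// Synchronous surface: ETW/user_events-style exporters that never need a runtime.
trait LogExporter {
    fn export(&mut self, batch: Vec<String>) -> Result<(), String>;
}

// Async surface for HTTP/gRPC-style exporters.
// (`async fn` in traits is stable since Rust 1.75.)
#[allow(dead_code)]
trait AsyncLogExporter {
    async fn export(&mut self, batch: Vec<String>) -> Result<(), String>;
}

// A synchronous back-end pays no async cost at all.
struct EtwLikeExporter {
    exported: usize,
}

impl LogExporter for EtwLikeExporter {
    fn export(&mut self, batch: Vec<String>) -> Result<(), String> {
        self.exported += batch.len(); // plain synchronous write, no runtime
        Ok(())
    }
}

fn main() {
    let mut exporter = EtwLikeExporter { exported: 0 };
    exporter.export(vec!["a".into(), "b".into()]).unwrap();
    assert_eq!(exporter.exported, 2);
}
```

The trade-off is API duplication: processors would need to be generic over (or have variants for) the two exporter traits.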
Benchmark looks good. I have a few thoughts on how to set up the runtime / try different runtimes to minimize the cost. I can explore it offline once we merge this.
Opened an issue to track the change: #2031
The current export method is marked async. Even in scenarios where export is done synchronously, the async is causing some unnecessary overhead (likely related to creating/updating the async state machines, etc.).
This PR just shows the overhead is ~10% (250 ns -> 275 ns). Not sure if this is mimicking the actual exporter correctly; would love to get some feedback on this.
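The numbers above come from the PR's Criterion benchmarks. As a rough, self-contained illustration of the measurement structure only (function names are hypothetical; absolute numbers depend on hardware, and real measurements should use Criterion):

```rust
use std::hint::black_box;
use std::time::Instant;

// Time a closure over many iterations and return nanoseconds per call.
fn bench<F: FnMut()>(iters: u32, mut f: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

// Hypothetical synchronous stand-in for the exporter body.
fn sync_export(batch: &[u64]) -> usize {
    batch.len()
}

fn main() {
    let batch = vec![1u64, 2, 3];
    // black_box keeps the optimizer from deleting the measured work.
    let ns = bench(1_000_000, || {
        black_box(sync_export(black_box(&batch)));
    });
    println!("sync export: {ns:.1} ns/op");
}
```

Comparing this loop against one that drives the async variant through an executor is what surfaces the per-call abstraction cost being discussed.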