[component] Move component status features to its own module #10725
2ea4302 to 6f486c7
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10725      +/-   ##
==========================================
- Coverage   92.18%   92.12%   -0.06%
==========================================
  Files         403      400       -3
  Lines       18792    18816      +24
==========================================
+ Hits        17323    17335      +12
- Misses       1109     1121      +12
  Partials      360      360
6f486c7 to 900cebe
if c.hostWrapper == nil {
	c.hostWrapper = &hostWrapper{
		host:           host,
		sources:        make([]componentstatus.StatusReporter, 0),
		previousEvents: make([]*componentstatus.StatusEvent, 0),
	}
}

statusReporter, isStatusReporter := host.(componentstatus.StatusReporter)
if isStatusReporter {
	c.hostWrapper.addSource(statusReporter)
}
The new technique for sharedcomponent is to build the chain of reporting sources as Start is called. Previously reported events are remembered so that, if Start is called multiple times, each subsequent StatusReporter can get the previously emitted StatusEvents.
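The replay behavior described above could be sketched roughly as follows. This is only an illustration under assumptions: `StatusEvent` and `StatusReporter` are simplified local stand-ins for the `componentstatus` types named in the diff, the bodies of `addSource` and `ReportStatus` are guesses (the diff does not show them), and `recorder` is a hypothetical reporter used to observe the result.

```go
package main

import "fmt"

// StatusEvent and StatusReporter are simplified stand-ins for the
// componentstatus types named in the diff; the real definitions may differ.
type StatusEvent struct {
	Status string
}

type StatusReporter interface {
	ReportStatus(ev *StatusEvent)
}

// hostWrapper forwards each event to every registered source and remembers
// what has already been reported.
type hostWrapper struct {
	sources        []StatusReporter
	previousEvents []*StatusEvent
}

// addSource (assumed behavior) replays earlier events to the new reporter,
// so a reporter added by a later Start call still sees the full history.
func (h *hostWrapper) addSource(s StatusReporter) {
	for _, ev := range h.previousEvents {
		s.ReportStatus(ev)
	}
	h.sources = append(h.sources, s)
}

// ReportStatus records the event and forwards it to all known sources.
func (h *hostWrapper) ReportStatus(ev *StatusEvent) {
	h.previousEvents = append(h.previousEvents, ev)
	for _, s := range h.sources {
		s.ReportStatus(ev)
	}
}

// recorder is a hypothetical reporter that stores the statuses it receives.
type recorder struct {
	seen []string
}

func (r *recorder) ReportStatus(ev *StatusEvent) {
	r.seen = append(r.seen, ev.Status)
}

func main() {
	w := &hostWrapper{}
	traces, metrics := &recorder{}, &recorder{}

	w.addSource(traces) // first Start call
	w.ReportStatus(&StatusEvent{Status: "Starting"})
	w.addSource(metrics) // second Start call: "Starting" is replayed
	w.ReportStatus(&StatusEvent{Status: "OK"})

	fmt.Println(traces.seen, metrics.seen) // [Starting OK] [Starting OK]
}
```

In this sketch the second reporter ends up with the same history as the first, which is the property the comment describes for repeated Start calls.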
This approach seems reasonable to me. I think the ergonomics are acceptable, and it allows us to separate the status reporting mechanism from the rest of the component module.
The approach looks good to me overall.
One other note -
@djaglowski I've moved …
@mx-psi the fact that this move is possible makes me think maybe #10222 is back on the table, but I need to think about it more.
The sharedcomponent changes worry me a little bit. I'd like to see if I can test drive it with the health check extension to make sure it is still working as intended. Otherwise, this generally looks good to me.
@mwear the key change happens when a
What about the recoverable error case? When reporting status from within the component (I guess this is via host now), it should report status for all of the components it represents. All components represented by the shared component must be in the Starting state before the API can be called from "within" the shared component. If you report RecoverableError during start, but some instances are not Starting, you will get an invalid state transition for the unstarted instances, and those instances will be stuck reporting an invalid status. I'm not sure if this makes sense, but this is my concern with this change. It's a little hard to test this end to end with the health check extension because there are so many changes. In any case, if these words make any sense and are accurate, we may have to handle this another way.
It's possible that the way you are handling previous events solves my concerns. I'd still like to find a way to verify this end to end.
#### Description
Reorganizes service to not require `servicetelemetry.TelemetrySettings` and instead depend directly on `component.TelemetrySettings`.

Whether or not we move forward with #10725, I think this is a useful change for service.

#### Testing
Unit tests
@mwear I'll see if I can add a test case in this PR with that example.
@mwear I've added a very rough e2e test in … This gives me confidence that the statuses do end up in the extensions correctly. How the extension handles the statuses is up to the extension, right?
One thing of note is that the existing (in main) graph/sharedinstance implementation runs into an invalid state in the graph's fsm because the graph reports … I highlight this as a data point that shows that status reporting still needs lots of maturing, and while this PR changes a lot about how status reporting works, I think that is ok since it is still so young.
I feel confident that if we do find issues with this implementation later on, we'll be able to make fixes via graph/sharedinstance/componentstatus without needing to break user APIs or component behavior.
Thanks for the test @TylerHelmuth, this gives me more confidence. I'm aware of the repeat status reporting that was part of the original implementation. The finite state machine is meant to protect and enforce the lifecycle of a component. Invalid state transitions are not necessarily errors, and in many cases should be considered a no-op. I know there is some logging around this that I would be in favor of removing, because it gives the impression that something unexpected happened when that's not exactly the case. This is discussed in the status reporting documentation under the runtime section.
My biggest reservation is that we need the collector to be observable for it to be 1.0. Part of this means having a functioning health check extension. The current replacement requires component status, so I don't think we can call the collector 1.0 without having component status enabled by default. If we are moving this code for organization purposes, that's fine, but I don't think we can skip this work for 1.0 altogether.
The healthcheck extension is not in our GA roadmap. I think it's important that we keep working on it, but our current plans don't include this for 1.0 as defined in the GA roadmap.
…ntstatus module (#10730)

#### Description
Duplicates component status reporting features from `component` into a separate module, `componentstatus`. In a future PR, when `component.TelemetrySettings.ReportStatus` is removed, I'll update Core to depend on `componentstatus`.

This work isolates component status public API from `component` and `extensions`, which will allow us to move forward with their 1.0 work while component status reporting matures.

#### Link to tracking issue
Related to #10725

Co-authored-by: Matthew Wear <matthew.wear@gmail.com>
I've created #10777 to complete the transition in core.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
)

#### Description
This PR removes `ReportStatus` from `component.TelemetrySettings` and instead expects components to check if their `component.Host` implements a new `componentstatus.Reporter` interface.

#### Link to tracking issue
Related to #10725
Related to #10413

#### Testing
Unit tests and a sharedinstance e2e test. The contrib tests will fail because this is a breaking change. If we merge this, @mwear and I can commit to updating contrib before the next release.

Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com>
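The "check whether the host implements the reporter interface" pattern described in that commit message could look roughly like the sketch below. Everything here is a local stand-in written for illustration: `Host`, `Event`, and `Reporter` only mimic `component.Host` and the `componentstatus` types, the `Report` method name and the `reportStatus` helper are assumptions, and `receiver` is a hypothetical component.

```go
package main

import "fmt"

// Host is a stand-in for component.Host (assumption: details omitted).
type Host interface{}

// Event is a stand-in for a component status event.
type Event struct {
	Status string
}

// Reporter is a stand-in for the optional reporting interface a host may
// implement; the method name is an assumption.
type Reporter interface {
	Report(ev *Event)
}

// reportStatus forwards the event only if the host opts in to status
// reporting, and says whether it did.
func reportStatus(host Host, ev *Event) bool {
	if r, ok := host.(Reporter); ok {
		r.Report(ev)
		return true
	}
	return false
}

// plainHost does not implement Reporter.
type plainHost struct{}

// reportingHost implements Reporter and keeps what it receives.
type reportingHost struct {
	events []*Event
}

func (h *reportingHost) Report(ev *Event) {
	h.events = append(h.events, ev)
}

// receiver is a hypothetical component. It can only report status after
// Start has handed it a host.
type receiver struct {
	host Host
}

func (r *receiver) Start(host Host) {
	r.host = host
}

func (r *receiver) onExportFailure() bool {
	return reportStatus(r.host, &Event{Status: "RecoverableError"})
}

func main() {
	a := &receiver{}
	a.Start(plainHost{})
	fmt.Println(a.onExportFailure()) // false: host does not report status

	h := &reportingHost{}
	b := &receiver{}
	b.Start(h)
	fmt.Println(b.onExportFailure(), len(h.events)) // true 1
}
```

One consequence visible in the sketch is that a component has nothing to report through until Start supplies the host, which matches the trade-off discussed in this PR.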
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Description
This PR is an attempt at removing component status reporting features from the component module. The technique was to make a new componentstatus.StatusReporter interface that component.Host could optionally implement. This has impacts:
- … servicetelemetry.TelemetrySettings struct.
- … ReportStatus function until the component.Start method provides the host.
- … Start, say for the traces pipeline, sharedcomponent will only report the error returned from Start for the traces instance, and then continue on to shutdown.

If this solution's general idea is tenable, then I believe it could be broken down into slightly smaller PRs, such as:
- … servicetelemetry.TelemetrySettings: [service] Remove servicetelemetry.TelemetrySettings #10728
- … StatusReporter interface: [component] Remove ReportStatus from component.TelemetrySettings #10777

Link to tracking issue
Related to #10413