-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/span limits #816
Feature/span limits #816
Conversation
…y-cpp into feature/span_limits
…y-cpp into feature/span_limits
…9/opentelemetry-cpp into feature/span_limits
Codecov Report
@@ Coverage Diff @@
## main #816 +/- ##
==========================================
+ Coverage 95.52% 96.16% +0.65%
==========================================
Files 156 153 -3
Lines 6618 6458 -160
==========================================
- Hits 6321 6210 -111
+ Misses 297 248 -49
|
@@ -114,11 +114,21 @@ class AttributeMap | |||
}); | |||
} | |||
|
|||
const std::size_t GetAttributeMapSize() const noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this. Please check my PR here, as it'd be nicer if we simply expose all properties of real std::map
-alike container here:
class AttributeMap : public std::unordered_map<std::string, OwnedAttributeValue> |
You can simply use size()
instead of adding non-standard getter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will pick your changes from attribute_utils.h and modify accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. That works. I'll reset my status to unblock your review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be much more elegant if that's just a usual std::map
container. I am doing it in the other PR I mentioned in the comment. #771
…y-cpp into feature/span_limits
} | ||
|
||
private: | ||
SpanLimits span_limit_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should add SpanLimits
as a data member here. It will get allocated every time a new recordable is created ( i.e, every time a new span is created ). And SpanLimits values are not going to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not intended change, maybe didn't sync correctly.
if (attribute_map_.GetAttributeMapSize() < GetSpanLimits().AttributeCountLimit | ||
|| attribute_map_.KeyExists(key)) { | ||
attribute_map_.SetAttribute(key, value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is a good idea to drop the attributes silently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
We can either log every time when attribute is dropped, but that would be excessive logging and is not recommended in the spec.
Second way, we can store the information of how many attributes are added in the span and so the number of attributes dropped. This should be enough information, any opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this won't really work... Because I'd see that class being used not necessarily only for Span Attributes. But for Resource Attributes as well. Do resource attributes have the same limit? I mean - can we move this check elsewhere.. Can we keep it a generic converter-helper class to transform attributes from API AttributeValue to OwnedAttributeValue, and just keep it as generic as possible, without binding it to Span specifics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limit is specific to Span attributes, links, and events. SDK doesn't own/copy any of these properties internally, so it is not possible to impose this limit within sdk. All these properties are propagated to exporters, and it is up to the exporters to impose the limits. The exporters may choose not to transform these properties to OwnedAttributeValue
and instead transform/serialize to (say) JSON format for performance reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lalitb - I think we should have that trigger (limit enforcement) somewhere when we transform into OwnedAttributeValue
. I don't have concrete proposals. Maybe we should merge the foundation class for some enforcement, proposed here in this PR. Then find out a good way to bind to that class.
|
||
} // namespace trace | ||
} // namespace sdk | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be a class with static methods to return the limits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are having a global tracer provider and span limits are associated with that only, so that should be better as it will simplify arguments passing. Will make the changes in next commit.
@anshulkumar19 - While I am thinking about the approach to follow for supporting the span limits, the problem I see is that our implementation doesn't copies/owns any of the span properties inside api/sdk, and these properties (links, events, attributes) are propagated directly to exporters. |
return lock; | ||
} | ||
|
||
int AttributeCountLimit = 128; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for naïve question :
-
are we anticipating Span limits for different exporters (Jaeger, Zipkin, OTLP, or some other custom exporter, say FluentD-MsgPack) to be the same, and following these exact defaults?
-
If the answer to Q1 is
No
, then can we have these limits make somehow tunable, and be part of some form of Exporter configuration? This is becoming a bit tricky though. In case of multi-processor, in case if we want to route our telemetry to different exporter, and if every exporter has its own limits..
Apologies if it's a dumb question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spec mentions these limits set to tracer provider and we have one global tracer provider, so in the current scenario all of these limits should be same.
Moreover, according to my understanding since these limits are there for the safety from erroneous code adding absurd number of attributes to a span so it should be related to span related limits rather than depending upon any exporters. So, once these limits are defined they should be followed for each kind of span (sdk/exporters).
Reading the spec - these are all soft limits - And I fork my data via multi-processor to two different exporters at once. For example:
It seems like then the issue is to configure the soft limits and intercept these externally , beyond the multi-processor , somewhere when Span / Span Events reach concrete exporter. That exporter then should know what limit to impose. In that case if the limit is exceeded, there has to be a mechanism to notify the caller about the failure of a concrete exporter.. Exporter 1 fails (limit exceeded), but exporters 2 and 3 let span+events through. Because their allowed limits MAY BE higher than those of Exporter 1. I don't see limit in fluentd / FluentD forward protocol / MsgPack spec, for example.. Then that failure notification for only one of the exporters - has to be bubbling-up the status code back to the caller itself. This notification would most likely be asynchronous. Because if we are "forking" events to different destinations in a batching thread in the background, that is the only moment we'd actually know which concrete exporter gonna drop and which one gonna let it go? We can't immediately return a failure to the caller.. Unless we iterate for each set of limits - for each exporter, and find beforehand what exporter lets it get thru and what not. Also the other interesting thing, as we merge resource attributes with span attributes, aren't we potentially exceeding the combined limit... Say, I set resource attributes to include some 100 fields, and my physical channel / protocol allows 1000 fields (columns) total, and I add 901 field in my telemetry data - how does the combined sum limit get calc'd.. Appears like in most cases it's something that only lower-level would know, not the API. I'd like to avoid imposing those soft limit in the API. And find a good way to solve this limiting elsewhere. I think your limiter class, in general, looking Okay. We can merge that. But I'm not sure how to plug it in best. Taking into account that:
I'd like to avoid a seemingly easy solution to limit our future ability to operate in more complex, real-world configurations. I think we should also consider some async callback / error notification. Because the limiting spec says we should to report back the failure(s) in case if soft limit is exceeded. Don't see that in the PR here yet. Because we just don't have that feature in SDK. Maybe we can start with ability to notify user of error(s)? Some form of internal async error callback. Otherwise limiting and not telling back the caller that we dropped something is very confusing.. Probably nearly as bad as accepting the data, exceeding the limit later on somewhere during HTTP POST or gRPC message post, and dropping it. Same result - data loss, users unhappy. |
Regarding my last point on async error callback - notification. I'd like to propose this being added as a generic facility into OpenTelemetry C++ SDK, following this pattern we have here: https://github.com/microsoft/cpp_client_telemetry/blob/master/lib/include/public/DebugEvents.hpp It's a pub-sub model, when caller may receive an "interrupt" error notification - DebugEvent- sent from SDK worker thread(s) in case if something went bad. Caller listens for those DebugEvent notifications via registering DebugEventListener . That way the caller gets their |
@maxgolov - Thanks for bringing the error handling scenario, as it has been discussed in past too. Request you to please add your recommendations here #408 . Specs also provide guidance on supporting custom error handlers here. |
@lalitb - guidance is rather soft, it's not a requirement but rather a recommendation. The link you are showing says : These are examples of how end users might register custom error handlers. Examples are for illustration purposes only. Language library authors are free to deviate from these provided that their design matches the requirements outlined above. I can send in a PR with a proposal for the DebugEvent emitter / DebugEventListener pattern implementation. It allows both - local async subscriptions for SDK errors. Say at Exporter (or maybe at TracerProvider level). Or a subscription to single global debug/error event handler for global lifecycle / state management errors that could occur outside of specific TracerProvider or Exporter. |
We can discuss the proposal within #408 before implementation. In general, we can start as per specs guidelines and provide the functionality to let applications define custom error handlers ( which can be async/sync as per their requirements), and otel sdk providing global methods to set and get these error handlers. This will also allow the applications to plugin the otel logging api as a custom error handler going ahead. I don't think we should be designing a new error handling framework for this purpose. |
I was about to add the similar constraints in exporter side "SetAttributes" functions so that these limits would be imposed there. Since the spec talks imposing these limits only to the sdk side, so that becomes tricky. |
With the current otel-cpp design, the only place to filter these span properties would be at the exporter level. And if that is the case, a better approach would be to pass these limit configuration during exporter startup along with other exporter options. Eg in Zipkin, it would be here: opentelemetry-cpp/exporters/zipkin/include/opentelemetry/exporters/zipkin/zipkin_exporter.h Lines 47 to 55 in 3de8f35
Each exporter would have different memory requirements, and thus different configuration values. This is something not defined by specs, and let exporters decide how they want to implement it. But for now, I don't think we should implement this optional requirement at the SDK level when there is nothing to filter there at SDK. @pyohannes, @maxgolov - any thoughts? |
Yes, I agree with that. |
Thanks for your work @anshulkumar19 - as @maxgolov suggested this may be utilized in the future if we plan to have an upper bound for |
Fixes #365
Changes
The changes made in this PR addresses the feature of imposing span limits parameters as defined in spec.
For significant contributions please make sure you have completed the following items:
CHANGELOG.md
updated for non-trivial changes