Split pdata into multiple packages #4832
My first question/reaction: considering all these problems... do we still want to split the packages?

Also, the only downside of approach 3 can be mitigated to some extent. Signal-specific parts of the …

@open-telemetry/collector-approvers please let me know if the options outlined in this issue need more clarification for further discussion. I don't think we should consider Option 1 due to the overhead. So I believe the question is whether we want to accept one of the following:
- Option 2: exposing functions that are needed for internal usage only, like …
- Option 3: having aliases in …
I don't feel that this issue provides enough information for us to determine the best path forward. I don't see from here what the structure of independent signal modules would look like. What would be the dependencies among them and the existing modules/packages? What would the wrapping structures for option 1) look like? What would the interfaces for common types in option 2) look like? Is it only exporting initializers or is there more that needs to be exposed? Why would it be a bad thing to allow the initialization of common data types that might be used by multiple signals? It feels like they are currently internal-only but need not be going forward.
What overhead? Is it a performance concern? Do you have benchmarks that demonstrate the magnitude of the overhead?
Why should the …

Again, I think we need a lot more detail on all of these options before we can make an informed decision. Please see open-telemetry/opentelemetry-go#2526, where the Go SIG was recently presented with choices for how to structure a metrics API. Having that level of detail made it easier for us to understand the end state we're moving toward and how we're getting there.
Thanks for putting together a draft PR demonstrating option 3. I don't know how feasible it would be to have a PR for both other options, or if it would be easier to put together a rough sketch of what Options 1 & 2 would look like (as per @Aneurysm9's linked proposal in the go repo). Looking through the PR for Option 3, I'm not sure how different from option 2 it is; sure, we're hiding the internals, but exposing the aliases essentially does almost the same as exposing the internals here.

I don't think there's a good way to enforce that only pdata signals use the exported functions, so I guess the question is: what's the downside in exposing those methods?
Not really, for Option 2 we would need to expose functions like …

I think the only downside is their actual presence. Users won't be able to do anything with them outside the Collector's …

Can you elaborate on this? I'm not sure I understand.
Option 2 implies introducing publicly exported functions like the following:

```go
func ResourceFromProto(orig *otlpresource.Resource) Resource {
	return Resource{orig: orig}
}
```

This function can be imported by any client, but cannot be used anywhere outside of Core Collector's …
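To make that downside concrete, here is a minimal, self-contained sketch. All type names are stand-ins for illustration; in the real repo the generated proto struct lives under an internal import path:

```go
package main

import "fmt"

// otlpResource stands in for the generated proto struct, which in the
// collector lives under an internal import path that outside modules
// cannot import.
type otlpResource struct {
	droppedAttributesCount uint32
}

// Resource hides the proto struct behind a private field.
type Resource struct {
	orig *otlpResource
}

// ResourceFromProto is exported, so every importer of the package sees it.
// But because *otlpResource is only importable inside the collector's own
// tree, external code has no way to construct the argument, so the
// function is effectively unusable outside the repo.
func ResourceFromProto(orig *otlpResource) Resource {
	return Resource{orig: orig}
}

func main() {
	r := ResourceFromProto(&otlpResource{droppedAttributesCount: 2})
	fmt.Println(r.orig.droppedAttributesCount)
}
```

The function itself is trivial; the "presence" problem is purely about API surface, since godoc will list it even though no external caller can ever satisfy its parameter type.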
Would that be an issue if we used the same proto structs that everyone else uses? I think that being able to interoperate with other systems at that level would be advantageous.

That would be a significant performance degradation (IIRC around 1.3x last time we measured). It also probably (?) makes the patching/manual Protobuf serialization code that @codeboten works on impossible or more complicated.
+1. I think we need to keep asking this question. I still don't see an alternate that works if we keep insisting on our formal approach to introducing experimental API (according to Go semver requirements). However, if we relax this requirement then the alternate is much simpler: don't split, mark unstable parts (currently logs) as "unstable" via some sort of labelling (like gRPC does). I have a feeling that by trying to be formal we are complicating our lives disproportionately.
@tigrannajaryan How would the implementation in #4833 'not work' in this sense? |
What about the other way around, then? Take the internal structs from the collector and make them available for everyone else to use. If they're that much better than the structs in the proto-go repo then why shouldn't we be encouraging everyone to use them? It seems that the perceived complication here is simply stemming from the fact that we're not using a common OTLP implementation and we should look at fixing that rather than throwing up our hands and saying "well, we've tried nothing and we're all out of ideas!" As for the patches on the existing implementation that @codeboten has developed, my understanding is that they are only necessary because we're not using the structs from proto-go that are generated with a protoc that understands optional fields. Not needing them seems a lot simpler, not more complicated.
There will be no alternative that works and does not require changes, potentially significant ones. I do not believe it is a sound idea to declare …

I know that by trying to avoid correctness we will complicate the lives of our users. There are many more of them than there are of us. The cost of choosing to be incorrect here will be high.

By "alternate" I mean a solution that does not require splitting pdata into multiple packages. My understanding is that the implementation in #4833 still requires pdata to be split into modules by signals.
The reason we keep them internal is to allow future optimizations. We don't want in-memory representation to be part of a public API. There are 2 potential optimizations that I would like to try some time in the future:
These don't require pdata API changes but require different in-memory structs. |
Somehow this works for gRPC, which tells me that maybe it is not such a big deal. See https://github.com/grpc/grpc/blob/master/doc/versioning.md
Yes, it is a tradeoff. To clarify, I don't insist that we must do this. I only ask every approver/maintainer to critically assess the tradeoffs. I am personally leaning towards the grpc approach, but only slightly so, and can be convinced otherwise.
@mx-psi thank you for the links, this is very useful and shows that it indeed causes pain to users. I think I am convinced :-) The other alternate is to follow proper semver and increase the major version number anytime we change a part of pdata that depends on a non-stable OTLP signal. The downsides seem to be: …
During today's SIG meeting, @Aneurysm9 pointed to an interesting solution. It seems that you can depend on an internal package from a different module as long as the path is still respected.
I'm on PTO til Friday, so wasn't able to participate. @Aneurysm9 can you please share more details about the solution you proposed? |
The Go internal package import restrictions pre-date modules and operate solely on the package import path hierarchy. So long as "the package doing the import is within the tree rooted at the parent of the `internal` directory", the import is allowed.

It would also allow for constructors like `func NewResourceFromProto(orig *otlpresource.Resource) Resource` without exposing them to consumers outside of the tree rooted at …
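Concretely, the path-based rule could permit a layout roughly like the following. The module paths here are illustrative only, not a committed design:

```
go.opentelemetry.io/collector/pdata/internal/   # common structs + exported constructors
go.opentelemetry.io/collector/pdata/plog/       # hypothetical signal module; may import
go.opentelemetry.io/collector/pdata/pmetric/    #   .../pdata/internal, since both sit inside
                                                #   the tree rooted at .../pdata
github.com/example/consumer/                    # outside that tree; the compiler rejects
                                                #   any import of .../pdata/internal
```

Because the compiler enforces this purely from the import path, the restriction keeps working even when the signal packages become separate Go modules.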
@Aneurysm9 you are correct in your suggestion to make that API part of the "internal" package. I am not sure how package …

```go
type Resource struct {
	orig *otlpresource.Resource
}
```

package …

```go
func NewResourceFromProto(orig *otlpresource.Resource) Resource {
	// does not have access to the private member in `return Resource{orig: orig}`
}
```

Solution to this: move …
@Aneurysm9 @bogdandrutu @codeboten @tigrannajaryan I added a couple more PRs to illustrate the proposed approaches: …

I don't think it's reasonable to illustrate Approach 1 since it's pretty similar to Approach 3; it uses wrapping instead of aliasing, which adds more unnecessary complexity without any benefits.
Here is the way I propose version 3: …
Most of the code operates on …
Thanks for putting together the draft for approach 2 @dmitryax. After reviewing it I'm leaning towards approach 3 to minimize new APIs to add at this point. |
I am also leaning towards approach 3, I think @bogdandrutu's points make a lot of sense |
I think the plan laid out in #4918 (comment) makes the congruence between these approaches clearer, though I do wish that #4919 had used the same package structure. The original proposal here for option 3 was missing the detail needed to draw a line through to the final destination. I'm fine with moving in this direction now, though as I question in #4918 (comment) I'm not sure we need #4833 as an interim step before #4918.
Isn't this a problem any time you have internal packages with types that have exported methods? I don't think options 2 or 3 are any different in this regard, are they?
Problem
As a result of the discussion on how to handle experimental data types, a decision was made to split the pdata package into multiple packages per signal and publish them as separate Go modules.
The `pdata` package encapsulates the internal proto data structures in private fields. Given that all signal data structures (logs, metrics, traces) currently use common structures like `Resource`, `AttributeMap`, and `AttributeValue`, they need direct access to the private proto fields of those common structures. Splitting the package as-is cannot be done, since it would remove access to the private fields of the common data types from the other signal packages. There are several ways this can be solved; all of them have some drawbacks.

Possible solutions
1. Wrap common pdata structs

Common pdata structures like `Resource` and others can be wrapped into another struct that is accessible in a public module, while the wrapped part is placed into an internal package that all other signal packages have access to.

Cons:
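A minimal sketch of what the wrapping in option 1 could look like, assuming hypothetical names (the real internal type and its fields would differ):

```go
package main

import "fmt"

// internalResource stands in for the common struct that would live in an
// internal package shared by all signal modules.
type internalResource struct {
	schemaURL string
}

// Resource is the public wrapper. Every accessor has to be re-declared
// to delegate to the wrapped value; that duplicated API surface is the
// main cost of this option.
type Resource struct {
	inner internalResource
}

func (r Resource) SchemaURL() string { return r.inner.schemaURL }

func main() {
	r := Resource{inner: internalResource{schemaURL: "https://opentelemetry.io/schemas/1.9.0"}}
	fmt.Println(r.SchemaURL())
}
```

Each signal module would need its own wrapper layer like this, and converting between the wrapper and the internal value adds a step at every package boundary.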
2. Add new exported functions to common data structures to be used only by other pdata signals

Private functions of common data types that are currently used to pass private proto fields can be exported. In that case, other signal modules can have access to them.

For example:

…

can be changed to

…
Pros:
Cons:
3. Put `pdata` into an internal package as-is and use aliases to it in exported modules

The whole pdata package can be put as-is into an internal package, and the exported packages will contain only aliases to the internal pdata package.
Pros:
Cons:
Proposal
The third solution seems to be the most appropriate. It keeps the overhead minimal while introducing no changes to existing pdata structures. It just adds another layer of reference in the code and generated documentation.
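A minimal sketch of the aliasing mechanism behind the proposed option 3 (names here are illustrative, compressed into one file). A type alias is the *same* type as its target, so the extra layer exists only in the source and documentation, not at runtime:

```go
package main

import "fmt"

// internalResource stands in for a type kept in the internal pdata package.
type internalResource struct {
	attrs map[string]string
}

// Resource is what the public package would export: an alias declaration,
// not a new type, so no conversions or wrapper methods are needed.
type Resource = internalResource

func main() {
	r := Resource{attrs: map[string]string{"service.name": "demo"}}
	var ir internalResource = r // identical types: assignable with no conversion
	fmt.Println(ir.attrs["service.name"])
}
```

Because the alias and the internal type are identical, code inside the internal tree and code using the public alias can pass values back and forth freely, which is why this option avoids the overhead of options 1 and 2.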