Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split otlp/otlpgrpc #5027

Closed
bogdandrutu opened this issue Mar 17, 2022 · 22 comments · Fixed by #5168
Closed

Split otlp/otlpgrpc #5027

bogdandrutu opened this issue Mar 17, 2022 · 22 comments · Fixed by #5168
Assignees
Labels
discussion-needed Community discussion needed priority:p1 High release:required-for-ga Must be resolved before GA release

Comments

@bogdandrutu
Copy link
Member

Decision was made to split the pdata package into multiple per signal packages, see #4832.

As a followup we need to split the otlp and otlpgrpc as well since they are part of the same current module model. Here are some options:

  1. Extract the current packages into separate module without no other changes.
  2. Move everything from otlp and otlpgrpc into the equivalent signal package decided in Split pdata into multiple packages #4832.

I personally believe that the 2nd option is cleaner and gives us the "encapsulation" we need.

@dmitryax dmitryax added the priority:p1 High label Mar 18, 2022
@dmitryax dmitryax self-assigned this Mar 18, 2022
@dmitryax dmitryax added the release:required-for-ga Must be resolved before GA release label Mar 18, 2022
@dmitryax dmitryax added the discussion-needed Community discussion needed label Apr 5, 2022
@dmitryax
Copy link
Member

dmitryax commented Apr 5, 2022

Looks like we need to make a decision here to proceed with #5087.

Currently we have otlp and otlpgrpc packages under /model. IMO having them under model is not much cleaner than merging them into pdata. If we go with the 2 approach, we can have the following structure:

/pdata
  /ptrace (including trace related API from `pdata/otlp` and `pdata/grpcotlp`
  /pmetric (including metrics related API from `pdata/otlp` and `pdata/grpcotlp`
  /plog (including logs related API from `pdata/otlp` and `pdata/grpcotlp`
/semconv

If we go with 1, we need to figure out another structure. I'm thinking of something like this:

/model
  /pdata
    /ptrace
    /pmetric
    /plog
  /semvconv
  /otlp
    /otlptrace
    /otlpmetric
    /otlplog
  /otlpgrpc
    /grpctrace
    /grpcmetric
    /grpclog

We can also remove /model for this structure and have pdata, otlp, otlpgrpc and semvconv at the root level, but this will require moving /model/internal to /internal

cc @Aneurysm9 @bogdandrutu @tigrannajaryan , any other ideas?

@tigrannajaryan
Copy link
Member

From what I understand otlpgrpc is currently used by otlp receiver/exporter only. Is this true? If so then why does it need to be a public package? Can we move it to internal/otlp or something like that?

@tigrannajaryan
Copy link
Member

If we move otlpgrpc to internal then the only remaining bit is otlp which is a tiny package that does not have any dependency that is not already a dependency of pdata so I don't mind if otlp just gets merged with pdata.

It will change the meaning of pdata from "internal in-memory representation" to "internal in-memory representation and serialization functions", which I think is fine. It will not bloat the code size of pdata in any meaningful way, won't add any other 3rd party dependencies, so I think it does not make pdata less useful as an importable library in any significant way.

(I get that maybe we loose in purity of separation of concerns, but again, introducing 3 new packages for otlp, each package being a tiny, few lines of code seems excessive and I think adds maintenance burden).

@mx-psi
Copy link
Member

mx-psi commented Apr 5, 2022

From what I understand otlpgrpc is currently used by otlp receiver/exporter only. Is this true? If so then why does it need to be a public package? Can we move it to internal/otlp or something like that?

otlpgrpc is used by several external dependencies (including us, Datadog :) ). I think it's generally useful for writing a gRPC OTLP server, and if this were made private we would want to ensure that there is a workable alternative to it.

@tigrannajaryan
Copy link
Member

otlpgrpc is used by several external dependencies (including us, Datadog :) ). I think it's generally useful for writing a gRPC OTLP server, and if this were made private we would want to ensure that there is a workable alternative to it.

OK. Looks like it is necessary if one wants to implement OTLP support in their codebase. I retract my proposal to move otlpgrpc to internal.

@bogdandrutu
Copy link
Member Author

So in the current otlp package it is not the protocol (p part of otlp), they are just serialization to from proto binary/json of the [Traces|Metrics|Logs]Data messages (see https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L37). I think this fit well into the signal package, since also the interfaces Marshaler and Unmarshaler are defined.

@bogdandrutu
Copy link
Member Author

Also this applies for all packages under model which are used by third-party more than anything else. If we have in one package multiple signals then we cannot deal with the experimental signal case that triggered the split of the pdata (if this is not a problem anymore then we should not split pdata), because of that we MUST split these packages as well, and because of that I think it is natural to have them in the same signal package.

@mx-psi
Copy link
Member

mx-psi commented Apr 5, 2022

I don't feel very strongly either way, but I find Bogdan's arguments here more convincing, +1 for moving otlpgrpc symbols together with the associated signal in p<signal>.

@tigrannajaryan
Copy link
Member

I don't like the merging because:

  1. It turns a package with one well-defined purpose (which is "in-memory representation") into something that is not so well-defined. I can make some concessions here, but see point 2.
  2. If a 3rd party codebase for example wants to just implement a processor it needs to import pdata (and some other stuff from consumer or component, but let's ignore that for now). Why does a processor need to bring anything grpc-related as a dependency? otlpgrpc has "google.golang.org/grpc" as its dependency, which in turn has a bunch of other transitive dependencies. The current pdata as far as I can see has only one dependency "github.com/gogo/protobuf" and that obviously cannot be eliminated, so it is as minimal as it can be.

So, we dilute the meaning of the pdata package and blow up its size/complexity as a dependency.

I am not strongly opposed to merging and can accept it if there is no better approach.

@bogdandrutu
Copy link
Member Author

bogdandrutu commented Apr 5, 2022

The current pdata as far as I can see has only one dependency "github.com/gogo/protobuf" and that obviously cannot be eliminated, so it is as minimal as it can be.

Not true unless we do way more splitting. Because we need access to the underlying protos, they need to be in the internal together. So you will always depend on the grpc anyway.

It turns a package with one well-defined purpose (which is "in-memory representation") into something that is not so well-defined. I can make some concessions here, but see point 2.

In otlpgrpc we have the Request/Response messages which are also in-memory representation of the request/response messages for OTLP protocols (grpc/http). Indeed we do have the grpc stubs as well.

So what I hear from @tigrannajaryan is that the disagreement is about the otlpgrpc not about the current otlp which does not have anyway any extra dependencies.

@tigrannajaryan
Copy link
Member

Not true unless we do way more splitting. Because we need access to the underlying protos, they need to be in the internal together. So you will always depend on the grpc anyway.

@bogdandrutu Can you point me where in the code the current pdata implementation depends on grpc (directly or transitively)? I wasn't able to find it.

@tigrannajaryan
Copy link
Member

So what I hear from @tigrannajaryan is that the disagreement is about the otlpgrpc not about the current otlp which does not have anyway any extra dependencies.

Yes, that's correct. I am fine with merging otlp into pdata since it is tiny and has no extra dependencies.

@tigrannajaryan
Copy link
Member

I thought a bit more about this and changed my mind. Every possible 3rd party that would want to import pdata which I can imagine is also already dependent on "google.golang.org/grpc". My desire to avoid making "google.golang.org/grpc" a dependency of pdata achieves nothing in terms of the final result from the 3rd part perspective.

So, I am fine with merging otlpgrpc into pdata as well.

@dmitryax
Copy link
Member

dmitryax commented Apr 5, 2022

I also think that otlp package fits pdata pretty well, but not sure about otlpgrpc.

Another option might be:

/pdata
  /ptrace
    /ptracegrpc
  /pmetric
    /pmetricgrpc
  /plog
    /ploggrpc

This would solve the google.golang.org/grpc dependency problem that Tigran mentioned with the lazy module loading.

@tigrannajaryan
Copy link
Member

/pdata
  /ptrace
    /ptracegrpc

Will ptrace and ptracegrpc be different go modules or one module?

@dmitryax
Copy link
Member

dmitryax commented Apr 5, 2022

ptrace will be one module with ptracegrpc package in it. Clients using only ptrace won't get ptracegrpc dependancies starting from go 1.17 if I understand lazy loading correctly

@bogdandrutu
Copy link
Member Author

@tigrannajaryan @dmitryax

ptracegrpc is not a good name, since the current otlpgrpc contains indeed the grpc stubs, but also the Request/Response objects which are used also by the HTTP protocol.

@bogdandrutu
Copy link
Member Author

Maybe consider ptraceotlp for what is in the otlpgrpc right now?

@dmitryax
Copy link
Member

dmitryax commented Apr 5, 2022

Maybe consider ptraceotlp for what is in the otlpgrpc right now?

Maybe use ptraceotlp for both otlp and otlpgrpc then?

@bogdandrutu
Copy link
Member Author

bogdandrutu commented Apr 6, 2022

@tigrannajaryan what does otlp stands for? :))))

Maybe use ptraceotlp for both otlp and otlpgrpc then?

@dmitryax why that? did we not agree that the marshaller/unmarshaller are simple things and can be seen as encodings for the "in memory representation" as protobuf or as json.

@tigrannajaryan
Copy link
Member

@tigrannajaryan what does otlp stands for? :))))

Hmm, I can't remember anymore :-)

--

I like ptraceotlp or potlptrace or traceotlp or otlptrace as a sub-package name in ptrace module.

@dmitryax
Copy link
Member

dmitryax commented Apr 6, 2022

@dmitryax why that? did we not agree that the marshaller/unmarshaller are simple things and can be seen as encodings for the "in memory representation" as protobuf or as json.

I was thinking that it's called otpl for a reason :) but looks like you're right, I think it's good to move otlp in p<signal>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion-needed Community discussion needed priority:p1 High release:required-for-ga Must be resolved before GA release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants