Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[connector/servicegraph] The servicegraph supports nodes of the message queue middleware type #30856

Open
fyuan1316 opened this issue Jan 30, 2024 · 11 comments
Assignees
Labels

Comments

@fyuan1316
Copy link
Contributor

fyuan1316 commented Jan 30, 2024

Component(s)

connector/servicegraph

Is your feature request related to a problem? Please describe.

In the current implementation of the servicegraph, I found that the generated metrics ignore the message middleware when requested via the message middleware type (client and server in the metrics only mark producer and consumer services), which isn't an intuitive representation, and I'd prefer that the generated metrics support the visualization of the message middleware nodes in the graph.

Describe the solution you'd like

In the scenario where services are produced and consumed through the message queue middleware, the final graph only shows the invocation relationship between the producer and the consumer. It hides the message middleware node. This is intuitively no different from the use case where the client requests the server directly.

I think there may be the following benefits if we show message middleware type nodes:

  • The topology of the diagram is closer to reality and the original asynchronous call relationship is shown more clearly. The producer service is only responsible for writing data to the message middleware and the consumer is only responsible for reading messages from the message middleware.
  • In the edge scenario, if the message is not consumed within the timeout period, the current indicator is discarded. If we use a similar approach for DB requests, this call information from the producer service to the message middleware is preserved.

In conclusion, I would suggest to display the message middleware directly on the service graph, and as far as the technical implementation is concerned, using a similar approach to the way current DB requests are handled seems to help us achieve this.

What do you guys think?


(To give an example of rabbitmq, below is a graph of the current state vs. the state after the requirements are implemented)

Now

image

Then

image

Describe alternatives you've considered

No response

Additional context

As this component is based on [Grafana Tempo's service graph processor]
grafana/tempo#3348

@fyuan1316 fyuan1316 added enhancement New feature or request needs triage New item requiring triage labels Jan 30, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@JaredTan95
Copy link
Member

I think this is a very meaningful enhancement.

@JaredTan95 JaredTan95 removed the needs triage New item requiring triage label Feb 1, 2024
@jpkrohling
Copy link
Member

jpkrohling commented Feb 13, 2024

@mapno , what do you think?

@mapno
Copy link
Contributor

mapno commented Feb 20, 2024

Agree this would be a nice addition. It's been discussed in the past, but never implemented.

This requires changes in the connector and in the visualisations (AFAIK, Grafana). For the connector, an attribute that contains the messaging system's name or other identifier could be easily added. The OTel spec has semantic conventions for these attributes. Then, it's a matter of representing this new node in the Graph in Grafana.

@cbos
Copy link

cbos commented Feb 29, 2024

I have been testing with JMS messaging between 2 services and face the same issue.

There is slightly 'complicating' factor:
https://opentelemetry.io/docs/languages/java/automatic/configuration/#capturing-consumer-message-receive-telemetry-in-messaging-instrumentations

So both producer and consumer have there own trace. There is no parent-child relation.
At first all the spans for this were dropped.

Then I changed the config to use virtualnodes.

connectors:
  servicegraph:
    store:
      ttl: 10s
      max_items: 10000
    virtual_node_peer_attributes: [messaging.destination.name, db.name, net.sock.peer.addr, net.peer.name, rpc.service, net.sock.peer.name, net.peer.name, http.url, http.target]

messaging.destination.name is attribute which can be used, in the same way as db.name etc.
https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/#messaging-attributes

That works for the producer side.

But virtualnodes is not implemented for consumer side.
Otherwise the servicegraph would have been as mentioned with a virtual node in between with the queue name.

With JMS you can implement fire-and-forget. Then it has the same problem as database, the corresponding span will never come. If the message is pickup up later (or never) the corresponding 'child' will never come.
In the code there is an exception for databases, as that is forseen that a database does not create a span.

// A database request will only have one span, we don't wait for the server

But in case of messaging systems, that is the same.
If feels a bit weird that only database has this special place.

I think the situation of the database and a messaging system can be quite similar.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Apr 30, 2024
@t00mas
Copy link
Contributor

t00mas commented May 7, 2024

I agree this can be a good addition, I can see what can be done - please assign

@t00mas
Copy link
Contributor

t00mas commented May 14, 2024

@fyuan1316 I've been following related issues and reading your feedback on the additional context linked, I understand this is where we're at currently:

  • messaging.system, or any other messaging.* dimensions can be added as labels with client_ / server_ prefixes already
  • connection_type will be set as messaging_system

However, none of that will create a new node in the service graph as you initially suggested, and I see the conversation here shifted away from this component for that feature.

Are there any other metric generation suggestions that should be taken into consideration, here or there?

@fyuan1316
Copy link
Contributor Author

Hi @t00mas,
This PR grafana/tempo#3453 introduces a new metric traces_service_graph_request_messaging_system_seconds for messaging system, which has recently been merged. I wonder if this will help?

@t00mas
Copy link
Contributor

t00mas commented May 15, 2024

I think this is doable, let me see if I can draft a PR for this to sit behind a feature gate.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jul 15, 2024
@Frapschen Frapschen removed the Stale label Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants