Skip to content

Add RabbitMQ support #1328

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 65 commits into from
Dec 22, 2020
Merged

Add RabbitMQ support #1328

merged 65 commits into from
Dec 22, 2020

Conversation

hectorespert
Copy link
Contributor

@hectorespert hectorespert commented Aug 9, 2020

What does this PR do?

RabbitMQ support based on the RocketMQ support

Checklist

  • This is a new plugin
    • I have updated CHANGELOG.asciidoc
    • My code follows the style guidelines of this project
    • I have made corresponding changes to the documentation
    • I have added tests that prove my fix is effective or that my feature works
    • New and existing unit tests pass locally with my changes
    • I have updated supported-technologies.asciidoc
    • n/a Added an API method or config option? Document in which version this will be introduced
    • Added an instrumentation plugin? Describe how you made sure that old, non-supported versions are not instrumented by accident.
      -> Testing the instrumentation with old versions of the RabbitMQ client

@cla-checker-service
Copy link

cla-checker-service bot commented Aug 9, 2020

💚 CLA has been signed

@apmmachine
Copy link
Contributor

apmmachine commented Aug 9, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Started by user sylvainjuge

  • Start Time: 2020-12-22T13:50:10.765+0000

  • Duration: 44 min 8 sec

Test stats 🧪

Test Results
Failed 0
Passed 1677
Skipped 12
Total 1689

@felixbarny
Copy link
Member

Thanks for your PR ❤️

We're currently migrating plugins from HelperClassManager to indy plugins. See also #1329 for more context.

The new indy plugins should make it much more pleasant to write plugins as you don't need those helper interfaces and you can just reference both agent classes and RabbitMQ classes from anywhere in your code.

Would you mind migrating your plugin to be an indy plugin?

I noticed you marked this as a draft PR. What's still missing in your view? Which parts do you specifically want to get feedback on?

@hectorespert
Copy link
Contributor Author

hectorespert commented Aug 11, 2020

Would you mind migrating your plugin to be an indy plugin?

I will try it.

@hectorespert
Copy link
Contributor Author

hectorespert commented Aug 11, 2020

I noticed you marked this as a draft PR. What's still missing in your view? Which parts do you specifically want to get feedback on?

Consumer instrumentation seems that are working, but producer instrumentation isn't work and I can't do to it pass the tests.
I don't know if I do something wrong or I need to add something in the exit span.

@hectorespert
Copy link
Contributor Author

hectorespert commented Sep 8, 2020

I migrated the plugin to be an indy plugin, but I have a problem in the Producer instrumentation.

java.lang.IllegalStateException: Cannot define writable field access for com.rabbitmq.client.AMQP$BasicProperties arg1 when using delegation

How can I modify a method field in the indy plugins? There is any example?

@hectorespert hectorespert changed the title Add RabbitMQ support WIP Add RabbitMQ support Sep 8, 2020
@felixbarny
Copy link
Member

You can use @AssignTo.Field

@hectorespert
Copy link
Contributor Author

You can use @AssignTo.Field

Thanks. It was fixed using your advice.

@felixbarny
Copy link
Member

using your advice

Pun intended?

@hectorespert
Copy link
Contributor Author

using your advice

Pun intended?

No pun intended

@hectorespert hectorespert marked this pull request as ready for review September 9, 2020 21:53
@SylvainJuge
Copy link
Member

I've opened a PR against your fork as it was simpler to do the changes directly to avoid back & forth: https://github.com/hectorespert/apm-agent-java/pull/1

You really did an awesome job here, the changes are mostly details and the instrumentation worked really well !!

@fgabolde
Copy link

fgabolde commented Dec 3, 2020

I'm interested in RMQ support (the one last big part of our stack that apm-agent-java didn't support! thanks to all involved) but I'm a little wary of using the routing key in transaction names, as I understand this PR is doing (if I misunderstood, sorry about that, and you can safely discard the rest of this comment). IMHO, this should be part of metadata, like HTTP transaction URLs are.

The reason is that routing keys can be extremely high cardinality, e.g. user IDs. The RMQ exchanges can bind user.# to deliver all user IDs however they want; this is how the various sharding plugins work, for instance.

@SylvainJuge
Copy link
Member

Thanks for that feedback @fgabolde ! We often miss some real-world usage feedback for all the frameworks & libraries that we instrument, thus it's more than welcome !

In the current version we currently include exchange & routing key as part of span and transactions naming as you can find in the tests code for example:

        assertThat(span.getNameAsString())
            .isEqualTo("RabbitMQ SEND to %s/%s", exchange, routingKey);

There are multiple ways to make it fit high-cardinality routing keys, which might get added in a follow-up/improvement PR, for example by providing an option to not include those in transaction names (only keeping queue/exchange).

Also, while this is a possible usage of routing keys, would it be something that would prevent you from using it as-is ?

Anyone is invited to add a 👍 (if it's fine) or 👎 (if it's not) on this comment so we have feedback if that's a blocking thing that we should address right now or we can do it in a follow-up PR.

@fgabolde
Copy link

fgabolde commented Dec 3, 2020

@SylvainJuge, I think you're right that the feature as implemented in this PR is already useful for some of my use cases. E.g. I have several services consuming task queues from RMQ where the routing key is not really interesting, so we mostly do direct exchanges and the routing key never varies (all the interesting info is in the payload). These services also tend to be ones that would most benefit from this feature; since the message is the point of entry and generates a single tree of spans, it's often the most interesting span.

I'm thinking of some other services that would not be suitable though. They consume large amounts of messages from sharded queues, so the routing key varies wildly. Then again, they tend to do things that don't work well with distributed tracing, like batching or reordering messages, so a given message is not the only one responsible for any other span we might generate. E.g. we might gather a batch of 100 messages, filter out some of them, process them somewhat, and bulk-write everything to a database in a single insert query. I'm not sure how Elastic APM (or any other similar tool) can help me here, to be honest. If you have any tips I'm very interested.

All in all, I do think that having the routing key in names can be useful, but it feels odd that it's the opposite of the HTTP use case for instance, where the transaction/span name is usually a method name (so cardinality is lower). E.g. for an HTTP service, the root span will be something like FooControllerName#getFoo; you wouldn't put foo IDs there. That's what the elastic.apm.use_path_as_transaction_name option is for, but it's disabled by default, which makes sense to me. It would be more consistent if RMQ support was similar (low cardinality by default, with an option to bring back the details in the transaction names if necessary).

@eyalkoren
Copy link
Contributor

@fgabolde just to add a bit of context: in other messaging systems, we use topic or queue names and so far it seems to resonate with users. In RabbitMQ, we are not so sure the queue name is meaningful always and I am not sure it is accessible in the receiver side (although I have to admit I did not put a lot of effort investigating that). On the other hand, we are worried that the exchange names would provide too-low cardinality. Therefore I suggested using the routing key in combination with the exchange, but I was not aware of using it as you mention, e.g. for sharding.
Ideally, what would you define as the best cardinality-factor for RabbitMQ - the exchange or the queue?

@SylvainJuge the problem with very high cardinality of transaction names is that it makes a lot of our UI much less useable, as all transaction summaries are based on that, so we should avoid this default experience OOTB as much as possible. I think it is much better to err on the opposite side, meaning - providing too-low cardinality. And it may be the case that even allowing opt-in on routing key doesn't make sense in most cases.

@fgabolde
Copy link

fgabolde commented Dec 3, 2020

@fgabolde just to add a bit of context: in other messaging systems, we use topic or queue names and so far it seems to resonate with users. In RabbitMQ, we are not so sure the queue name is meaningful always and I am not sure it is accessible in the receiver side (although I have to admit I did not put a lot of effort investigating that). On the other hand, we are worried that the exchange names would provide too-low cardinality. Therefore I suggested using the routing key in combination with the exchange, but I was not aware of using it as you mention, e.g. for sharding.
Ideally, what would you define as the best cardinality-factor for RabbitMQ - the exchange or the queue?

I'm not too experienced with other messaging systems, but AFAIK in Kafka for example, there is no advanced routing as there is in RMQ. You publish to a topic by name and that's it. I've seen some discussion around routing but it seems to be about creating sub-topics behind the scenes? (Not to diss on Kafka -- I'd love to get some replayability in some of our message queues...) Likewise, when receiving events, you subscribe to a topic by name. In both sides the names are well-known and, barring pathological configurations, low cardinality.

In RMQ on the other hand, you publish to an exchange a message with a routing key and a body (and some other optional stuff). The only well-known piece of data when publishing is the exchange name; as I mentioned in my first comment the routing key may be low cardinality as well, but it might not be. The exchange then decides, based on its type and its configured bindings, which queue(s) the message will end up in. You have a lot of possibilities here, which makes guessing the final queue(s) impossible unless you also know the server configuration: directly to a queue named like the routing key? to some different queue, based on the routing key? ignore the routing key and use headers instead? Anyway, when publishing you only know the exchange name and the message, so the exchange is probably the best describer here. The routing key's cardinality is completely unknown, it really depends on the application's architecture. User IDs in routing keys are of course realistic for sharding, but not everybody uses routing keys for sharding. You might use them for plain topic subscription, in which case the cardinality can vary from low (like log severity) to high-ish (like postal codes).

When receiving a message however, you ignore exchanges completely and you read from a named queue (this may get tricky for sharding btw, since shards are implemented as queues; you have to find some way to decide which consumer reads from which queue). The queue is probably the best describer here. Fun!

As a last note for clarifying the RMQ architecture further: queues don't belong to exchanges. Multiple exchanges can be setup to route some or all of their incoming messages to the same queue. However, exchanges and queues both belong to vhosts, which is now that I think about it an interesting piece of info to add to spans because you can have multiple exchanges or queues with the same name, as long as they live in different vhosts...

@eyalkoren
Copy link
Contributor

@fgabolde I forgot to thank you about your detailed explanation - it is super useful for us, thanks a lot for taking the time.

We eventually based our approach greatly based on your feedback and we will see how it goes:

  • Transactions and spans will be named with exchange or queue if we have it (i.e. receiver).
  • We will have a new field for the routing key and we may add a config in the future to opt in to append it to the name as well.
  • In the existing queue.name field, we will set the queue name if available or the exchange otherwise.
  • In the context.destination.service.resource field we want to make sure sender and receiver set the same, so we will use exchange name in both.

Copy link
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, we are there 👍
Two minor deviations from the spec and a small bug with resource naming.
Other than that - minor suggestions.
If you disagree (meaning - I got something wrong), just ignore.

// since exchange name is only known when receiving the message, we might have to discard it
if (isIgnored(envelope.getExchange())) {
span.requestDiscarding();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
}
exchange = envelope.getExchange();

You never set the exchange on poll spans' destination.resource.

Fixing this does not cause any test failure, so it is not tested as well. I think it is because it is tested with empty exchange or ignore queues only. The test of polling within transaction should contain the exchange.

For RabbitMQ:

- `context.message.queue.name` field will contain queue name when using polling, exchange name otherwise.
- `context.message.destination.resource` field will contain `rabbitmq` for the default exchange and `rabbitmq/XXX` for named exchanges
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always with the exchange, normalized when it is the default

@SylvainJuge SylvainJuge merged commit 4053baf into elastic:master Dec 22, 2020
@hectorespert hectorespert deleted the rabbitmq branch January 11, 2021 08:17
@fgabolde
Copy link

I meant to thank you for this feature by the way!

We upgraded one of our services to 1.20 and I can see the lovely outgoing traces. Going to upgrade another and see if the propagation works.

👍

@felixbarny
Copy link
Member

Great to hear, thanks for the Feedback!

Be sure to let us know how it goes when updating other services 🙂

@fgabolde
Copy link

It works perfectly!

(Almost perfectly. I had to tweak my Spring setup for the full effect, initially I only had the RECEIVE span and no others. For posterity: when using Spring AMQP's SimpleMessageListenerContainer, I don't really know what multithreading facility is used behind the scenes, but it doesn't seem to be supported by the agent. Switching to DirectMessageListenerContainer fixed the issue and now I have all my spans. Documentation here)

@felixbarny
Copy link
Member

Thanks for reporting, I've created a follow-up issue: #1635

@eyalkoren
Copy link
Contributor

@fgabolde #1657 is adding proper support for Spring AMQP message handling API. If possible and still relevant, please try this snapshot with SimpleMessageListenerContainer to verify it works as expected.

@fgabolde
Copy link

OK, thanks! I will take a look as soon as I am able.

@fgabolde
Copy link

@eyalkoren The MR you linked has been merged so the Jenkins pipeline has disappeared; that's my fault for taking so long I guess. I got the latest build on master which I assume contains the relevant changes, I attached it to my RMQ-using app, and I switched back to SimpleMessageListenerContainer, but now I have no RMQ spans at all :(

I guess I did something wrong, but I'm not sure where, so I will keep looking around.

@felixbarny
Copy link
Member

The snapshot deployment failed yesterday. So depending on when you downloaded the snapshot, the changes may not be included. Try downloading again now.

@fgabolde
Copy link

I just got the agent from this build https://apm-ci.elastic.co/blue/organizations/jenkins/apm-agent-java%2Fapm-agent-java-mbp/detail/master/813/artifacts (yday 7:47 PM UTC), still the same thing. I can see the RMQ span in the sender application but the receiver has nothing at all, while before this change I would have a RECEIVE span at least.

I'm pretty sure I misconfigured something, but I still don't know what. I will try building the app with the Simple container and the 1.20 agent and at least confirm if I have the old behavior back (if not, the issue is definitely on my side).

@eyalkoren
Copy link
Contributor

@fgabolde sorry to hear this didn't work out smoothly for you 🙁
We just released 1.22.0, which contains this fix. We still consider this as a fix because @kananindzya was able to reproduce and make it work in #1657 , but if your use case is not supported properly yet, let us know and we'll try to see why.
If you can't figure it out, please provide details about your setup, the agent configurations you use and a debug log so we can look into this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants