Skip to content

Global counters #3045

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from
Closed

Global counters #3045

wants to merge 9 commits into from

Conversation

dcorbacho
Copy link
Contributor

Introduces global counters using the new seshat library.
Also adds some missing metrics to the stream plugin.

Requires rabbitmq/osiris#30 & rabbitmq/ra#221

Types of Changes

  • Bug fix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause an observable behavior change in existing systems)
  • Documentation improvements (corrections, new content, etc)
  • Cosmetic change (whitespace, formatting, etc)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

kjnilsson and others added 8 commits May 18, 2021 14:14
This demonstrated how the new global counters could be used with
rabbitmq_prometheus. This requires us to build new versions of the
existing dashboards as we are introducing new metrics which work with
Streams and other queue types (some are currently in the design phase).

Pairs @dcorbacho @kjnilsson @mkuratczyk

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
@dcorbacho dcorbacho requested review from gerhard and kjnilsson May 18, 2021 12:28
gerhard added a commit that referenced this pull request May 20, 2021
This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
#3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through.

Pair @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
gerhard added a commit that referenced this pull request May 28, 2021
This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
#3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through.

Pair @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
gerhard added a commit that referenced this pull request Jun 1, 2021
This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
#3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through.

Pair @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
gerhard added a commit that referenced this pull request Jun 14, 2021
This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
#3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through.

Expose global metrics in rabbitmq_prometheus. We don't want to keep
modifying the existing collector, but start with a new namespace,
rabbitmq_global_, and continue building on top of it. The idea is to
build in parallel, and slowly transition to the new metrics, because
semantically the changes are too big since streams, and we have been
discussing protocol-specific metrics with @kjnilsson, which makes me
think that this approach is least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pair @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
gerhard added a commit that referenced this pull request Jun 16, 2021
This way we can show how many messages were received via a
certain protocol (stream is the second real protocol besides the default
amqp091 one), as well as by queue type, which is something that many
asked for a really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (e.g. for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
#3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, rabbitmq_global_, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho
(this is multiple commits squashed into one)

Next steps:
- Create new PR and ask @mkuratcyzk and @ansd for review - fresh 👀 👀
- This new PR closes #3045
- Back-port to 3.9.x as is (to my knowledge, this is the only feature
  missing before we can code freeze and cut an RC)
- Back-port parts of this to 3.8.x so that we can finally address
  #2783
- Fix & publish new version of the RabbitMQ-Overview Grafana dashboard
- Fix & finally publish the new RabbitMQ-Streams Grafana dashboard

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
gerhard added a commit that referenced this pull request Jun 21, 2021
This way we can show how many messages were received via a
certain protocol (stream is the second real protocol besides the default
amqp091 one), as well as by queue type, which is something that many
asked for a really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (e.g. for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
#3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, rabbitmq_global_, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho
(this is multiple commits squashed into one)

Next steps:
- Create new PR and ask @mkuratcyzk and @ansd for review - fresh 👀 👀
- This new PR closes #3045
- Back-port to 3.9.x as is (to my knowledge, this is the only feature
  missing before we can code freeze and cut an RC)
- Back-port parts of this to 3.8.x so that we can finally address
  #2783
- Fix & publish new version of the RabbitMQ-Overview Grafana dashboard
- Fix & finally publish the new RabbitMQ-Streams Grafana dashboard

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
@gerhard
Copy link
Contributor

gerhard commented Jun 21, 2021

Superseded by #3127

@gerhard gerhard closed this Jun 21, 2021
@gerhard gerhard deleted the global-counters branch June 21, 2021 16:49
gerhard added a commit that referenced this pull request Jun 21, 2021
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which is something that many asked for a
really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (e.g. for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
#3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
rabbitmq/seshat#1.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
gerhard added a commit that referenced this pull request Jun 22, 2021
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which is something that many asked for a
really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (e.g. for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
#3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
rabbitmq/seshat#1.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
gerhard added a commit that referenced this pull request Jun 22, 2021
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which is something that many asked for a
really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (e.g. for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
#3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
rabbitmq/seshat#1.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
gerhard added a commit that referenced this pull request Jun 22, 2021
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which is something that many asked for a
really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (e.g. for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
#3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
rabbitmq/seshat#1.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
(cherry picked from commit c797125)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants