rpk: add rpk debug remote-bundle; collect a cluster-wide bundle #23986
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/57483#0192e917-5e91-479f-8c4c-ab61be9a455b:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/57483#0192e945-a73d-4fa9-8cce-8cec8bf9bc8a:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/58586#01935535-e046-4f01-824d-6d44fde8f537:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/58586#01935535-e047-4325-a0d6-ce6629d58d5d:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/58586#01935535-e047-47c9-8964-6eb7305ce2e1:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/58586#01935539-a0d5-43a5-9a85-e64ac24cc400:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/58586#01935539-a0d5-4042-b034-25f13450ca49:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/58586#01935539-a0d6-4f6b-b276-7505301c6581:
|
Retry command for Build#57483: please wait until all jobs are finished before running the slash command
|
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/57483#0192e917-5e8c-411e-bf42-1506a5294f90 |
// InstallFlags installs the debug bundle flags that fills the debug bundle | ||
// options. | ||
func (o *DebugBundleSharedOptions) InstallFlags(f *pflag.FlagSet) { | ||
f.StringVar(&o.ControllerLogsSizeLimit, "controller-logs-size-limit", "132MB", "The size limit of the controller logs that can be stored in the bundle (e.g. 3MB, 1GiB)") |
f.StringVar(&o.ControllerLogsSizeLimit, "controller-logs-size-limit", "132MB", "The size limit of the controller logs that can be stored in the bundle (e.g. 3MB, 1GiB)") | |
f.StringVar(&o.ControllerLogsSizeLimit, "controller-logs-size-limit", "132MB", "The size limit of the controller logs that can be stored in the bundle. For example: 3MB, 1GiB.") |
Fixed, except for the trailing period: we don't add periods to our flag help text, which we found to be a common pattern in other CLIs.
f.StringVar(&o.ControllerLogsSizeLimit, "controller-logs-size-limit", "132MB", "The size limit of the controller logs that can be stored in the bundle (e.g. 3MB, 1GiB)") | ||
f.DurationVar(&o.CPUProfilerWait, "cpu-profiler-wait", 30*time.Second, "For how long to collect samples for the CPU profiler (e.g. 30s, 1.5m). Must be higher than 15s") | ||
f.StringVar(&o.LogsSizeLimit, "logs-size-limit", "100MiB", "Read the logs until the given size is reached (e.g. 3MB, 1GiB)") | ||
f.StringVar(&o.LogsSince, "logs-since", "yesterday", "Include logs dated from specified date onward; (journalctl date format: YYYY-MM-DD, 'yesterday', or 'today'). Refer to journalctl documentation for more options") |
f.StringVar(&o.LogsSince, "logs-since", "yesterday", "Include logs dated from specified date onward; (journalctl date format: YYYY-MM-DD, 'yesterday', or 'today'). Refer to journalctl documentation for more options") | |
f.StringVar(&o.LogsSince, "logs-since", "yesterday", "Include logs dated from specified date onward. For example: journalctl date format: YYYY-MM-DD, 'yesterday', or 'today'. See the journalctl documentation for more options.") |
What do you think of leaving just "journalctl date format", since it is the only option and not an example of how to pass the flag?
f.StringVar(&o.LogsSizeLimit, "logs-size-limit", "100MiB", "Read the logs until the given size is reached (e.g. 3MB, 1GiB)") | ||
f.StringVar(&o.LogsSince, "logs-since", "yesterday", "Include logs dated from specified date onward; (journalctl date format: YYYY-MM-DD, 'yesterday', or 'today'). Refer to journalctl documentation for more options") | ||
f.StringVar(&o.LogsUntil, "logs-until", "", "Include logs older than the specified date; (journalctl date format: YYYY-MM-DD, 'yesterday', or 'today'). Refer to journalctl documentation for more options") | ||
f.DurationVar(&o.MetricsInterval, "metrics-interval", 10*time.Second, "Interval between metrics snapshots (e.g. 30s, 1.5m)") |
f.DurationVar(&o.MetricsInterval, "metrics-interval", 10*time.Second, "Interval between metrics snapshots (e.g. 30s, 1.5m)") | |
f.DurationVar(&o.MetricsInterval, "metrics-interval", 10*time.Second, "Interval between metrics snapshots. For example: 30s, 1.5m.") |
Changed.
One comment on "e.g." vs "For example" that is worth taking into consideration:
We try to keep the flag help text as short as possible because some commands are too cramped. That's why we used both "e.g." and "()", so it was 'easier' to spot the examples. Take this one (rpk debug bundle) as an example:
I think this is one of the cases where the CLI style guide conflicts with the docs style guide. In this case, stay consistent with your own style guide. cc @micheleRP
f.StringVar(&o.LogsSince, "logs-since", "yesterday", "Include logs dated from specified date onward; (journalctl date format: YYYY-MM-DD, 'yesterday', or 'today'). Refer to journalctl documentation for more options") | ||
f.StringVar(&o.LogsUntil, "logs-until", "", "Include logs older than the specified date; (journalctl date format: YYYY-MM-DD, 'yesterday', or 'today'). Refer to journalctl documentation for more options") | ||
f.DurationVar(&o.MetricsInterval, "metrics-interval", 10*time.Second, "Interval between metrics snapshots (e.g. 30s, 1.5m)") | ||
f.IntVar(&o.MetricsSampleCount, "metrics-samples", 2, "Number of metrics samples to take (at the interval of --metrics-interval). Must be >= 2") |
f.IntVar(&o.MetricsSampleCount, "metrics-samples", 2, "Number of metrics samples to take (at the interval of --metrics-interval). Must be >= 2") | |
f.IntVar(&o.MetricsSampleCount, "metrics-samples", 2, "Number of metrics samples to take (at the interval of --metrics-interval). Must be >= 2.") |
Same comment as the above on periods.
'rpk debug remote-bundle status' and download when is ready with | ||
'rpk debug remote-bundle download'. | ||
|
||
The flag '--no-confirm' can be used to avoid the confirmation prompt. |
The flag '--no-confirm' can be used to avoid the confirmation prompt. | |
Use the flag '--no-confirm' to avoid the confirmation prompt. |
LGTM
165b124 to 0db3d1d
@dotnwat Thanks! good catch. Indeed I added the Bazel changes to the last commit. Fixed 👍 |
metricsSampleCount int | ||
cpuProfilerWait time.Duration | ||
timeout time.Duration | ||
opts common.DebugBundleSharedOptions |
I like this model, it would definitely make sense for topic stuff as well.
In your ducktape tests, for all asserts, please provide a human-readable error message
b053c72 to afe8a82
|
Fixed in the last push: #23986 (comment), but GitHub didn't reset the review request.
/ci-repeat 3 |
/ci-repeat 3 |
These are options that are suitable for sharing with Debug Remote Bundle
afe8a82 to fc3cd21
Note: The newly added test had a successful 60x run: https://buildkite.com/redpanda/redpanda/builds/58607#_. I'm going to run it 60x again after the normal CI passes. |
Fixes DEVEX-44 This commit introduces the rpk debug remote bundle command, which allows the user to request a debug bundle using the Admin API.
fc3cd21 to b88a857
/ci-repeat 3 |
/backport v24.3.x |
This PR adds the new command:
rpk debug remote-bundle
which lets the user collect a set of debug bundles from each node in the cluster. It uses the Admin API to do so. To collect the bundle, we have created 4 new commands:
rpk debug remote-bundle start
rpk debug remote-bundle download
rpk debug remote-bundle status
rpk debug remote-bundle cancel
Examples:
These commands are interactive by default; each interactive command has its own
--no-confirm
flag to avoid confirmation prompts.
rpk debug remote-bundle start
rpk debug remote-bundle download
rpk debug remote-bundle status
rpk debug remote-bundle cancel
Additional work will be added in the future to collect everything in a single command and to let the user clean up the current cluster.
Backports Required
Release Notes
Features
rpk debug remote-bundle
to gather a debug bundle from a remote cluster.