Refactoring ahead of multi-plot metrics docs #1996
Open: mdlinville wants to merge 7 commits into `main` from `DOCS-1909`
Commits (7):

- 6279f77 Refactoring ahead of multi-plot metrics docs (mdlinville)
- 46e6430 Merge branch 'main' into DOCS-1909 (mdlinville)
- 6f2f8da Refactoring ahead of multi-plot metrics docs (mdlinville)
- 5a00cfc Merge remote-tracking branch 'origin/DOCS-1909' into DOCS-1909 (mdlinville)
- 38361ba Merge remote-tracking branch 'origin/main' into DOCS-1909 (mdlinville)
- 92ddab7 Noah and Dan's feedback, remove nonfunctional redirects (mdlinville)
- 794ceb8 More of Noah's feedback that Github decided randomly to show (mdlinville)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,74 +3,51 @@ description: Visualize metrics, customize axes, and compare multiple lines on a | |
| title: Line plots overview | ||
| --- | ||
|
|
||
| Line plots show up by default when you plot metrics over time with `wandb.Run.log()`. Customize with chart settings to compare multiple lines on the same plot, calculate custom axes, and rename labels. | ||
| Line plots display by default for metrics logged with `wandb.Run.log()` over time. You can customize a line plot to compare multiple lines, calculate custom axes, and rename labels. | ||
|
|
||
| <Frame> | ||
| <img src="/images/app_ui/line_plot_example.png" alt="Line plot example" /> | ||
| </Frame> | ||
|
|
||
| ## Edit line plot settings | ||
| <Tip> | ||
| For [runs](/models/runs) that execute on [CoreWeave](https://coreweave.com) infrastructure, [CoreWeave Mission Control](https://www.coreweave.com/mission-control) monitors your compute infrastructure. If an error occurs, W&B populates infrastructure information onto your run's plots in your project's workspace. For details, see [Visualize CoreWeave infrastructure alerts](models/runs.#visualize-coreweave-infrastructure-alerts). | ||
| </Tip> | ||
|
|
||
| This section shows how to edit the settings for an individual line plot panel, all line plot panels in a section, or all line plot panels in a workspace. | ||
| ## Edit line plot settings | ||
|
|
||
| <Note> | ||
| If you'd like to use a custom x-axis, make sure it's logged in the same call to `wandb.Run.log()` that you use to log the y-axis. | ||
| </Note> | ||
| This section shows how to edit the settings for an individual line plot panel, all line plot panels in a section, or all line plot panels in a workspace. For comprehensive details about line plot settings, see [Line plot reference](/models/app/features/panels/line-plot/reference). | ||
|
|
||
| ### Individual line plot | ||
| A line plot's individual settings override the line plot settings for the section or the workspace. To customize a line plot: | ||
|
|
||
| 1. Hover your mouse over the panel, then click the gear icon. | ||
| 1. Within the drawer that appears, select a tab to edit its [settings](#line-plot-settings). | ||
| 1. Within the drawer that appears, select a tab to edit its settings. | ||
| 1. Click **Apply**. | ||
|
|
||
| #### Line plot settings | ||
| You can configure these settings for a line plot: | ||
|
|
||
| **Date**: Configure the plot's data-display details. | ||
| * **X axis**: Select the value to use for the X axis (defaults to **Step**). You can change the x-axis to **Relative Time** or select a custom axis based on values you log with W&B. You can also configure the X axis scale and range. | ||
| * **Relative Time (Wall)** is clock time since the process started, so if you started a run and resumed it a day later and logged something that would be plotted a 24hrs. | ||
| * **Relative Time (Process)** is time inside the running process, so if you started a run and ran for 10 seconds and resumed a day later that point would be plotted at 10s. | ||
| * **Wall Time** is minutes elapsed since the start of the first run on the graph. | ||
| * **Step** increments by default each time `wandb.Run.log()` is called, and is supposed to reflect the number of training steps you've logged from your model. | ||
| * **Y axis**: Select one or more y-axes from the logged values, including metrics and hyperparameters that change over time. You can also configure the X axis scale and range. | ||
| * **Point aggregation method**. Either **Random sampling** (the default) or **Full fidelity**. Refer to [Sampling](/models/app/features/panels/line-plot/sampling/). | ||
| * **Smoothing**: Change the smoothing on the line plot. Defaults to **Time weighted EMA**. Other values include **No smoothing**, **Running average**, and **Gaussian**. | ||
| * **Outliers**: Rescale to exclude outliers from the default plot min and max scale. | ||
| * **Max number of runs or groups**: Show more lines on the line plot at once by increasing this number, which defaults to 10 runs. You'll see the message "Showing first 10 runs" on the top of the chart if there are more than 10 runs available but the chart is constraining the number visible. | ||
| * **Chart type**: Change between a line plot, an area plot, and a percentage area plot. | ||
| Line plot settings are organized into tabs: | ||
| * **Data**: Configure x-axis, y-axis, sampling method, smoothing, outliers, and chart type. | ||
| * **Grouping**: Configure whether and how to group and aggregate runs in the plot. | ||
| * **Chart**: Specify titles for the panel and axes, and configure legend visibility and position. | ||
| * **Legend**: Customize the appearance and content of the panel's legend. | ||
| * **Expressions**: Add custom calculated expressions for the axes. | ||
|
|
||
| **Grouping**: Configure whether and how to group and aggregate runs in the plot. | ||
| * **Group by**: Select a column, and all the runs with the same value in that column will be grouped together. | ||
| * **Agg**: Aggregation— the value of the line on the graph. The options are mean, median, min, and max of the group. | ||
|
|
||
| **Chart**: Specify titles for the panel, the X axis, and the Y axis, and the -axis, hide or show the legend, and configure its position. | ||
|
|
||
| **Legend**: Customize the appearance of the panel's legend, if it is enabled. | ||
| * **Legend**: The field in the legend for each line in the plot in the legend of the plot for each line. | ||
| * **Legend template**: Define a fully customizable template for the legend, specifying exactly what text and variables you want to show up in the template at the top of the line plot as well as the legend that appears when you hover your mouse over the plot. | ||
|
|
||
| **Expressions**: Add custom calculated expressions to the panel. | ||
| * **Y Axis Expressions**: Add calculated metrics to your graph. You can use any of the logged metrics as well as configuration values like hyperparameters to calculate custom lines. | ||
| * **X Axis Expressions**: Rescale the x-axis to use calculated values using custom expressions. Useful variables include\*\*_step\*\* for the default x-axis, and the syntax for referencing summary values is `${summary:value}` | ||
| For detailed information about each setting, see the [Line plot reference](/models/app/features/panels/line-plot/reference). | ||
|
|
||
| ### All line plots in a section | ||
|
|
||
| To customize the default settings for all line plots in a section, overriding workspace settings for line plots: | ||
| 1. Click the section's gear icon to open its settings. | ||
| 1. Within the drawer that appears, select the **Data** or **Display preferences** tabs to configure the default settings for the section. For details about each **Data** setting, refer to the preceding section, [Individual line plot](#line-plot-settings). For details about each display preference, refer to [Configure section layout](../#configure-section-layout). | ||
| 1. Within the drawer that appears, select the **Data** or **Display preferences** tabs to configure the default settings for the section. For details about each **Data** setting, see the [Line plot reference](/models/app/features/panels/line-plot/reference). For details about each display preference, refer to [Configure section layout](../#configure-section-layout). | ||
|
|
||
| ### All line plots in a workspace | ||
| To customize the default settings for all line plots in a workspace: | ||
| 1. Click the workspace's settings, which has a gear with the label **Settings**. | ||
| 1. Click **Line plots**. | ||
| 1. Within the drawer that appears, select the **Data** or **Display preferences** tabs to configure the default settings for the workspace. | ||
| - For details about each **Data** setting, refer to the preceding section, [Individual line plot](#line-plot-settings). | ||
|
|
||
| - For details about each **Data** setting, see the [Line plot reference](/models/app/features/panels/line-plot/reference). | ||
| - For details about each **Display preferences** section, refer to [Workspace display preferences](../#configure-workspace-layout). At the workspace level, you can configure the default **Zooming** behavior for line plots. This setting controls whether to synchronize zooming across line plots with a matching x-axis key. Disabled by default. | ||
|
|
||
|
|
||
|
|
||
| ## Visualize average values on a plot | ||
|
|
||
| If you have several different experiments and you'd like to see the average of their values on a plot, you can use the Grouping feature in the table. Click "Group" above the run table and select "All" to show averaged values in your graphs. | ||
|
|
@@ -101,7 +78,7 @@ with wandb.init() as run: | |
| <img src="/images/app_ui/visualize_nan.png" alt="NaN value handling" /> | ||
| </Frame> | ||
|
|
||
| ## Compare two metrics on one chart | ||
| ## Compare multiple metrics on one chart | ||
|
|
||
| <Frame> | ||
| <img src="/images/app_ui/visualization_add.gif" alt="Adding visualization panels" /> | ||
|
|
@@ -149,13 +126,8 @@ If you'd like to see the absolute time that an experiment has taken, or see what | |
| <img src="/images/app_ui/howto_use_relative_time_or_wall_time.gif" alt="X-axis time options" /> | ||
| </Frame> | ||
|
|
||
| ## Area plots | ||
| To use a custom x-axis, log the metric in the same call to `wandb.Run.log()` where you log the y-axis. | ||
|
**Contributor** commented: Add example code snippet showing this? or link to appropriate doc on custom x-axis? See https://docs.wandb.ai/models/track/log/customize-logging-axes#customize-log-axes
||
|
|
||
| In the line plot settings, in the advanced tab, click on different plot styles to get an area plot or a percentage area plot. | ||
|
|
||
| <Frame> | ||
| <img src="/images/app_ui/line_plots_area_plots.gif" alt="Area plot styles" /> | ||
| </Frame> | ||
|
|
||
| ## Zoom | ||
|
|
||
|
|
@@ -182,62 +154,3 @@ From a line plot, you can quickly create a [run metrics notification](/models/au | |
| 1. Configure the automation using the basic or advanced configuration controls. For example, apply a run filter to limit the scope of the automation, or configure an absolute threshold. | ||
|
|
||
| Learn more about [Automations](/models/automations/). | ||
|
|
||
| ## Visualize CoreWeave infrastructure alerts | ||
|
|
||
| Observe infrastructure alerts such as GPU failures, thermal violations, and more during machine learning experiments you log to W&B. During a [W&B run](/models/runs/), [CoreWeave Mission Control](https://www.coreweave.com/mission-control) monitors your compute infrastructure. | ||
|
|
||
| <Note> | ||
| This feature is in Preview and only available when training on a CoreWeave cluster. Contact your W&B representative for access. | ||
| </Note> | ||
|
|
||
| If an error occurs, CoreWeave sends that information to W&B. W&B populates infrastructure information onto your run's plots in your project's workspace. CoreWeave attempts to automatically resolve some issues, and W&B surfaces that information in the run's page. | ||
|
|
||
| ### Find infrastructure issues in a run | ||
|
|
||
| W&B surfaces both SLURM job issues and cluster node issues. View infrastructure errors in a run: | ||
|
|
||
| 1. Navigate to your project on the W&B App. | ||
| 2. Select the **Workspace** tab to view your project's workspace. | ||
| 3. Search and select the name of the run that contains an infrastructure issue. If CoreWeave detected an infrastructure issue, one or more red vertical lines with an exclamation mark overlay the run's plots. | ||
| 4. Select an issue on a plot or select the **Issues** button in the top right of the page. A drawer appears that lists each issue reported by CoreWeave. | ||
|
|
||
| <Tip> | ||
| To views runs with infrastructure issues at a glance, pin the **Issues** column to your W&B Workspace to view runs that logged an issue at a glance. For more information about how to pin a column, see [Customize how runs are displayed](/models/runs/#customize-how-runs-are-displayed). | ||
| </Tip> | ||
|
|
||
| The **Overall Grafana view** at the top of the drawer redirects you to the SLURM job's Grafana dashboard, which contains system-level details about the run. The **Issues summary** describes the root error that the SLURM job reported to CoreWeave Mission Control. The summary section also describes any attempts to automatically resolve the error made by CoreWeave. | ||
|
|
||
| <Frame> | ||
| <img src="/images/app_ui/cw_wb_observability.png" /> | ||
| </Frame> | ||
|
|
||
| The **All Issues** list all issues that occurs during the run in chronological order, with the most recent issue at the top. The list contains the job issue and node issue alerts. Within each issue alert is the name of the issue, the timestamp when the issue occurred, a link to the Grafana dashboard for that issue, and a brief summary that describes the issue. | ||
|
|
||
| The following table shows example alerts for each category of infrastructure issues: | ||
|
|
||
| | Category | Example alerts | | ||
| | -------- | ------------- | | ||
| | Node Availability & Readiness | `KubeNodeNotReadyHGX`, `NodeExtendedDownTime` | | ||
| | GPU/Accelerator Errors | `GPUFallenOffBusHGX`, `GPUFaultHGX`, `NodeTooFewGPUs` | | ||
| | Hardware Errors | `HardwareErrorFatal`, `NodeRAIDMemberDegraded` | | ||
| | Networking & DNS | `NodeDNSFailureHGX`, `NodeEthFlappingLegacyNonGPU` | | ||
| | Power, Cooling, and Management | `NodeCPUHZThrottle`, `RedfishDown` | | ||
| | DPU & NVSwitch | `DPUNcoreVersionBelowDesired`, `NVSwitchFaultHGX` | | ||
| | Miscellaneous | `NodePCISpeedRootGBT`, `NodePCIWidthRootSMC` | | ||
|
|
||
| For detailed information on error types, see the [SLURM Job Metrics on the CoreWeave Docs](https://docs.coreweave.com/docs/observability/managed-grafana/sunk/slurm-job-metrics#job-info-alerts#job-info-alerts). | ||
|
|
||
| ### Debug infrastructure issues | ||
|
|
||
| Each run that you create in W&B corresponds to a single SLURM job in CoreWeave. You can view a failed job's [Grafana](https://grafana.com/) dashboard or discover more information about a single node. The link within the **Overview** section of the **Issues** drawer links to the SLURM job Grafana dashboard. Expand the **All Issues** dropdown to view both job and node issues and their respective Grafana dashboards. | ||
|
|
||
| <Note> | ||
| **Note** | ||
|
|
||
| The Grafana dashboard is only available for W&B users with a CoreWeave account. Contact W&B to configure Grafana with your W&B organization. | ||
| </Note> | ||
|
|
||
| Depending on the issue, you may need to adjust the SLURM job configuration, investigate the node's status, restart the job, or take other actions as needed. | ||
|
|
||
| For more information about CoreWeave SLURM jobs in Grafana, see Slurm/Job Metrics on the [CoreWeave Docs](https://docs.coreweave.com/docs/observability/managed-grafana/sunk/slurm-job-metrics#job-info-alerts). See [Job info: alerts](https://docs.coreweave.com/docs/observability/managed-grafana/sunk/slurm-job-metrics#job-info-alerts#job-info-alerts) for detailed information about job alerts. | ||
Link `models/runs.#visualize-coreweave-infrastructure-alerts` gave me a 404 on the preview site. Is it because of the extra `.`?
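For reviewers of the unchanged "Visualize average values on a plot" paragraph, which says grouping by **All** shows averaged values: a minimal sketch of the kind of per-step aggregation that implies. It assumes each run's history is a mapping from step to value and that values are combined step by step; the function `aggregate_across_runs` and the sample data are hypothetical, and the sketch does not attempt to reproduce W&B's point sampling or how it aligns runs that log at different steps. The `agg` parameter mirrors the mean, median, min, and max options listed for the Grouping tab's **Agg** setting in the diff.

```python
from statistics import mean, median
from typing import Callable, Dict, List

# A run's history: step -> logged metric value (hypothetical shape).
History = Dict[int, float]


def aggregate_across_runs(
    runs: Dict[str, History],
    agg: Callable[[List[float]], float] = mean,
) -> Dict[int, float]:
    """Combine every run's value at each step into one value using `agg`.

    Steps that only some runs logged are aggregated over those runs only.
    """
    values_by_step: Dict[int, List[float]] = {}
    for history in runs.values():
        for step, value in history.items():
            values_by_step.setdefault(step, []).append(value)
    # Sort by step so the result reads left to right like the plot's x-axis.
    return {step: agg(values) for step, values in sorted(values_by_step.items())}


if __name__ == "__main__":
    runs = {
        "run-a": {0: 1.0, 1: 0.8, 2: 0.6},
        "run-b": {0: 0.9, 1: 0.7, 2: 0.5},
        "run-c": {0: 1.1, 1: 0.9},  # stopped one step early
    }
    for name, fn in [("mean", mean), ("median", median), ("min", min), ("max", max)]:
        print(name, aggregate_across_runs(runs, agg=fn))
```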