
@aknysh
Member

@aknysh aknysh commented Oct 1, 2025

what

  • Add performance tracking heatmap visualization to Atmos CLI with interactive TUI
  • Enable developers to identify performance bottlenecks using built-in instrumentation
  • Provide multiple visualization modes (bar chart, sparklines, table) with navigation support
  • Add comprehensive performance tracking across 150+ critical functions
  • Consolidate profiling documentation into unified guide covering both heatmap and pprof approaches

why

  • Makes performance analysis accessible to all developers without specialized profiling tools
  • Enables quick identification of slow operations during stack processing
  • Provides actionable insights for optimization efforts with P95 latency metrics
  • Reduces friction in performance debugging workflow
  • Offers flexible profiling approaches for different use cases (quick analysis vs deep profiling)

Performance Heatmap Feature

Atmos now includes built-in performance tracking that shows which operations take the longest. When the command finishes, it reports per-function execution times (Count, Total, Avg, Max, P95) and lets you explore them in interactive visualization modes.

Quick Start

Run any Atmos command with the --heatmap flag:

atmos describe stacks --heatmap

Real Performance Analysis Example

Here's actual output from atmos describe stacks --heatmap:

Bar Chart View (Press 1 in interactive mode):

[Screenshot: heatmap bar chart view]

Performance Output

=== Atmos Performance Summary ===
Elapsed: 69.042791ms  Functions: 42  Calls: 5980
Function                                            Count      Total        Avg        Max      P95
exec.ProcessYAMLConfigFileWithContext                  52   18.305ms      352µs     3.61ms  3.495ms
exec.ValidateStacks                                     1    9.893ms    9.893ms    9.893ms  9.895ms
utils.processCustomTags                              1024    7.919ms        7µs      808µs     13µs
exec.FindStacksMap                                      2    7.038ms    3.519ms    6.463ms  6.463ms
exec.ProcessYAMLConfigFiles                             2    6.857ms    3.428ms    6.286ms  6.287ms
exec.Execute                                            1     6.43ms     6.43ms     6.43ms  6.431ms
merge.MergeWithOptions                                746    4.649ms        6µs      423µs     19µs
utils.GetHighlightedYAML                                1    4.171ms    4.171ms    4.171ms  4.171ms
utils.HighlightCodeWithConfig                           1    3.787ms    3.787ms    3.787ms  3.787ms
merge.MergeWithContext                                356    3.614ms       10µs      423µs     51µs
merge.MergeWithOptionsAndContext                      356    3.521ms        9µs      423µs     50µs
exec.ProcessImportSection                              52    3.346ms       64µs      872µs    784µs
utils.ConvertToYAML                                   177    3.287ms       18µs      660µs     71µs
exec.ProcessStackConfig                                12     1.69ms      140µs      265µs    192µs
merge.Merge                                           390    1.478ms        3µs       67µs     19µs
utils.GetGlobMatches                                   48    1.141ms       23µs      210µs    168µs
utils.JoinPaths                                         5     1.13ms      226µs    1.128ms  1.128ms
exec.getEmbeddedSchemaPath                              1      784µs      784µs      784µs    784µs
config.FindAllStackConfigsInPaths                       1      783µs      783µs      783µs    783µs
exec.ProcessYAMLConfigFile                              8      750µs       93µs      201µs    201µs
exec.ExecuteDescribeStacks                              1      682µs      682µs      682µs    682µs
exec.GetFileContent                                    52      446µs        8µs      161µs     25µs
utils.SliceContainsString                            2456      127µs          0        5µs        -
utils.PathMatch                                        32       57µs        1µs        6µs      1µs
utils.ResolveRelativePath                              32       43µs        1µs       36µs        -
exec.ProcessCommandLineArgs                             1       14µs       14µs       14µs     14µs
utils.EnsureDir                                         1       12µs       12µs       12µs     12µs
exec.BuildTerraformWorkspace                           10       10µs        1µs        9µs      9µs
exec.processSettingsIntegrationsGithub                 30       10µs          0        1µs        -
exec.createComponentStackMap                            2        7µs        3µs        7µs      7µs
utils.IsTemplateFile                                   52        5µs          0        1µs        -
exec.FindComponentsDerivedFromBaseComponents           10        3µs          0          0        -
utils.UniqueStrings                                    38        3µs          0          0        -
utils.getLexer                                          1        3µs        3µs        3µs      3µs
utils.GetHighlightSettings                              2        2µs        1µs        2µs      2µs
utils.IsDirectory                                       1        2µs        2µs        2µs      2µs
utils.JoinPath                                         11        2µs          0          0        -
config.processEnvVars                                   2        1µs          0          0        -

Visualization Modes

The interactive TUI supports three visualization modes:

  • Press 1: Bar Chart - Color gradient from red (slow) to green (fast) showing relative execution times (top 25 functions by total time)
  • Press 2: Sparklines - Visual trend lines for the top 25 functions by total time
  • Press 3: Table View - Detailed metrics with Count/Total/Avg/Max/P95 (top 50 functions by total time)
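Per the Implementation Details list, the bar and sparkline views render only the top 25 functions and the table view the top 50, with zero-time functions filtered out. A minimal Go sketch of that selection plus a linear red-to-green color mapping follows; `Row`, `topN`, and `gradient` are hypothetical names, and the plain RGB interpolation is an assumption that may not match the palette Atmos actually uses.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// Row is a hypothetical per-function summary row.
type Row struct {
	Name  string
	Total time.Duration
}

// topN drops zero-time rows, sorts the rest by total time (slowest first),
// and keeps at most n. Ties are broken by name for stable output.
func topN(rows []Row, n int) []Row {
	var out []Row
	for _, r := range rows {
		if r.Total > 0 {
			out = append(out, r)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Total != out[j].Total {
			return out[i].Total > out[j].Total
		}
		return out[i].Name < out[j].Name
	})
	if len(out) > n {
		out = out[:n]
	}
	return out
}

// gradient maps a row's share of the slowest total onto RGB:
// 1.0 (slowest) -> pure red, 0.0 -> pure green.
func gradient(frac float64) (r, g, b uint8) {
	if frac < 0 {
		frac = 0
	}
	if frac > 1 {
		frac = 1
	}
	return uint8(255 * frac), uint8(255 * (1 - frac)), 0
}

func main() {
	rows := []Row{
		{"exec.ProcessYAMLConfigFileWithContext", 18305 * time.Microsecond},
		{"exec.ValidateStacks", 9893 * time.Microsecond},
		{"utils.processCustomTags", 7919 * time.Microsecond},
		{"example.zeroTimeFunc", 0}, // hypothetical zero-time row, filtered out
	}
	top := topN(rows, 25) // bar/sparkline limit
	if len(top) == 0 {
		return
	}
	const width = 40
	maxTotal := top[0].Total
	for _, r := range top {
		frac := float64(r.Total) / float64(maxTotal)
		red, green, _ := gradient(frac)
		bar := strings.Repeat("█", int(frac*width))
		fmt.Printf("%-40s %-40s rgb(%d,%d,0) %v\n", r.Name, bar, red, green, r.Total)
	}
}
```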

Navigation & Controls

  • ↑/↓ or k/j: Navigate through rows (wraparound enabled)
  • 1-3: Switch visualization modes
  • q/esc: Exit and return to terminal
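The row navigation and mode switching described above reduce to a small state machine. This hypothetical sketch is not the Bubble Tea model from the PR: it only tracks a cursor and a mode, with key names chosen to resemble what Bubble Tea's `tea.KeyMsg.String()` reports (`up`, `down`, `esc`). Wraparound uses modular arithmetic, adding `rows` before taking the remainder so that moving up from row 0 lands on the last row instead of a negative index.

```go
package main

import "fmt"

// Mode is a hypothetical enum for the three heatmap views.
type Mode int

const (
	ModeBar Mode = iota
	ModeSparkline
	ModeTable
)

// model holds only navigation state, not a full TUI model.
type model struct {
	cursor int
	rows   int
	mode   Mode
}

// handleKey applies one key press. Up/down wrap around at the ends.
// It returns false when the user asked to quit.
func (m *model) handleKey(key string) bool {
	switch key {
	case "up", "k":
		if m.rows > 0 {
			m.cursor = (m.cursor - 1 + m.rows) % m.rows
		}
	case "down", "j":
		if m.rows > 0 {
			m.cursor = (m.cursor + 1) % m.rows
		}
	case "1":
		m.mode = ModeBar
	case "2":
		m.mode = ModeSparkline
	case "3":
		m.mode = ModeTable
	case "q", "esc":
		return false
	}
	return true
}

func main() {
	m := &model{rows: 3}
	for _, k := range []string{"k", "3", "j", "j"} {
		m.handleKey(k)
	}
	fmt.Println(m.cursor, m.mode)
}
```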

CLI Flags

All flag descriptions now match atmos --help output exactly:

  • --heatmap: Show performance heatmap visualization after command execution (includes P95 latency) (default: false)
  • --heatmap-mode: Heatmap visualization mode: bar, sparkline, table (press 1-3 to switch in TUI) (default: bar)

Comparison with Traditional Profiling

| Feature | Performance Heatmap | pprof CPU Profiling |
|---|---|---|
| Setup | Single flag | Generate profile, run pprof |
| Visualization | Interactive TUI | Terminal (file mode) or web browser (server mode) |
| Analysis | Post-execution | Real-time (server mode) or post-execution (file mode) |
| Filtering | Built-in top N | Manual filtering |
| Distribution | P95 latency included | Requires processing |
| Use Case | Quick analysis | Deep profiling |

Implementation Details

  • Added defer perf.Track() instrumentation to 150+ functions across critical paths
  • Implemented HDR histogram for accurate P95 latency calculations
  • Created Bubble Tea TUI with multiple visualization modes and vim-style navigation
  • Added snapshot filtering to prevent zero-time function display
  • Implemented visual display limits (top 25 for bar/sparkline, top 50 for table view)
  • Added package prefix naming convention for clear function identification
  • Consolidated profiling documentation into unified /docs/troubleshoot/profiling.mdx
  • Synced all CLI flag descriptions with help text for consistency

Testing

✅ All tests passing with coverage >80%
✅ Comprehensive test suite for heatmap TUI (17 test functions)
✅ GitHub utils tests (11 test functions)
✅ Enhanced pro.go tests (11 additional test functions)
✅ Linter checks passing (0 issues)
✅ Website builds successfully with no broken links
✅ Performance tracking verified with real Atmos workflows

Documentation

  • Comprehensive profiling guide: /docs/troubleshoot/profiling.mdx
    • Performance Heatmap section (quick analysis, interactive TUI)
    • pprof Profiling section (deep analysis, server/file modes)
    • Choosing the Right Tool comparison
    • Best practices and troubleshooting
  • Developer guidelines: Updated CLAUDE.md with mandatory performance tracking patterns
  • Real performance examples with screenshots and actual command output
  • Accurate feature descriptions: Removed misleading claims (sorting, unlimited rows)
  • Complete navigation documentation: All keyboard shortcuts documented

Key Documentation Updates

  1. ✅ Consolidated separate performance-heatmap.mdx into unified profiling.mdx
  2. ✅ Synced all CLI flag descriptions to exactly match atmos --help output
  3. ✅ Clarified Table Mode shows top 50 rows (not unlimited)
  4. ✅ Removed claim about sortable columns (not implemented)
  5. ✅ Added complete keyboard navigation documentation (↑/↓, k/j)
  6. ✅ Fixed broken link from /troubleshoot/logging to /troubleshoot/debugging

Summary by CodeRabbit

  • New Features

    • Added a post-run performance heatmap (bar/sparkline/table) with P95 metrics and interactive TUI when a TTY is available; selectable via new global flags --heatmap and --heatmap-mode.
  • Telemetry

    • Global performance tracking enabled across commands to collect timing/latency metrics for heatmap and diagnostics.
  • Documentation

    • Consolidated profiling documentation into comprehensive guide covering both heatmap (quick analysis) and pprof (deep profiling) approaches
    • Synced all CLI flag descriptions with help text
    • Added complete keyboard navigation documentation
    • Fixed broken links and removed misleading claims
  • Chores

    • Dependency updates (including HDR histogram library and various SDK bumps).
  • Tests

    • Extensive tests for heatmap rendering, CLI flags, and performance-tracking behaviors
    • Added comprehensive test coverage for terraform_generate_backends, github_utils, and pro features
    • All tests passing with >80% coverage

@aknysh aknysh requested a review from a team as a code owner October 1, 2025 20:17
@aknysh aknysh self-assigned this Oct 1, 2025
@github-actions github-actions bot added the size/xl Extra large size PR label Oct 1, 2025
@aknysh aknysh added minor New features that do not break anything and removed size/xl Extra large size PR labels Oct 1, 2025
@github-actions github-actions bot added the size/xl Extra large size PR label Oct 1, 2025
@mergify

mergify bot commented Oct 1, 2025

Warning

This PR exceeds the recommended limit of 1,000 lines.

Large PRs are difficult to review and may be rejected due to their size.

Please verify that this PR does not address multiple issues.
Consider refactoring it into smaller, more focused PRs to facilitate a smoother review process.

coderabbitai[bot]
coderabbitai bot previously approved these changes Oct 2, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (5)
website/docs/troubleshoot/profiling.mdx (5)

96-99: Make top‑N limits explicit in bar/sparkline sections

Call out the top‑25 cap to set expectations.

Apply this diff:

 - Best for identifying relative performance differences
 - Color-coded gradient highlights hot functions
 - Easy visual comparison
+ - Shows top 25 functions by total time
@@
 - Compact view of many functions
 - Shows execution time distribution
 - Quick pattern recognition
+ - Shows top 25 functions by total time

Also applies to: 108-111


188-196: Safer baseline/optimized capture

Examples use 2> which assumes the summary is on stderr. Prefer capturing both streams for portability.

Apply this diff:

-# Baseline measurement
-atmos describe stacks --heatmap 2>baseline.txt
-
-# After optimization
-atmos describe stacks --heatmap 2>optimized.txt
+# Baseline measurement
+atmos describe stacks --heatmap 2>&1 | tee baseline.txt >/dev/null
+
+# After optimization
+atmos describe stacks --heatmap 2>&1 | tee optimized.txt >/dev/null

If the summary is guaranteed on stdout, switch to simple redirection (>) instead.


516-520: “Zero overhead” phrasing

With defer perf.Track() in many functions, disabled mode still incurs minimal call/defer overhead. Soften the claim.

Apply this diff:

-- **Enable only when needed**: Zero overhead when not using `--heatmap` flag
+- **Enable only when needed**: Negligible overhead when `--heatmap` is not used (tracking is disabled)

283-285: Cross‑platform browser opening

open is macOS‑specific. Consider adding xdg-open example for Linux.

Suggested addition after the block:

Linux: xdg-open http://localhost:6060/debug/pprof/
Windows: start http://localhost:6060/debug/pprof/

617-623: Remove bc dependency in CI example

bc may not be available in minimal runners. Use awk for numeric compare.

Example:

elapsed_ms=$(grep -m1 "Elapsed:" perf-output.txt | awk '{print $2}' | sed 's/ms//')
awk -v e="$elapsed_ms" 'BEGIN{exit !(e <= 500)}' || {
  echo "Performance regression: ${elapsed_ms} > 500ms"
  atmos validate stacks --profile-file=cpu.prof --profile-type=cpu
  exit 1
}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e4ef9b6 and c61f1f2.

📒 Files selected for processing (1)
  • website/docs/troubleshoot/profiling.mdx (10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
website/**

📄 CodeRabbit inference engine (.cursor/rules/atmos-rules.mdc)

website/**: Update website documentation in website/ when adding features
Ensure consistency between CLI help text and website documentation
Follow the website's documentation structure and style
Keep website code in website/ and follow its architecture/style; test changes locally
Keep CLI and website documentation in sync; document new features with examples and use cases

Files:

  • website/docs/troubleshoot/profiling.mdx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Build (ubuntu-latest, linux)
  • GitHub Check: Build (windows-latest, windows)
  • GitHub Check: Analyze (go)
  • GitHub Check: Lint (golangci)
  • GitHub Check: Run pre-commit hooks
  • GitHub Check: website-deploy-preview
  • GitHub Check: Summary
🔇 Additional comments (2)
website/docs/troubleshoot/profiling.mdx (2)

1-6: Path/name consistency with PR description

PR summary references docs at /docs/troubleshoot/performance-heatmap, but this file is profiling.mdx. Please ensure nav, links, and any references are consistent (redirect or align paths).


560-563: Align non‑TTY fallback message with actual implementation

Verify the exact message text emitted by the CLI and mirror it verbatim to avoid confusion.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between c61f1f2 and a60d915.

📒 Files selected for processing (1)
  • website/docs/troubleshoot/profiling.mdx (9 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Build (ubuntu-latest, linux)
  • GitHub Check: Build (macos-latest, macos)
  • GitHub Check: Build (windows-latest, windows)
  • GitHub Check: website-deploy-preview
  • GitHub Check: Analyze (go)
  • GitHub Check: Lint (golangci)
  • GitHub Check: Run pre-commit hooks
  • GitHub Check: Summary

@aknysh aknysh added no-release Do not create a new release (wait for additional code changes) and removed minor New features that do not break anything labels Oct 2, 2025
@aknysh aknysh merged commit fc0f123 into main Oct 2, 2025
67 checks passed
@aknysh aknysh deleted the perf-2 branch October 2, 2025 13:23
@github-actions

github-actions bot commented Oct 3, 2025

These changes were released in v1.193.0-rc.0.


Labels

no-release Do not create a new release (wait for additional code changes) size/xl Extra large size PR


3 participants