[Nexthop] Use run_tests script to run benchmarks #895
Draft: anna-nexthop wants to merge 1 commit into facebook:main from nexthop-ai:anna-nexthop.benchmark-suite (+1,032 −28)
**Pre-submission checklist**
- [x] I've run the linters locally and fixed lint errors related to the
files I modified in this PR. You can install the linters by running `pip
install -r requirements-dev.txt && pre-commit install`
- [x] `pre-commit run`
Add support for running FBOSS benchmark test binaries as test suites
using the existing `run_test.py` script. Benchmark tests measure
performance metrics like throughput, latency, and speed of various
FBOSS operations.
Also includes documentation updates on how to run benchmarks via the
`run_test.py` script.
Users can now easily run benchmark suites instead of individual
binaries, with automated CSV output containing detailed metrics for
analysis.
- Added `BenchmarkTestRunner` as a **standalone class** (does not extend
`TestRunner`) for straightforward execution
- Added "benchmark" subcommand to `run_test.py`
- All benchmark-related configuration constants are scoped to the class
(not global variables) for easier maintenance
- Created benchmark suite configuration files in
`fboss/oss/hw_benchmark_tests/`:
- `t1_benchmarks.conf` - T1 agent benchmark suite (9 total)
- `t2_benchmarks.conf` - T2 agent benchmark suite (13 total)
- `additional_benchmarks.conf` - Remaining benchmarks (15 total)
- Configuration files are packaged to `./share/hw_benchmark_tests/`
following the same pattern as other test configs
- Parses benchmark output to extract performance metrics (a parsing sketch follows this list):
- Benchmark test name
- Relative time per iteration
- Iterations per second
- CPU time (microseconds)
- Maximum RSS memory usage
- Generates timestamped CSV files with results for tracking and analysis
- Three status values:
- **OK**: Benchmark completed successfully with full metrics
- **FAILED**: Benchmark failed or produced incomplete output
- **TIMEOUT**: Benchmark exceeded timeout limit (default: 1200 seconds)
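A minimal sketch of what this parsing might look like, assuming folly-Benchmark-style table rows plus an optional trailing JSON metrics blob; the regex, field names, and JSON keys here are illustrative assumptions, not the actual FBOSS output format:

```python
import json
import re
from typing import Dict, Optional

# Hypothetical pattern for a benchmark table row such as:
#   runTxSlowPathBenchmark   52.12s   19.19m
BENCHMARK_ROW = re.compile(
    r"^(?P<benchmark_test_name>\w+)\s+"
    r"(?P<relative_time_per_iter>[\d.]+\w*s)\s+"
    r"(?P<iters_per_sec>[\d.]+\w*)\s*$"
)


def parse_benchmark_output(output: str) -> Optional[Dict[str, str]]:
    """Extract the metric fields written to the results CSV; return None
    when the output is empty or the benchmark line is missing (FAILED)."""
    metrics: Dict[str, str] = {}
    for line in output.splitlines():
        match = BENCHMARK_ROW.match(line.strip())
        if match:
            metrics.update(match.groupdict())
    # Assume CPU time and max RSS arrive as a trailing JSON blob.
    for line in reversed(output.splitlines()):
        if line.lstrip().startswith("{"):
            try:
                blob = json.loads(line)
                metrics["cpu_time_usec"] = str(blob.get("cpu_time_usec", ""))
                metrics["max_rss"] = str(blob.get("max_rss", ""))
            except json.JSONDecodeError:
                pass  # incomplete output still yields partial metrics
            break
    return metrics or None
```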
- Updated `package.py` to include benchmark binaries in
`agent-benchmarks` package
- Updated `package-fboss.py` to copy benchmark configuration files to
`share/hw_benchmark_tests/`
- Added **Option 1: Using the run_test.py Script (Recommended)** section
with examples for:
- Running all benchmarks
- Running T1 benchmark suite
- Running T2 benchmark suite
- Running additional benchmarks
- Kept existing manual execution as **Option 2: Running Individual
Binaries** with the complete list of benchmark binaries
- Added note about CSV output with timestamped filenames
- Updated **T1 Tests → Agent Benchmark Tests** section to use
`run_test.py` command with reference to `t1_benchmarks.conf`
- Updated **T2 Tests → Agent Benchmark Tests** section to use
`run_test.py` command with reference to `t2_benchmarks.conf`
- Follows the same pattern as other test types (SAI, QSFP, Link tests)
for consistency
- Used the `file=...` reference so the list of benchmarks is
maintained in one place for both documentation and execution
Verified documentation formatting is correct by serving Docusaurus locally:
<img width="777" height="586" alt="image"
src="https://github.com/user-attachments/assets/09e6230b-10db-4f80-9e2b-395886e906eb"
/>
<img width="1912" height="1270" alt="image"
src="https://github.com/user-attachments/assets/bac98ba6-8654-42e8-9585-71dc3e4f2633"
/>
The `BenchmarkTestRunner` is implemented as a simple standalone class:
- **Class-level constants**: `BENCHMARK_CONFIG_DIR`,
`T1_BENCHMARKS_CONF`, `T2_BENCHMARKS_CONF`, `ALL_BENCHMARKS_CONF`
- **Public methods**:
- `add_subcommand_arguments()` - Register command-line arguments
- `run_test(args)` - Main entry point for running benchmarks
- **Private methods**:
- `_parse_benchmark_output()` - Extract metrics from benchmark output
- `_run_benchmark_binary()` - Execute a single benchmark binary
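As a rough sketch of that shape (method bodies elided; the constant values and signatures are assumptions, not the actual code):

```python
import argparse


class BenchmarkTestRunner:
    # Class-scoped configuration constants (paths are illustrative)
    BENCHMARK_CONFIG_DIR = "./share/hw_benchmark_tests"
    T1_BENCHMARKS_CONF = "t1_benchmarks.conf"
    T2_BENCHMARKS_CONF = "t2_benchmarks.conf"
    ALL_BENCHMARKS_CONF = "additional_benchmarks.conf"

    def add_subcommand_arguments(self, parser: argparse.ArgumentParser) -> None:
        # Mirrors the `benchmark --help` output shown later in this description
        parser.add_argument(
            "--filter_file",
            help="File containing list of benchmark binaries to run (one per line).",
        )

    def run_test(self, args: argparse.Namespace) -> None:
        # Load the suite, run each binary, parse its output, write the CSV
        ...

    def _parse_benchmark_output(self, output: str) -> dict:
        ...

    def _run_benchmark_binary(self, binary: str) -> tuple:
        ...
```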
Unlike other test runners that extend the abstract `TestRunner` base
class, `BenchmarkTestRunner` is a standalone class. This design choice
was made because:
- Benchmark tests are standalone binaries, not gtest-based tests
- They don't need warmboot/coldboot variants
- They don't use the standard test filtering mechanisms
- A simpler, more direct implementation is more maintainable
Since not all benchmarks are expected to run on a given device (e.g. DNX
vs. XGS Broadcom chips), the runner also filters out any binaries that
may not be available. In addition, since a vendor must explicitly
enable building the benchmark binaries, the runner prints a helpful
message if no benchmark binaries are found.
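A sketch of that filtering logic might look like the following (the `bin_dir` layout and the message text are assumptions):

```python
import os
from typing import List


def filter_available_binaries(
    binaries: List[str], bin_dir: str = "./bin"
) -> List[str]:
    # Drop benchmarks that were never built for this platform (e.g. DNX vs. XGS)
    available = [b for b in binaries if os.path.isfile(os.path.join(bin_dir, b))]
    missing = set(binaries) - set(available)
    if missing:
        print(f"Skipping {len(missing)} benchmark binaries not present in {bin_dir}")
    if not available:
        print(
            "No benchmark binaries found; benchmarks must be explicitly "
            "enabled when building FBOSS."
        )
    return available
```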
Verified the modified scripts compile cleanly:
```bash
python3 -m py_compile fboss/oss/scripts/run_scripts/run_test.py
python3 -m py_compile fboss/oss/scripts/package.py
python3 -m py_compile fboss/oss/scripts/package-fboss.py
```
Added two pytest-based unit test files, `test_run_test.py` and
`test_benchmark_conf_files.py`, covering the expected behavior of the
run_test script: the former provides comprehensive coverage of the
benchmark subcommand, and the latter ensures there are no overlaps
between the benchmark suite lists. These tests are integrated into
CMake and run with ctest.
Comprehensive coverage of `BenchmarkTestRunner` spans 25 test cases
covering:
- Loading from custom filter file
- Handling nonexistent files
- Handling empty files
- Loading from default T1, T2, and additional configs
- Handling missing default configs
- Handling all configs missing
- Successful parsing with all metrics
- Missing JSON metrics
- Missing benchmark line
- Empty output
- Different time units
- Successful execution
- Timeout handling
- Execution failure
- Exception handling
- Execution with config arguments
- Nonexistent filter file handling
- List tests mode
- Full execution with CSV writing
- No existing binaries
- Some missing binaries
- End-to-end workflow from default configs to list tests
Implementation of the unit tests includes these key features:
- All tests use mocking where appropriate to avoid side effects
- Tests verify both success and error cases
- Temporary files are properly cleaned up
- Follows pythonic code style with list comprehensions where appropriate
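For illustration, a hypothetical test in that style might mock out process execution to exercise the timeout path (the import path, private-method signature, and status string are assumptions):

```python
import subprocess
from unittest import mock

from run_test import BenchmarkTestRunner  # assumed import path


def test_run_benchmark_binary_timeout():
    # Patch subprocess.run so no real binary executes, then verify the
    # runner maps TimeoutExpired to the TIMEOUT status without raising.
    runner = BenchmarkTestRunner()
    with mock.patch(
        "subprocess.run",
        side_effect=subprocess.TimeoutExpired(cmd="bench", timeout=1200),
    ):
        _, status = runner._run_benchmark_binary("sai_example_benchmark-sai_impl")
    assert status == "TIMEOUT"
```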
To run locally (in Docker environment):
```bash
cd fboss/oss/scripts/run_scripts
/usr/bin/python3 -m pytest test_run_test.py -v
```
Verified the benchmark subcommand is available:
```bash
./run_test.py benchmark --help
Setting fboss environment variables
usage: run_test.py benchmark [-h] [--filter_file FILTER_FILE]
                             [--platform_mapping_override_path [PLATFORM_MAPPING_OVERRIDE_PATH]]

optional arguments:
  -h, --help            show this help message and exit
  --filter_file FILTER_FILE
                        File containing list of benchmark binaries to run (one per line).
  --platform_mapping_override_path [PLATFORM_MAPPING_OVERRIDE_PATH]
                        A file path to a platform mapping JSON file to be used.
```
Verified configuration files exist and are valid when loaded onto a device:
```bash
ls -la fboss/oss/hw_benchmark_tests/*.conf
cat fboss/oss/hw_benchmark_tests/t1_benchmarks.conf
cat fboss/oss/hw_benchmark_tests/t2_benchmarks.conf
cat fboss/oss/hw_benchmark_tests/additional_benchmarks.conf
```
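For reference, each suite file is expected to list one benchmark binary per line (per the `--filter_file` help text); judging from the T1 run summary below, `t1_benchmarks.conf` contains entries such as:

```
sai_tx_slow_path_rate-sai_impl
sai_rx_slow_path_rate-sai_impl
sai_ecmp_shrink_speed-sai_impl
...
```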
Run all benchmarks (default):
```bash
./bin/run_test.py benchmark
```
Run T1 benchmark suite:
```bash
./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/t1_benchmarks.conf
```
Run T2 benchmark suite:
```bash
./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/t2_benchmarks.conf
```
Run remaining additional benchmarks:
```bash
./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/additional_benchmarks.conf
```
- Configuration files will be packaged to `./share/hw_benchmark_tests/`
when `package-fboss.py` runs
- Benchmark binaries will be packaged to `bin/` directory
Ran the T1 suite via run_test with a benchmark binary that's expected to
hang on XGS chips:
```bash
[root@fboss]# ./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/t1_benchmarks.conf
Setting fboss environment variables
Running benchmark tests...
Running benchmarks from ./share/hw_benchmark_tests/t1_benchmarks.conf
Total benchmarks to run: 9
Running command: sai_tx_slow_path_rate-sai_impl --fruid_filepath=/var/facebook/fboss/fruid.json --enable_sai_log WARN --logging DBG4
...
================================================================================
BENCHMARK RESULTS SUMMARY
================================================================================
sai_tx_slow_path_rate-sai_impl: OK
sai_rx_slow_path_rate-sai_impl: OK
sai_ecmp_shrink_speed-sai_impl: OK
sai_rib_resolution_speed-sai_impl: OK
sai_ecmp_shrink_with_competing_route_updates_speed-sai_impl: OK
sai_fsw_scale_route_add_speed-sai_impl: OK
sai_stats_collection_speed-sai_impl: FAILED
sai_init_and_exit_100Gx100G-sai_impl: OK
sai_switch_reachability_change_speed-sai_impl: TIMEOUT
================================================================================
Total: 9 benchmarks
OK: 7
Failed: 1
Timed Out: 1
```
Verified the CSV file has sane information:
```bash
[root@fboss]# cat benchmark_results_20260115_191249.csv
benchmark_binary_name,benchmark_test_name,test_status,relative_time_per_iter,iters_per_sec,cpu_time_usec,max_rss
sai_tx_slow_path_rate-sai_impl,runTxSlowPathBenchmark,OK,52.12s,19.19m,126299302,1620652
sai_rx_slow_path_rate-sai_impl,RxSlowPathBenchmark,OK,32.53s,30.74m,42003121,1550232
sai_ecmp_shrink_speed-sai_impl,HwEcmpGroupShrink,OK,7.08s,141.30m,22483771,1607068
sai_rib_resolution_speed-sai_impl,RibResolutionBenchmark,OK,2.11s,474.92m,22031184,1826796
sai_ecmp_shrink_with_competing_route_updates_speed-sai_impl,HwEcmpGroupShrinkWithCompetingRouteUpdates,OK,7.23s,138.26m,23599807,1753464
sai_fsw_scale_route_add_speed-sai_impl,HwFswScaleRouteAddBenchmark,OK,1.35s,743.22m,24730127,1937892
sai_stats_collection_speed-sai_impl,,FAILED,,,,
sai_init_and_exit_100Gx100G-sai_impl,HwInitAndExit100Gx100GBenchmark,OK,16.07s,62.22m,31929920,2162992
sai_switch_reachability_change_speed-sai_impl,,TIMEOUT,,,,
```
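As a quick sanity-check idea, the results CSV can be post-processed with the standard library (column names taken from the header above; the filename is from this run):

```python
import csv

# Print any benchmark that did not complete cleanly
with open("benchmark_results_20260115_191249.csv") as f:
    for row in csv.DictReader(f):
        if row["test_status"] != "OK":
            print(row["benchmark_binary_name"], row["test_status"])
```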