anna-nexthop commented on Feb 3, 2026

Pre-submission checklist

  • I've run the linters locally and fixed lint errors related to the files I modified in this PR. You can install the linters by running `pip install -r requirements-dev.txt && pre-commit install`
  • `pre-commit run`

Summary

Adds support for running FBOSS benchmark test binaries as test suites using the existing run_test.py script. Benchmark tests measure performance metrics such as throughput, latency, and the speed of various FBOSS operations. Users can now run whole benchmark suites, rather than individual binaries, and get timestamped CSV output with detailed metrics for analysis.

Key Features

Benchmark Suite Management

  • Created benchmark suite configuration files in fboss/oss/hw_benchmark_tests/:
    • t1_benchmarks.conf - T1 agent benchmark suite (9 benchmarks)
    • t2_benchmarks.conf - T2 agent benchmark suite (13 benchmarks)
    • additional_benchmarks.conf - Remaining benchmarks (15 benchmarks)
  • Configuration files are packaged to ./share/hw_benchmark_tests/, following the existing test config patterns (see the loader sketch below)
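Each .conf file is a plain list of benchmark binary names, one per line (per the --filter_file help text shown later). A minimal loader sketch; the helper name and blank-line handling are assumptions, not the actual run_test.py code:

```python
from pathlib import Path


def load_benchmark_list(conf_path):
    """Return benchmark binary names from a suite .conf file.

    Hypothetical helper: assumes one binary name per line and that
    blank lines should be skipped.
    """
    path = Path(conf_path)
    if not path.is_file():
        return []  # the runner handles nonexistent files gracefully
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]
```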

Performance Metrics Collection

  • Parses benchmark output to extract:
    • Benchmark test name
    • Relative time per iteration
    • Iterations per second
    • CPU time (microseconds)
    • Maximum RSS memory usage
  • Generates timestamped CSV files with results for tracking and analysis
  • Three status values: OK (completed with full metrics), FAILED (failed or produced incomplete output), TIMEOUT (exceeded the timeout limit, default 1200s); a parsing sketch follows this list
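A minimal sketch of the parse-and-record flow. The CSV columns and timestamped filename mirror the sample output later in this PR; the benchmark output format (a folly-benchmark-style results line plus a JSON metrics blob) and the helper names are assumptions rather than the actual implementation:

```python
import csv
import json
import re
import time

# Columns match the generated CSV header shown in the sample below.
CSV_FIELDS = [
    "benchmark_binary_name", "benchmark_test_name", "test_status",
    "relative_time_per_iter", "iters_per_sec", "cpu_time_usec", "max_rss",
]

# Assumed output shape: "<test name>  <time/iter>  <iters/s>" on one line,
# with cpu_time_usec and max_rss carried in a separate JSON blob.
BENCH_LINE = re.compile(r"^(\w+)\s+([\d.]+[a-z]+)\s+([\d.]+[a-z]+)\s*$")


def parse_benchmark_output(output):
    """Extract one result row from a benchmark binary's stdout, or None."""
    row = None
    for line in output.splitlines():
        match = BENCH_LINE.match(line)
        if match:
            row = {
                "benchmark_test_name": match.group(1),
                "relative_time_per_iter": match.group(2),
                "iters_per_sec": match.group(3),
            }
        elif line.lstrip().startswith("{") and row is not None:
            try:
                metrics = json.loads(line)
            except ValueError:
                continue
            row["cpu_time_usec"] = metrics.get("cpu_time_usec", "")
            row["max_rss"] = metrics.get("max_rss", "")
    return row  # None signals FAILED (incomplete output)


def write_results_csv(rows):
    """Write all result rows to a timestamped CSV and return its name."""
    name = f"benchmark_results_{time.strftime('%Y%m%d_%H%M%S')}.csv"
    with open(name, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=CSV_FIELDS, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return name
```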

Command-Line Interface

  • Added "benchmark" subcommand to run_test.py
  • Supports custom filter files via --filter_file argument
  • Filters out unavailable binaries automatically (e.g., DNX vs. XGS chips)
  • Provides helpful messages when no benchmark binaries are found
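A sketch of the subcommand wiring and availability filtering described above. The function and directory names are assumptions; only the --filter_file help string is taken from the actual subcommand:

```python
import argparse
import os


def add_subcommand_arguments(subparsers):
    # Register the "benchmark" subcommand on run_test.py's parser.
    parser = subparsers.add_parser("benchmark", help="Run benchmark suites")
    parser.add_argument(
        "--filter_file",
        help="File containing list of benchmark binaries to run (one per line).",
    )


def filter_available(binaries, bin_dir="./bin"):
    # Not every benchmark is built for every ASIC (e.g. DNX vs. XGS), so
    # quietly skip anything that was not packaged onto this device.
    available = [b for b in binaries if os.path.isfile(os.path.join(bin_dir, b))]
    if not available:
        # Vendors must explicitly enable building benchmarks, so an empty
        # result usually means they were never built.
        print("No benchmark binaries found; was benchmark building enabled?")
    return available


if __name__ == "__main__":
    top = argparse.ArgumentParser(prog="run_test.py")
    add_subcommand_arguments(top.add_subparsers(dest="command"))
    print(top.parse_args(["benchmark", "--filter_file", "t1_benchmarks.conf"]))
```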

Implementation Details

BenchmarkTestRunner Architecture

Implemented as a standalone class (does not extend TestRunner) with:

  • Class-level constants: BENCHMARK_CONFIG_DIR, T1_BENCHMARKS_CONF, T2_BENCHMARKS_CONF, ALL_BENCHMARKS_CONF
  • Public methods:
    • add_subcommand_arguments() - Register command-line arguments
    • run_test(args) - Main entry point for running benchmarks
  • Private methods:
    • _parse_benchmark_output() - Extract metrics from benchmark output
    • _run_benchmark_binary() - Execute a single benchmark binary (sketched below)
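A simplified sketch of the execution-and-classification step, assuming subprocess with a timeout; the real method also assembles config arguments (e.g. --fruid_filepath), and the exact signature is an assumption:

```python
import subprocess

TIMEOUT_SECS = 1200  # per-benchmark limit; exceeding it yields TIMEOUT


def run_benchmark_binary(cmd):
    """Run one benchmark command line and classify the outcome.

    Returns (status, stdout); the caller downgrades OK to FAILED when
    parsing finds the output incomplete.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=TIMEOUT_SECS
        )
    except subprocess.TimeoutExpired:
        return "TIMEOUT", ""
    except OSError:
        return "FAILED", ""
    if proc.returncode != 0:
        return "FAILED", proc.stdout
    return "OK", proc.stdout
```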

Design rationale: Unlike other test runners that extend the abstract TestRunner base class, BenchmarkTestRunner is standalone because:

  • Benchmark tests are standalone binaries, not gtest-based tests
  • No warmboot/coldboot variants needed
  • Standard test filtering mechanisms not applicable
  • Simpler, more maintainable implementation

Packaging Updates

  • Updated package.py to include benchmark binaries in the agent-benchmarks package
  • Updated package-fboss.py to copy benchmark configuration files to share/hw_benchmark_tests/ (copy step sketched below)
  • Benchmark binaries are packaged to the bin/ directory
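For illustration, the config copy step could look like the following; this is a hedged sketch, with the function and path variables assumed rather than taken from package-fboss.py:

```python
import shutil
from pathlib import Path


def copy_benchmark_configs(src_dir, package_root):
    # Mirror the existing test-config pattern: .conf files land in
    # <package_root>/share/hw_benchmark_tests/.
    dst = Path(package_root) / "share" / "hw_benchmark_tests"
    dst.mkdir(parents=True, exist_ok=True)
    for conf in Path(src_dir).glob("*.conf"):
        shutil.copy2(conf, dst / conf.name)
```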

Documentation Updates

  • Added an Option 1: Using the run_test.py Script (Recommended) section with examples for running all benchmarks, the T1 suite, the T2 suite, and the additional benchmarks
  • Kept the existing manual execution instructions as Option 2: Running Individual Binaries, with the complete list of benchmark binaries
  • Updated the T1 Tests → Agent Benchmark Tests and T2 Tests → Agent Benchmark Tests sections to use run_test.py commands referencing t1_benchmarks.conf and t2_benchmarks.conf; the docs include the suites via file=... references so each benchmark list is maintained in one place for both documentation and execution
  • Added a note about CSV output with timestamped filenames

Testing

Unit Test Coverage

Added comprehensive unit tests using pytest (25 test cases total):

test_run_test.py - BenchmarkTestRunner coverage:

  • Loading from custom filter files
  • Handling nonexistent/empty files
  • Loading from default T1, T2, and additional configs
  • Handling missing default configs (including all configs missing)
  • Successful parsing with all metrics
  • Missing JSON metrics and benchmark lines
  • Empty output and different time units
  • Successful execution, timeout handling, execution failures
  • Exception handling
  • Execution with config arguments
  • List tests mode
  • Full execution with CSV writing
  • No existing binaries and some missing binaries
  • End-to-end workflow from default configs to list tests

test_benchmark_conf_files.py - Configuration validation:

  • Ensures no overlaps between benchmark suite lists

Test features:

  • All tests use mocking to avoid side effects
  • Tests verify both success and error cases
  • Temporary files properly cleaned up
  • Pythonic code style with list comprehensions

Integration: the tests are wired into CMake and run via ctest. A representative mocked test is sketched below.
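For flavor, a representative test in the mocked style described above; it patches subprocess.run so no binary actually executes. The class name comes from this PR, but the patched target, method signature, and return shape are assumptions:

```python
import subprocess
from unittest import mock

import run_test  # module under test: fboss/oss/scripts/run_scripts/run_test.py


def test_benchmark_timeout_is_classified_as_timeout():
    runner = run_test.BenchmarkTestRunner()
    boom = subprocess.TimeoutExpired(cmd="fake_benchmark", timeout=1200)
    # Patch subprocess.run inside run_test so no real binary is spawned.
    with mock.patch("run_test.subprocess.run", side_effect=boom):
        status, _ = runner._run_benchmark_binary("fake_benchmark")
    assert status == "TIMEOUT"
```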

Manual Verification

Verified the benchmark subcommand, the configuration files, and on-device execution with the T1 suite, which showed correct handling of successful benchmarks, failures, and timeouts in the CSV output. Full command transcripts appear under Verification Details below.

Visual Inspection of Documentation

Verified documentation formatting is correct by serving Docusaurus locally:
![image](https://github.com/user-attachments/assets/09e6230b-10db-4f80-9e2b-395886e906eb)

![image](https://github.com/user-attachments/assets/bac98ba6-8654-42e8-9585-71dc3e4f2633)
Verification Details

Verified the modified scripts compile cleanly:

```bash
python3 -m py_compile fboss/oss/scripts/run_scripts/run_test.py
python3 -m py_compile fboss/oss/scripts/package.py
python3 -m py_compile fboss/oss/scripts/package-fboss.py
```


To run the unit tests locally (in the Docker environment):
```bash
cd fboss/oss/scripts/run_scripts
/usr/bin/python3 -m pytest test_run_test.py -v
```

Verified the benchmark subcommand is available:
```bash
./run_test.py benchmark --help

Setting fboss environment variables
usage: run_test.py benchmark [-h] [--filter_file FILTER_FILE] [--platform_mapping_override_path [PLATFORM_MAPPING_OVERRIDE_PATH]]

optional arguments:
  -h, --help            show this help message and exit
  --filter_file FILTER_FILE
                        File containing list of benchmark binaries to run (one per line).
  --platform_mapping_override_path [PLATFORM_MAPPING_OVERRIDE_PATH]
                        A file path to a platform mapping JSON file to be used.
```

Verified the configuration files exist and are valid when loaded onto a device:
```bash
ls -la fboss/oss/hw_benchmark_tests/*.conf
cat fboss/oss/hw_benchmark_tests/t1_benchmarks.conf
cat fboss/oss/hw_benchmark_tests/t2_benchmarks.conf
cat fboss/oss/hw_benchmark_tests/additional_benchmarks.conf
```

Run all benchmarks (default):
```bash
./bin/run_test.py benchmark
```

Run T1 benchmark suite:
```bash
./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/t1_benchmarks.conf
```

Run T2 benchmark suite:
```bash
./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/t2_benchmarks.conf
```

Run remaining additional benchmarks:
```bash
./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/additional_benchmarks.conf
```


Ran the T1 suite via run_test.py with a benchmark binary that is expected to hang on XGS chips:
```bash
[root@fboss]# ./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/t1_benchmarks.conf
Setting fboss environment variables
Running benchmark tests...
Running benchmarks from ./share/hw_benchmark_tests/t1_benchmarks.conf
Total benchmarks to run: 9
Running command: sai_tx_slow_path_rate-sai_impl --fruid_filepath=/var/facebook/fboss/fruid.json --enable_sai_log WARN --logging DBG4
...

================================================================================
BENCHMARK RESULTS SUMMARY
================================================================================
sai_tx_slow_path_rate-sai_impl: OK
sai_rx_slow_path_rate-sai_impl: OK
sai_ecmp_shrink_speed-sai_impl: OK
sai_rib_resolution_speed-sai_impl: OK
sai_ecmp_shrink_with_competing_route_updates_speed-sai_impl: OK
sai_fsw_scale_route_add_speed-sai_impl: OK
sai_stats_collection_speed-sai_impl: FAILED
sai_init_and_exit_100Gx100G-sai_impl: OK
sai_switch_reachability_change_speed-sai_impl: TIMEOUT
================================================================================

Total: 9 benchmarks
OK: 7
Failed: 1
Timed Out: 1
```

Verified the CSV file has sane information:
```bash
[root@fboss]# cat benchmark_results_20260115_191249.csv
benchmark_binary_name,benchmark_test_name,test_status,relative_time_per_iter,iters_per_sec,cpu_time_usec,max_rss
sai_tx_slow_path_rate-sai_impl,runTxSlowPathBenchmark,OK,52.12s,19.19m,126299302,1620652
sai_rx_slow_path_rate-sai_impl,RxSlowPathBenchmark,OK,32.53s,30.74m,42003121,1550232
sai_ecmp_shrink_speed-sai_impl,HwEcmpGroupShrink,OK,7.08s,141.30m,22483771,1607068
sai_rib_resolution_speed-sai_impl,RibResolutionBenchmark,OK,2.11s,474.92m,22031184,1826796
sai_ecmp_shrink_with_competing_route_updates_speed-sai_impl,HwEcmpGroupShrinkWithCompetingRouteUpdates,OK,7.23s,138.26m,23599807,1753464
sai_fsw_scale_route_add_speed-sai_impl,HwFswScaleRouteAddBenchmark,OK,1.35s,743.22m,24730127,1937892
sai_stats_collection_speed-sai_impl,,FAILED,,,,
sai_init_and_exit_100Gx100G-sai_impl,HwInitAndExit100Gx100GBenchmark,OK,16.07s,62.22m,31929920,2162992
sai_switch_reachability_change_speed-sai_impl,,TIMEOUT,,,,
```
anna-nexthop force-pushed the anna-nexthop.benchmark-suite branch from 1c40350 to c37b7f8 on February 3, 2026 at 22:47