[Nexthop] Use run_tests script to run benchmarks #895
Draft: anna-nexthop wants to merge 1 commit into facebook:main from nexthop-ai:anna-nexthop.benchmark-suite (+1,032 −28)
**Pre-submission checklist**
- [x] I've run the linters locally and fixed lint errors related to the
files I modified in this PR. You can install the linters by running `pip
install -r requirements-dev.txt && pre-commit install`
- [x] `pre-commit run`
Add support for running FBOSS benchmark test binaries as test suites
using the existing `run_test.py` script. Benchmark tests measure
performance metrics like throughput, latency, and speed of various
FBOSS operations.
Also includes documentation updates on how to run benchmarks via the
`run_test.py` script.
Users can now easily run benchmark suites instead of individual
binaries, with automated CSV output containing detailed metrics for
analysis.
- Added `BenchmarkTestRunner` as a **standalone class** (does not extend
`TestRunner`) for straightforward execution
- Added "benchmark" subcommand to `run_test.py`
- All benchmark-related configuration constants are scoped to the class
(not global variables) for easier maintenance
- Created benchmark suite configuration files in
`fboss/oss/hw_benchmark_tests/`:
- `t1_benchmarks.conf` - T1 agent benchmark suite (9 total)
- `t2_benchmarks.conf` - T2 agent benchmark suite (13 total)
- `additional_benchmarks.conf` - Remaining benchmarks (15 total)
- Configuration files are packaged to `./share/hw_benchmark_tests/`
following the same pattern as other test configs
- Parses benchmark output to extract performance metrics (a parsing sketch follows this list):
- Benchmark test name
- Relative time per iteration
- Iterations per second
- CPU time (microseconds)
- Maximum RSS memory usage
- Generates timestamped CSV files with results for tracking and analysis
- Three status values:
- **OK**: Benchmark completed successfully with full metrics
- **FAILED**: Benchmark failed or produced incomplete output
- **TIMEOUT**: Benchmark exceeded timeout limit (default: 1200 seconds)
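A minimal sketch of what this parsing might look like, assuming folly-Benchmark-style table rows plus an optional trailing JSON metrics blob; the regex, field names, and JSON keys here are illustrative assumptions, not the actual FBOSS output format:

```python
import json
import re
from typing import Dict, Optional

# Hypothetical pattern for a benchmark table row such as:
#   runTxSlowPathBenchmark   52.12s   19.19m
BENCHMARK_ROW = re.compile(
    r"^(?P<benchmark_test_name>\w+)\s+"
    r"(?P<relative_time_per_iter>[\d.]+\w*s)\s+"
    r"(?P<iters_per_sec>[\d.]+\w*)\s*$"
)


def parse_benchmark_output(output: str) -> Optional[Dict[str, str]]:
    """Extract the metric fields written to the results CSV; return None
    when the output is empty or the benchmark line is missing (FAILED)."""
    metrics: Dict[str, str] = {}
    for line in output.splitlines():
        match = BENCHMARK_ROW.match(line.strip())
        if match:
            metrics.update(match.groupdict())
    # Assume CPU time and max RSS arrive as a trailing JSON blob.
    for line in reversed(output.splitlines()):
        if line.lstrip().startswith("{"):
            try:
                blob = json.loads(line)
                metrics["cpu_time_usec"] = str(blob.get("cpu_time_usec", ""))
                metrics["max_rss"] = str(blob.get("max_rss", ""))
            except json.JSONDecodeError:
                pass  # incomplete output still yields partial metrics
            break
    return metrics or None
```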
- Updated `package.py` to include benchmark binaries in
`agent-benchmarks` package
- Updated `package-fboss.py` to copy benchmark configuration files to
`share/hw_benchmark_tests/`
- Added **Option 1: Using the run_test.py Script (Recommended)** section
with examples for:
- Running all benchmarks
- Running T1 benchmark suite
- Running T2 benchmark suite
- Running additional benchmarks
- Kept existing manual execution as **Option 2: Running Individual
Binaries** with the complete list of benchmark binaries
- Added note about CSV output with timestamped filenames
- Updated **T1 Tests → Agent Benchmark Tests** section to use
`run_test.py` command with reference to `t1_benchmarks.conf`
- Updated **T2 Tests → Agent Benchmark Tests** section to use
`run_test.py` command with reference to `t2_benchmarks.conf`
- Follows the same pattern as other test types (SAI, QSFP, Link tests)
for consistency
- Used the `file=...` reference so the list of benchmarks is
maintained in one place for both documentation and execution
Verified documentation formatting is correct by serving Docusaurus locally:
<img width="777" height="586" alt="image"
src="https://github.com/user-attachments/assets/09e6230b-10db-4f80-9e2b-395886e906eb"
/>
<img width="1912" height="1270" alt="image"
src="https://github.com/user-attachments/assets/bac98ba6-8654-42e8-9585-71dc3e4f2633"
/>
The `BenchmarkTestRunner` is implemented as a simple standalone class:
- **Class-level constants**: `BENCHMARK_CONFIG_DIR`,
`T1_BENCHMARKS_CONF`, `T2_BENCHMARKS_CONF`, `ALL_BENCHMARKS_CONF`
- **Public methods**:
- `add_subcommand_arguments()` - Register command-line arguments
- `run_test(args)` - Main entry point for running benchmarks
- **Private methods**:
- `_parse_benchmark_output()` - Extract metrics from benchmark output
- `_run_benchmark_binary()` - Execute a single benchmark binary
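As a rough sketch of that shape (method bodies elided; the constant values and signatures are assumptions, not the actual code):

```python
import argparse


class BenchmarkTestRunner:
    # Class-scoped configuration constants (paths are illustrative)
    BENCHMARK_CONFIG_DIR = "./share/hw_benchmark_tests"
    T1_BENCHMARKS_CONF = "t1_benchmarks.conf"
    T2_BENCHMARKS_CONF = "t2_benchmarks.conf"
    ALL_BENCHMARKS_CONF = "additional_benchmarks.conf"

    def add_subcommand_arguments(self, parser: argparse.ArgumentParser) -> None:
        # Mirrors the `benchmark --help` output shown later in this description
        parser.add_argument(
            "--filter_file",
            help="File containing list of benchmark binaries to run (one per line).",
        )

    def run_test(self, args: argparse.Namespace) -> None:
        # Load the suite, run each binary, parse its output, write the CSV
        ...

    def _parse_benchmark_output(self, output: str) -> dict:
        ...

    def _run_benchmark_binary(self, binary: str) -> tuple:
        ...
```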
Unlike other test runners that extend the abstract `TestRunner` base
class, `BenchmarkTestRunner` is a standalone class. This design choice
was made because:
- Benchmark tests are standalone binaries, not gtest-based tests
- They don't need warmboot/coldboot variants
- They don't use the standard test filtering mechanisms
- A simpler, more direct implementation is more maintainable
Since not all benchmarks are expected to run on a given device (e.g. DNX
vs. XGS Broadcom chips), the runner also filters out any binaries that
may not be available. In addition, since a vendor must explicitly
enable building the benchmark binaries, the runner prints a helpful
message if no benchmark binaries are found.
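A sketch of that filtering logic might look like the following (the `bin_dir` layout and the message text are assumptions):

```python
import os
from typing import List


def filter_available_binaries(
    binaries: List[str], bin_dir: str = "./bin"
) -> List[str]:
    # Drop benchmarks that were never built for this platform (e.g. DNX vs. XGS)
    available = [b for b in binaries if os.path.isfile(os.path.join(bin_dir, b))]
    missing = set(binaries) - set(available)
    if missing:
        print(f"Skipping {len(missing)} benchmark binaries not present in {bin_dir}")
    if not available:
        print(
            "No benchmark binaries found; benchmarks must be explicitly "
            "enabled when building FBOSS."
        )
    return available
```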
Verified the modified scripts compile cleanly:
```bash
python3 -m py_compile fboss/oss/scripts/run_scripts/run_test.py
python3 -m py_compile fboss/oss/scripts/package.py
python3 -m py_compile fboss/oss/scripts/package-fboss.py
```
Added two pytest-based unit test files, `test_run_test.py` and
`test_benchmark_conf_files.py`, covering the expected behavior of the
run_test script: the former provides comprehensive coverage of the
benchmark subcommand, and the latter ensures there are no overlaps
between the benchmark suite lists. These tests are integrated into
CMake and run with ctest.
Comprehensive coverage of `BenchmarkTestRunner` spans 25 test cases
covering:
- Loading from custom filter file
- Handling nonexistent files
- Handling empty files
- Loading from default T1, T2, and additional configs
- Handling missing default configs
- Handling all configs missing
- Successful parsing with all metrics
- Missing JSON metrics
- Missing benchmark line
- Empty output
- Different time units
- Successful execution
- Timeout handling
- Execution failure
- Exception handling
- Execution with config arguments
- Nonexistent filter file handling
- List tests mode
- Full execution with CSV writing
- No existing binaries
- Some missing binaries
- End-to-end workflow from default configs to list tests
Implementation of the unit tests includes these key features:
- All tests use mocking where appropriate to avoid side effects
- Tests verify both success and error cases
- Temporary files are properly cleaned up
- Follows pythonic code style with list comprehensions where appropriate
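For illustration, a hypothetical test in that style might mock out process execution to exercise the timeout path (the import path, private-method signature, and status string are assumptions):

```python
import subprocess
from unittest import mock

from run_test import BenchmarkTestRunner  # assumed import path


def test_run_benchmark_binary_timeout():
    # Patch subprocess.run so no real binary executes, then verify the
    # runner maps TimeoutExpired to the TIMEOUT status without raising.
    runner = BenchmarkTestRunner()
    with mock.patch(
        "subprocess.run",
        side_effect=subprocess.TimeoutExpired(cmd="bench", timeout=1200),
    ):
        _, status = runner._run_benchmark_binary("sai_example_benchmark-sai_impl")
    assert status == "TIMEOUT"
```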
To run locally (in Docker environment):
```bash
cd fboss/oss/scripts/run_scripts
/usr/bin/python3 -m pytest test_run_test.py -v
```
Verified the benchmark subcommand is available:
```bash
./run_test.py benchmark --help
Setting fboss environment variables
usage: run_test.py benchmark [-h] [--filter_file FILTER_FILE]
                             [--platform_mapping_override_path [PLATFORM_MAPPING_OVERRIDE_PATH]]

optional arguments:
  -h, --help            show this help message and exit
  --filter_file FILTER_FILE
                        File containing list of benchmark binaries to run (one per line).
  --platform_mapping_override_path [PLATFORM_MAPPING_OVERRIDE_PATH]
                        A file path to a platform mapping JSON file to be used.
```
Verified configuration files exist and are valid when loaded onto a device:
```bash
ls -la fboss/oss/hw_benchmark_tests/*.conf
cat fboss/oss/hw_benchmark_tests/t1_benchmarks.conf
cat fboss/oss/hw_benchmark_tests/t2_benchmarks.conf
cat fboss/oss/hw_benchmark_tests/additional_benchmarks.conf
```
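For reference, each suite file is expected to list one benchmark binary per line (per the `--filter_file` help text); judging from the T1 run summary below, `t1_benchmarks.conf` contains entries such as:

```
sai_tx_slow_path_rate-sai_impl
sai_rx_slow_path_rate-sai_impl
sai_ecmp_shrink_speed-sai_impl
...
```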
Run all benchmarks (default):
```bash
./bin/run_test.py benchmark
```
Run T1 benchmark suite:
```bash
./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/t1_benchmarks.conf
```
Run T2 benchmark suite:
```bash
./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/t2_benchmarks.conf
```
Run remaining additional benchmarks:
```bash
./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/additional_benchmarks.conf
```
- Configuration files will be packaged to `./share/hw_benchmark_tests/`
when `package-fboss.py` runs
- Benchmark binaries will be packaged to `bin/` directory
Ran the T1 suite via run_test with a benchmark binary that's expected to
hang on XGS chips:
```bash
[root@fboss]# ./bin/run_test.py benchmark --filter_file ./share/hw_benchmark_tests/t1_benchmarks.conf
Setting fboss environment variables
Running benchmark tests...
Running benchmarks from ./share/hw_benchmark_tests/t1_benchmarks.conf
Total benchmarks to run: 9
Running command: sai_tx_slow_path_rate-sai_impl --fruid_filepath=/var/facebook/fboss/fruid.json --enable_sai_log WARN --logging DBG4
...
================================================================================
BENCHMARK RESULTS SUMMARY
================================================================================
sai_tx_slow_path_rate-sai_impl: OK
sai_rx_slow_path_rate-sai_impl: OK
sai_ecmp_shrink_speed-sai_impl: OK
sai_rib_resolution_speed-sai_impl: OK
sai_ecmp_shrink_with_competing_route_updates_speed-sai_impl: OK
sai_fsw_scale_route_add_speed-sai_impl: OK
sai_stats_collection_speed-sai_impl: FAILED
sai_init_and_exit_100Gx100G-sai_impl: OK
sai_switch_reachability_change_speed-sai_impl: TIMEOUT
================================================================================
Total: 9 benchmarks
OK: 7
Failed: 1
Timed Out: 1
```
Verified the CSV file has sane information:
```bash
[root@fboss]# cat benchmark_results_20260115_191249.csv
benchmark_binary_name,benchmark_test_name,test_status,relative_time_per_iter,iters_per_sec,cpu_time_usec,max_rss
sai_tx_slow_path_rate-sai_impl,runTxSlowPathBenchmark,OK,52.12s,19.19m,126299302,1620652
sai_rx_slow_path_rate-sai_impl,RxSlowPathBenchmark,OK,32.53s,30.74m,42003121,1550232
sai_ecmp_shrink_speed-sai_impl,HwEcmpGroupShrink,OK,7.08s,141.30m,22483771,1607068
sai_rib_resolution_speed-sai_impl,RibResolutionBenchmark,OK,2.11s,474.92m,22031184,1826796
sai_ecmp_shrink_with_competing_route_updates_speed-sai_impl,HwEcmpGroupShrinkWithCompetingRouteUpdates,OK,7.23s,138.26m,23599807,1753464
sai_fsw_scale_route_add_speed-sai_impl,HwFswScaleRouteAddBenchmark,OK,1.35s,743.22m,24730127,1937892
sai_stats_collection_speed-sai_impl,,FAILED,,,,
sai_init_and_exit_100Gx100G-sai_impl,HwInitAndExit100Gx100GBenchmark,OK,16.07s,62.22m,31929920,2162992
sai_switch_reachability_change_speed-sai_impl,,TIMEOUT,,,,
```
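As a quick sanity-check idea, the results CSV can be post-processed with the standard library (column names taken from the header above; the filename is from this run):

```python
import csv

# Print any benchmark that did not complete cleanly
with open("benchmark_results_20260115_191249.csv") as f:
    for row in csv.DictReader(f):
        if row["test_status"] != "OK":
            print(row["benchmark_binary_name"], row["test_status"])
```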