Problem
In src/inference_endpoint/config/runtime_settings.py:142-148, target_qps falls back to a hardcoded 10.0 for Offline (max_throughput) mode instead of None:
# TODO: target_qps should be None in Offline mode but using 10.0 as fallback
# to avoid breaking changes
target_qps = config.settings.target_qps or 10.0
In Offline/max_throughput mode, target_qps is semantically irrelevant: all queries are issued at t=0 as a burst. Defaulting it to 10.0 is misleading, because any downstream logic that reads this field will see a rate that was never configured.
Expected Behavior
target_qps should be None when the load pattern is max_throughput. The fallback workaround should be removed and callers that depend on this field should be updated to handle None.
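A minimal sketch of the expected resolution logic follows. It is not the project's actual code: `LoadPattern`, `resolve_target_qps`, and the choice to raise when a rate-driven pattern has no QPS are all assumptions made for illustration. Only the `max_throughput` name and the `target_qps` field come from this issue.

```python
from enum import Enum
from typing import Optional


class LoadPattern(Enum):
    """Hypothetical stand-in for the project's load-pattern type."""

    MAX_THROUGHPUT = "max_throughput"
    POISSON = "poisson"  # assumed example of a rate-driven pattern


def resolve_target_qps(
    load_pattern: LoadPattern, configured_qps: Optional[float]
) -> Optional[float]:
    """Return the effective target_qps for a load pattern (hypothetical helper)."""
    # Offline mode issues every query at t=0, so a rate has no meaning here.
    if load_pattern is LoadPattern.MAX_THROUGHPUT:
        return None
    # Assumption: rate-driven patterns need an explicit positive value rather
    # than a silent 10.0 fallback.
    if configured_qps is None or configured_qps <= 0:
        raise ValueError(f"target_qps is required for {load_pattern.value}")
    return configured_qps
```

With this shape, `resolve_target_qps(LoadPattern.MAX_THROUGHPUT, 50.0)` yields `None` even when a value is configured, which makes it explicit that the field is unused in Offline mode.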
Files to Modify
- src/inference_endpoint/config/runtime_settings.py
- Any callers that read runtime_settings.target_qps without a None check
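As a rough illustration of the None check a caller might need, the sketch below derives a delay between queries from target_qps. The function `inter_arrival_seconds` is hypothetical and does not exist in the repository; the real callers may use the field differently.

```python
from typing import Optional


def inter_arrival_seconds(target_qps: Optional[float]) -> float:
    """Mean delay between queries; 0.0 means issue everything as a burst.

    Hypothetical caller-side helper, shown only to illustrate None handling.
    """
    # None signals Offline/max_throughput mode: there is no pacing at all.
    if target_qps is None:
        return 0.0
    return 1.0 / target_qps
```

Treating None as an explicit branch avoids arithmetic on a missing value (for example `1.0 / None`, which raises `TypeError`) and keeps the Offline path visibly separate from the rate-driven one.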