⚡️ Speed up function merge_detections by 10%
#627
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
📄 10% (0.10x) speedup for
merge_detectionsininference/core/workflows/core_steps/fusion/detections_consensus/v1.py⏱️ Runtime :
2.15 milliseconds→1.96 milliseconds(best of20runs)📝 Explanation and details
The optimized code achieves a 9% speedup through several key micro-optimizations that reduce attribute lookups, memory allocations, and unnecessary conversions:
What optimizations were applied:
Eliminated dictionary lookup redundancy: Cached
detections.dataasddataand useddict.get()for scaling keys instead of checking membership withinfollowed by array access - this reduces multiple hash table lookups to single operations.Reduced function call overhead: Pre-cached aggregator functions (
class_selector,boxes_aggregator,masks_aggregator) instead of looking them up from dictionaries multiple times within conditionals.Optimized mask array creation: Replaced
np.array([aggregated_mask])withnp.expand_dims(aggregated_mask, axis=0)to avoid intermediate list allocation when creating the mask array.Streamlined confidence aggregation: Moved confidence aggregation outside the return statement and stored result in a variable to avoid nested function calls during object construction.
Improved dtype handling in aggregate_field_values: Added fast-path check for arrays already in floating-point format to skip unnecessary
astype(float)conversions, reducing array copying overhead.Why this leads to speedup:
.get()is faster thaninchecks followed by key accessTest case performance:
These optimizations are particularly effective for the large detection scenarios (500+ detections) in the test suite, where the cumulative effect of reduced lookups and allocations becomes significant. The optimizations maintain identical functionality while reducing computational overhead in the hot path.
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
workflows/unit_tests/core_steps/fusion/test_detections_consensus.py::test_merge_detections_no_maskworkflows/unit_tests/core_steps/fusion/test_detections_consensus.py::test_merge_detections_with_masksTo edit these changes
git checkout codeflash/optimize-merge_detections-mhbuv5coand push.