@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 14% (0.14x) speedup for std in pandas/core/array_algos/masked_reductions.py

⏱️ Runtime : 3.96 milliseconds → 3.46 milliseconds (best of 172 runs)
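
As a quick check on how the percentages in this report relate to the raw timings, the arithmetic can be sketched as below. The helper names `speedup` and `slowdown` are hypothetical, and the "% slower" convention (time added relative to the new runtime) is an assumption inferred from the per-test figures, which only match up to rounding.

```python
def speedup(before, after):
    """Relative speedup of `after` over `before`; 0.14 means 14% faster."""
    return before / after - 1.0


def slowdown(before, after):
    """Assumed convention for "% slower": time added, relative to the new runtime."""
    return 1.0 - before / after


# Headline figure: 3.96 ms -> 3.46 ms
headline = speedup(3.96, 3.46)

# A per-test "faster" annotation: 49.1 us -> 19.5 us
fast_case = speedup(49.1, 19.5)

# A per-test "slower" annotation: 44.7 us -> 51.6 us
slow_case = slowdown(44.7, 51.6)

print(f"headline: {headline:.1%}, fast case: {fast_case:.0%}, slow case: {slow_case:.1%}")
```

The per-test timings are rounded to three significant figures, so recomputed percentages can differ from the annotated ones in the last digit.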

📝 Explanation and details

The optimized code achieves a 14% speedup through three key optimizations that reduce unnecessary computations when masks have no missing values:

1. Early mask.any() checks in _reductions()

  • What: Added mask.any() checks before expensive operations to avoid work when all values are valid
  • Why faster: For arrays with no missing values (common case), this avoids array indexing (values[~mask]) and creating inverted masks (~mask) in the where= parameter
  • Best for: Test cases with no missing values show 136-187% speedups (e.g., test_std_basic_no_missing, test_std_large_array_no_missing)
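
The early check described in item 1 can be sketched as follows. This is a minimal illustration, not the actual `_reductions()` code: `masked_std_sketch` is a hypothetical name, only the `skipna=True` path is modeled, and it assumes a NumPy version whose `np.std` accepts the `where=` argument.

```python
import numpy as np


def masked_std_sketch(values, mask, ddof=1):
    """Hypothetical stand-in for the skipna=True branch; not pandas's _reductions."""
    if not mask.any():
        # No missing values: skip building ~mask and the where= machinery.
        return np.std(values, ddof=ddof)
    # Some values are missing: restrict the reduction to the valid entries.
    return np.std(values, ddof=ddof, where=~mask)


values = np.array([1.0, 2.0, 3.0, 4.0])
clean = masked_std_sketch(values, np.zeros(4, dtype=bool))
masked = masked_std_sketch(values, np.array([False, True, False, False]))
print(clean, masked)
```

Both branches give the same result for a mask with no `True` entries, so the check only changes how much work is done.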

2. Fast path in std() function

  • What: Added if not mask.any(): return np.std(values, axis=axis, ddof=ddof) before expensive warning handling
  • Why faster: Bypasses the entire warnings.catch_warnings() context manager and _reductions() call when no masking is needed
  • Best for: Simple arrays with no missing values get direct numpy computation, showing 52-161% improvements
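
The ordering described in item 2 might look roughly like the sketch below. It is a simplification under stated assumptions: `std_fast_path_sketch` and `runtime_warnings_from` are hypothetical names, `skipna` is omitted, and `np.nan` stands in for the NA value pandas would return.

```python
import warnings

import numpy as np


def std_fast_path_sketch(values, mask, *, axis=None, ddof=1):
    """Hypothetical simplification of the described std(); nan stands in for NA."""
    if not values.size or mask.all():
        return np.nan
    if not mask.any():
        # Fast path: taken before entering warnings.catch_warnings().
        return np.std(values, axis=axis, ddof=ddof)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)
        return np.std(values, axis=axis, ddof=ddof, where=~mask)


def runtime_warnings_from(func, *args, **kwargs):
    """Call func and return (result, RuntimeWarnings that escaped the call)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, [w for w in caught if issubclass(w.category, RuntimeWarning)]


# Clean single-value input with ddof=1 goes through the fast path.
res_clean, escaped_clean = runtime_warnings_from(
    std_fast_path_sketch, np.array([42.0]), np.array([False])
)
# One valid value left after masking goes through the filtered path.
res_masked, escaped_masked = runtime_warnings_from(
    std_fast_path_sketch, np.array([1.0, 2.0]), np.array([False, True])
)
print(len(escaped_clean), len(escaped_masked))
```

In this sketch the fast path runs outside the warning filter, so NumPy's `RuntimeWarning` for `ddof >= n` can surface for clean input while it stays suppressed for masked input. Whether the real change behaves this way depends on exactly where the check sits, which the description gives only as "before expensive warning handling", so it may be worth confirming.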

3. Object dtype optimization

  • What: In object dtype handling, check mask.any() before array slicing to avoid creating new arrays when unnecessary
  • Why faster: Prevents allocation of values[~mask] when all values are valid, using original array directly
  • Best for: Object dtype arrays without missing values show 12-52% improvements
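
The allocation saved in item 3 can be illustrated with a small sketch. `select_valid` is a hypothetical helper introduced here for illustration; it is not a function in pandas.

```python
import numpy as np


def select_valid(values, mask):
    """Hypothetical helper: return only the unmasked entries of an array."""
    if mask.any():
        # Boolean indexing always allocates a new array.
        return values[~mask]
    # Nothing is masked: hand back the original array, no copy made.
    return values


with_na = np.array([1, None, 3, 4], dtype=object)
clean = np.array([1, 2, 3, 4], dtype=object)

filtered = select_valid(with_na, np.array([False, True, False, False]))
untouched = select_valid(clean, np.zeros(4, dtype=bool))
print(filtered.tolist(), untouched is clean)
```

Returning the input unchanged is safe here only because the reduction that follows does not modify its argument.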

The optimizations are most effective for arrays with no missing values, which are common in real-world data. Arrays that do contain missing values do not benefit: in the generated tests below they run roughly 2-15% slower with skipna=True and about 29-31% slower with skipna=False, which the additional mask.any() checks would account for. The overall gain comes from optimizing the common case of clean data.
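
One way to see this trade-off on a given machine is to time the extra check against the work it avoids. The following is a rough sketch using `timeit`; the helper `time_call` and the array size are arbitrary choices for illustration, and the numbers it prints will vary by machine and NumPy version.

```python
import timeit

import numpy as np


def time_call(func, number=1000):
    """Best-of-5 average seconds per call for a zero-argument callable."""
    return min(timeit.repeat(func, number=number, repeat=5)) / number


values = np.arange(1000, dtype=float)
clean_mask = np.zeros(1000, dtype=bool)

timings = {
    # Cost added to every call by the new check.
    "mask.any()": time_call(lambda: clean_mask.any()),
    # Work done on the fast path.
    "np.std(values)": time_call(lambda: np.std(values, ddof=1)),
    # Work done when the mask has to be applied.
    "np.std(values, where=~mask)": time_call(
        lambda: np.std(values, ddof=1, where=~clean_mask)
    ),
}

for label, seconds in timings.items():
    print(f"{label:30s} {seconds * 1e6:8.2f} us")
```

If the first line is small compared with the gap between the second and third, the check pays for itself whenever clean input is common.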

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 86 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import numpy as np
import pytest  # used for our unit tests
from pandas.core.array_algos.masked_reductions import std


# Stand-in NA sentinel, used only to build expected values in the tests below
class NAType:
    """Dummy NA singleton for missing values."""
    def __eq__(self, other):
        return isinstance(other, NAType)
    def __repr__(self):
        return "NA"

NA = NAType()

# =========================
# Unit tests for std
# =========================

# ----------- 1. Basic Test Cases ------------

def test_std_basic_no_missing():
    # Basic: no missing values, 1D array
    arr = np.array([1, 2, 3, 4, 5])
    mask = np.array([False]*5)
    codeflash_output = np.std(arr, ddof=1); expected = codeflash_output # 40.7μs -> 43.3μs (6.09% slower)
    codeflash_output = std(arr, mask); result = codeflash_output # 49.1μs -> 19.5μs (152% faster)

def test_std_basic_with_missing_skipna():
    # Basic: missing values, skipna=True
    arr = np.array([1, 2, 3, 4, 5])
    mask = np.array([False, True, False, False, False])
    # Only values [1,3,4,5] used
    codeflash_output = np.std(np.array([1,3,4,5]), ddof=1); expected = codeflash_output # 32.4μs -> 31.2μs (3.69% faster)
    codeflash_output = std(arr, mask, skipna=True); result = codeflash_output # 44.7μs -> 51.6μs (13.5% slower)

def test_std_basic_with_missing_no_skipna():
    # Basic: missing values, skipna=False
    arr = np.array([1, 2, 3, 4, 5])
    mask = np.array([False, True, False, False, False])
    codeflash_output = std(arr, mask, skipna=False); result = codeflash_output # 16.5μs -> 23.1μs (28.9% slower)

def test_std_basic_ddof_0():
    # Basic: ddof=0 (population std)
    arr = np.array([1, 2, 3, 4, 5])
    mask = np.array([False]*5)
    codeflash_output = np.std(arr, ddof=0); expected = codeflash_output # 35.4μs -> 34.8μs (1.66% faster)
    codeflash_output = std(arr, mask, ddof=0); result = codeflash_output # 44.9μs -> 17.2μs (160% faster)

def test_std_basic_2d_axis0():
    # Basic: 2D array, axis=0
    arr = np.array([[1,2,3],[4,5,6]])
    mask = np.array([[False,False,False],[False,False,False]])
    codeflash_output = np.std(arr, axis=0, ddof=1); expected = codeflash_output # 32.1μs -> 32.2μs (0.348% slower)
    codeflash_output = std(arr, mask, axis=0); result = codeflash_output # 51.8μs -> 19.1μs (171% faster)

def test_std_basic_2d_axis1_with_missing():
    # Basic: 2D array, axis=1, with missing
    arr = np.array([[1,2,3],[4,5,6]])
    mask = np.array([[False,True,False],[False,False,True]])
    # First row: [1,3], second row: [4,5]
    expected = np.array([np.std([1,3], ddof=1), np.std([4,5], ddof=1)]) # 33.5μs -> 33.9μs (1.07% slower)
    codeflash_output = std(arr, mask, axis=1); result = codeflash_output # 51.6μs -> 60.5μs (14.8% slower)

# ----------- 2. Edge Test Cases ------------

def test_std_empty_array():
    # Edge: empty array
    arr = np.array([])
    mask = np.array([], dtype=bool)
    codeflash_output = std(arr, mask); result = codeflash_output # 1.06μs -> 965ns (9.43% faster)

def test_std_all_missing():
    # Edge: all values missing
    arr = np.array([1,2,3])
    mask = np.array([True,True,True])
    codeflash_output = std(arr, mask); result = codeflash_output # 8.71μs -> 9.11μs (4.38% slower)

def test_std_one_value_no_missing_ddof0():
    # Edge: single value, ddof=0
    arr = np.array([42])
    mask = np.array([False])
    codeflash_output = np.std(arr, ddof=0); expected = codeflash_output # 39.5μs -> 40.8μs (3.32% slower)
    codeflash_output = std(arr, mask, ddof=0); result = codeflash_output # 48.4μs -> 19.2μs (152% faster)

def test_std_one_value_no_missing_ddof1():
    # Edge: single value, ddof=1 (should be nan)
    arr = np.array([42])
    mask = np.array([False])
    codeflash_output = np.std(arr, ddof=1); expected = codeflash_output # 54.6μs -> 53.8μs (1.50% faster)
    codeflash_output = std(arr, mask, ddof=1); result = codeflash_output # 48.3μs -> 20.5μs (136% faster)

def test_std_one_value_missing():
    # Edge: single value, missing
    arr = np.array([42])
    mask = np.array([True])
    codeflash_output = std(arr, mask); result = codeflash_output # 6.89μs -> 7.11μs (3.09% slower)


def test_std_object_dtype():
    # Edge: object dtype, skipna=True
    arr = np.array([1, None, 3, 4], dtype=object)
    mask = np.array([False, True, False, False])
    # Only [1,3,4] used
    codeflash_output = np.std(np.array([1,3,4], dtype=object), ddof=1); expected = codeflash_output # 51.6μs -> 53.4μs (3.37% slower)
    codeflash_output = std(arr, mask); result = codeflash_output # 37.1μs -> 40.5μs (8.44% slower)

def test_std_axis_none_2d():
    # Edge: axis=None, 2D array, with missing
    arr = np.array([[1,2,3],[4,5,6]])
    mask = np.array([[False,True,False],[True,False,True]])
    # Only [1,3,5] used
    codeflash_output = np.std(np.array([1,3,5]), ddof=1); expected = codeflash_output # 35.4μs -> 34.9μs (1.62% faster)
    codeflash_output = std(arr, mask, axis=None); result = codeflash_output # 54.9μs -> 57.5μs (4.44% slower)

def test_std_axis0_all_missing_column():
    # Edge: axis=0, one column all missing
    arr = np.array([[1,2,3],[4,5,6]])
    mask = np.array([[False,True,False],[False,True,False]])
    # Second column is all missing, should be NA
    codeflash_output = std(arr, mask, axis=0); result = codeflash_output # 75.3μs -> 77.8μs (3.30% slower)
    # First column: [1,4], second: [], third: [3,6]
    expected = np.array([np.std([1,4], ddof=1), NA, np.std([3,6], ddof=1)], dtype=object) # 22.9μs -> 22.3μs (2.34% faster)

def test_std_axis1_all_missing_row():
    # Edge: axis=1, one row all missing
    arr = np.array([[1,2,3],[4,5,6]])
    mask = np.array([[True,True,True],[False,False,False]])
    codeflash_output = std(arr, mask, axis=1); result = codeflash_output # 70.5μs -> 73.1μs (3.65% slower)
    expected = np.array([NA, np.std([4,5,6], ddof=1)], dtype=object) # 22.8μs -> 22.1μs (3.18% faster)

def test_std_nan_in_values():
    # Edge: np.nan in values, mask marks as missing
    arr = np.array([1, np.nan, 3, 4])
    mask = np.array([False, True, False, False])
    codeflash_output = np.std(np.array([1,3,4]), ddof=1); expected = codeflash_output # 32.4μs -> 30.8μs (5.37% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 43.8μs -> 47.2μs (7.06% slower)

def test_std_all_zeroes():
    # Edge: all zeros, no missing
    arr = np.zeros(10)
    mask = np.array([False]*10)
    expected = 0.0
    codeflash_output = std(arr, mask); result = codeflash_output # 56.4μs -> 33.0μs (71.0% faster)

def test_std_negative_values():
    # Edge: negative values
    arr = np.array([-1,-2,-3,-4,-5])
    mask = np.array([False]*5)
    codeflash_output = np.std(arr, ddof=1); expected = codeflash_output # 32.1μs -> 31.8μs (0.905% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 43.7μs -> 17.2μs (153% faster)

def test_std_large_ddof():
    # Edge: ddof >= n
    arr = np.array([1,2,3])
    mask = np.array([False]*3)
    codeflash_output = std(arr, mask, ddof=5); result = codeflash_output # 64.9μs -> 57.0μs (13.9% faster)

# ----------- 3. Large Scale Test Cases ------------

def test_std_large_1d():
    # Large: 1D array, 1000 elements, no missing
    arr = np.arange(1000)
    mask = np.array([False]*1000)
    codeflash_output = np.std(arr, ddof=1); expected = codeflash_output # 35.8μs -> 36.5μs (1.73% slower)
    codeflash_output = std(arr, mask); result = codeflash_output # 47.3μs -> 20.3μs (133% faster)

def test_std_large_1d_half_missing():
    # Large: 1D array, 1000 elements, half missing
    arr = np.arange(1000)
    mask = np.zeros(1000, dtype=bool)
    mask[::2] = True  # every other is missing
    codeflash_output = np.std(arr[1::2], ddof=1); expected = codeflash_output # 33.4μs -> 32.0μs (4.14% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 58.4μs -> 65.6μs (11.1% slower)

def test_std_large_2d_axis0():
    # Large: 2D array, 100x10, axis=0, no missing
    arr = np.arange(1000).reshape(100,10)
    mask = np.zeros((100,10), dtype=bool)
    codeflash_output = np.std(arr, axis=0, ddof=1); expected = codeflash_output # 38.8μs -> 38.1μs (2.05% faster)
    codeflash_output = std(arr, mask, axis=0); result = codeflash_output # 62.5μs -> 25.1μs (149% faster)

def test_std_large_2d_axis1_with_missing():
    # Large: 2D array, 100x10, axis=1, 10% missing
    arr = np.arange(1000).reshape(100,10)
    mask = np.zeros((100,10), dtype=bool)
    # Mask out first column for all rows
    mask[:,0] = True
    codeflash_output = np.std(arr[:,1:], axis=1, ddof=1); expected = codeflash_output # 36.3μs -> 34.9μs (3.92% faster)
    codeflash_output = std(arr, mask, axis=1); result = codeflash_output # 61.2μs -> 68.5μs (10.7% slower)

def test_std_large_2d_all_missing_row():
    # Large: 2D array, 100x10, one row all missing
    arr = np.arange(1000).reshape(100,10)
    mask = np.zeros((100,10), dtype=bool)
    mask[50,:] = True  # row 50 all missing
    codeflash_output = std(arr, mask, axis=1); result = codeflash_output # 80.7μs -> 85.3μs (5.47% slower)
    # Check a non-missing row
    codeflash_output = np.std(arr[42,:], ddof=1); expected = codeflash_output # 22.8μs -> 22.0μs (3.30% faster)

def test_std_large_2d_all_missing_column():
    # Large: 2D array, 100x10, one column all missing
    arr = np.arange(1000).reshape(100,10)
    mask = np.zeros((100,10), dtype=bool)
    mask[:,7] = True  # column 7 all missing
    codeflash_output = std(arr, mask, axis=0); result = codeflash_output # 82.1μs -> 83.7μs (1.88% slower)
    # Check a non-missing column
    codeflash_output = np.std(arr[:,2], ddof=1); expected = codeflash_output # 22.0μs -> 21.8μs (0.899% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import warnings

import numpy as np
import pytest
from pandas.core.array_algos.masked_reductions import std


# Stand-in NA sentinel for testing purposes
class NAType:
    """Minimal NA sentinel for testing purposes."""
    def __eq__(self, other):
        return isinstance(other, NAType)
    def __repr__(self):
        return "NA"

NA = NAType()

# unit tests

# -----------------
# Basic Test Cases
# -----------------

def test_std_basic_no_missing():
    # Test with a simple array, no missing values
    arr = np.array([1.0, 2.0, 3.0, 4.0])
    mask = np.array([False, False, False, False])
    codeflash_output = np.std(arr, ddof=1); expected = codeflash_output # 30.7μs -> 31.1μs (1.32% slower)
    codeflash_output = std(arr, mask); result = codeflash_output # 44.3μs -> 17.0μs (161% faster)

def test_std_basic_with_missing_skipna():
    # Test with missing values, skipna=True (default)
    arr = np.array([1.0, 2.0, 3.0, 4.0])
    mask = np.array([False, True, False, False])
    codeflash_output = np.std([1.0, 3.0, 4.0], ddof=1); expected = codeflash_output # 29.7μs -> 28.6μs (3.63% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 42.3μs -> 48.4μs (12.6% slower)

def test_std_basic_with_missing_skipna_false():
    # Test with missing values, skipna=False
    arr = np.array([1.0, 2.0, 3.0, 4.0])
    mask = np.array([False, True, False, False])
    codeflash_output = std(arr, mask, skipna=False); result = codeflash_output # 15.7μs -> 22.8μs (31.4% slower)

def test_std_basic_ddof_0():
    # Test with ddof=0 (population std)
    arr = np.array([1.0, 2.0, 3.0, 4.0])
    mask = np.array([False, False, False, False])
    codeflash_output = np.std(arr, ddof=0); expected = codeflash_output # 30.9μs -> 32.0μs (3.55% slower)
    codeflash_output = std(arr, mask, ddof=0); result = codeflash_output # 43.3μs -> 16.8μs (158% faster)

def test_std_basic_integer_input():
    # Test with integer input
    arr = np.array([1, 2, 3, 4])
    mask = np.array([False, False, False, False])
    codeflash_output = np.std(arr, ddof=1); expected = codeflash_output # 32.3μs -> 31.7μs (1.72% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 43.7μs -> 17.7μs (146% faster)

def test_std_basic_object_dtype():
    # Test with object dtype, no missing
    arr = np.array([1.0, 2.0, 3.0, 4.0], dtype=object)
    mask = np.array([False, False, False, False])
    codeflash_output = np.std(arr, ddof=1); expected = codeflash_output # 38.7μs -> 37.5μs (3.10% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 32.8μs -> 21.5μs (52.4% faster)

def test_std_basic_object_dtype_with_missing():
    # Test with object dtype and missing
    arr = np.array([1.0, 2.0, 3.0, 4.0], dtype=object)
    mask = np.array([False, True, False, False])
    codeflash_output = np.std([1.0, 3.0, 4.0], ddof=1); expected = codeflash_output # 31.2μs -> 30.4μs (2.89% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 36.7μs -> 40.0μs (8.36% slower)

# -----------------
# Edge Test Cases
# -----------------

def test_std_empty_array():
    # Test with an empty array
    arr = np.array([])
    mask = np.array([], dtype=bool)
    codeflash_output = std(arr, mask); result = codeflash_output # 1.03μs -> 937ns (9.82% faster)

def test_std_all_missing():
    # Test with all values missing
    arr = np.array([1.0, 2.0, 3.0])
    mask = np.array([True, True, True])
    codeflash_output = std(arr, mask); result = codeflash_output # 8.80μs -> 9.35μs (5.84% slower)

def test_std_one_value():
    # Test with only one value, should return nan (ddof=1)
    arr = np.array([42.0])
    mask = np.array([False])
    codeflash_output = np.std(arr, ddof=1); expected = codeflash_output # 59.9μs -> 59.5μs (0.671% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 51.3μs -> 20.2μs (154% faster)

def test_std_one_value_ddof_0():
    # Test with only one value, ddof=0, should return 0.0
    arr = np.array([42.0])
    mask = np.array([False])
    expected = 0.0
    codeflash_output = std(arr, mask, ddof=0); result = codeflash_output # 58.4μs -> 33.8μs (72.8% faster)

def test_std_all_same_value():
    # Test with all values the same, std should be 0.0
    arr = np.array([5.0, 5.0, 5.0, 5.0])
    mask = np.array([False, False, False, False])
    expected = 0.0
    codeflash_output = std(arr, mask); result = codeflash_output # 57.5μs -> 34.2μs (68.1% faster)


def test_std_axis_0():
    # Test with 2D array, axis=0
    arr = np.array([[1.0, 2.0], [3.0, 4.0]])
    mask = np.array([[False, False], [False, False]])
    codeflash_output = np.std(arr, axis=0, ddof=1); expected = codeflash_output # 41.8μs -> 43.4μs (3.78% slower)
    codeflash_output = std(arr, mask, axis=0); result = codeflash_output # 58.7μs -> 20.4μs (187% faster)

def test_std_axis_1_with_missing():
    # Test with 2D array, axis=1, with missing values
    arr = np.array([[1.0, 2.0], [3.0, 4.0]])
    mask = np.array([[False, True], [False, False]])
    # First row: [1.0], std is nan (ddof=1), second row: [3.0, 4.0], std=0.7071...
    expected = np.array([np.nan, np.std([3.0, 4.0], ddof=1)]) # 34.6μs -> 33.7μs (2.91% faster)
    codeflash_output = std(arr, mask, axis=1); result = codeflash_output # 61.2μs -> 69.1μs (11.5% slower)

def test_std_object_dtype_all_missing():
    # Test with object dtype and all missing
    arr = np.array([1.0, 2.0], dtype=object)
    mask = np.array([True, True])
    codeflash_output = std(arr, mask); result = codeflash_output # 7.32μs -> 7.44μs (1.63% slower)

def test_std_object_dtype_empty():
    # Test with object dtype and empty array
    arr = np.array([], dtype=object)
    mask = np.array([], dtype=bool)
    codeflash_output = std(arr, mask); result = codeflash_output # 915ns -> 818ns (11.9% faster)

def test_std_axis_none_2d():
    # Test with 2D array and axis=None (flattened)
    arr = np.array([[1.0, 2.0], [3.0, 4.0]])
    mask = np.array([[False, False], [False, False]])
    codeflash_output = np.std(arr.flatten(), ddof=1); expected = codeflash_output # 37.2μs -> 37.4μs (0.644% slower)
    codeflash_output = std(arr, mask, axis=None); result = codeflash_output # 53.4μs -> 24.4μs (119% faster)

def test_std_axis_0_all_missing_col():
    # Test with 2D array, axis=0, one column all missing
    arr = np.array([[1.0, 2.0], [3.0, 4.0]])
    mask = np.array([[False, True], [False, True]])
    # First col: [1.0, 3.0], std=1.414..., second col: all missing, NA
    codeflash_output = std(arr, mask, axis=0); result = codeflash_output # 72.6μs -> 79.0μs (8.16% slower)

def test_std_axis_1_all_missing_row():
    # Test with 2D array, axis=1, one row all missing
    arr = np.array([[1.0, 2.0], [3.0, 4.0]])
    mask = np.array([[True, True], [False, False]])
    codeflash_output = std(arr, mask, axis=1); result = codeflash_output # 69.2μs -> 72.7μs (4.91% slower)

# -----------------
# Large Scale Test Cases
# -----------------

def test_std_large_array_no_missing():
    # Test with large array, no missing values
    arr = np.arange(1000, dtype=float)
    mask = np.zeros(1000, dtype=bool)
    codeflash_output = np.std(arr, ddof=1); expected = codeflash_output # 36.3μs -> 36.1μs (0.668% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 47.7μs -> 19.4μs (146% faster)

def test_std_large_array_some_missing():
    # Test with large array, some missing values
    arr = np.arange(1000, dtype=float)
    mask = np.zeros(1000, dtype=bool)
    mask[::10] = True  # every 10th value is missing
    codeflash_output = np.std(arr[~mask], ddof=1); expected = codeflash_output # 29.9μs -> 28.8μs (3.77% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 47.8μs -> 53.4μs (10.6% slower)

def test_std_large_2d_axis_0():
    # Test with large 2D array, axis=0
    arr = np.tile(np.arange(1000, dtype=float), (10,1))
    mask = np.zeros_like(arr, dtype=bool)
    codeflash_output = np.std(arr, axis=0, ddof=1); expected = codeflash_output # 45.8μs -> 44.7μs (2.28% faster)
    codeflash_output = std(arr, mask, axis=0); result = codeflash_output # 80.2μs -> 32.5μs (147% faster)

def test_std_large_2d_axis_1_with_missing():
    # Test with large 2D array, axis=1, with some missing values
    arr = np.tile(np.arange(1000, dtype=float), (10,1))
    mask = np.zeros_like(arr, dtype=bool)
    mask[:, ::10] = True  # every 10th column is missing in every row
    codeflash_output = np.std(arr[~mask].reshape(10, -1), axis=1, ddof=1); expected = codeflash_output # 40.2μs -> 39.6μs (1.45% faster)
    # For each row, remove the missing columns and compute std
    codeflash_output = std(arr, mask, axis=1); result = codeflash_output # 94.5μs -> 101μs (6.43% slower)
    # Each row should have the same std
    codeflash_output = np.std(np.arange(1000)[1:], ddof=1); expected_std = codeflash_output # 23.5μs -> 23.2μs (1.15% faster)

def test_std_large_object_dtype():
    # Test with large object dtype array, no missing
    arr = np.arange(1000, dtype=object)
    mask = np.zeros(1000, dtype=bool)
    codeflash_output = np.std(arr, ddof=1); expected = codeflash_output # 142μs -> 142μs (0.285% slower)
    codeflash_output = std(arr, mask); result = codeflash_output # 135μs -> 120μs (12.9% faster)

def test_std_large_object_dtype_some_missing():
    # Test with large object dtype array, some missing
    arr = np.arange(1000, dtype=object)
    mask = np.zeros(1000, dtype=bool)
    mask[::10] = True
    codeflash_output = np.std(arr[~mask], ddof=1); expected = codeflash_output # 124μs -> 121μs (2.23% faster)
    codeflash_output = std(arr, mask); result = codeflash_output # 120μs -> 124μs (2.99% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-std-mhbksx6n, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 05:49
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash and 🎯 Quality: High labels Oct 29, 2025