@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 76% (0.76x) speedup for has_level_label in pandas/plotting/_matplotlib/converter.py

⏱️ Runtime : 114 microseconds → 64.5 microseconds (best of 220 runs)

📝 Explanation and details

The optimized code achieves a 76% speedup through three key optimizations:

1. Eliminated redundant size calculations: The original code called label_flags.size twice in the complex conditional. The optimized version stores it once in a local variable, reducing NumPy attribute access overhead.

2. Restructured branching logic: Instead of a complex compound conditional with short-circuit evaluation, the optimized version uses sequential if statements that handle edge cases first (empty arrays, single-element arrays), then falls through to the common case. This reduces the total number of condition evaluations for most inputs.

3. Used .item() for single-element access: For single-element arrays, .item() returns a plain Python scalar, whereas indexing ([0]) returns a NumPy scalar object. Comparing a Python scalar with 0 is cheaper than comparing a NumPy scalar, which is the likely source of the gain here.
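The PR text does not include the before and after code, so the following is a minimal sketch of the restructuring described above, not the actual diff. `has_level_label_compound` is an assumed reconstruction of the compound conditional, and `has_level_label_sequential` is a hypothetical rewrite applying the three changes; both names are illustrative only.

```python
import numpy as np


def has_level_label_compound(label_flags: np.ndarray, vmin: float) -> bool:
    # Assumed shape of the original: one compound conditional that reads
    # label_flags.size twice and accesses the element with [0].
    if label_flags.size == 0 or (
        label_flags.size == 1 and label_flags[0] == 0 and vmin % 1 > 0.0
    ):
        return False
    return True


def has_level_label_sequential(label_flags: np.ndarray, vmin: float) -> bool:
    # Hypothetical rewrite: read .size once and handle the edge cases first.
    size = label_flags.size
    if size == 0:
        return False
    if size == 1:
        # .item() yields a plain Python scalar for a one-element array.
        if label_flags.item() == 0 and vmin % 1 > 0.0:
            return False
    # Common case: more than one element, or a single usable label.
    return True


# Both versions should agree on the edge cases the generated tests exercise.
cases = [
    (np.array([], dtype=np.intp), 0.0),
    (np.array([0], dtype=np.intp), 0.5),
    (np.array([0], dtype=np.intp), 2.0),
    (np.array([1], dtype=np.intp), 0.7),
    (np.zeros(5, dtype=np.intp), 0.5),
]
for flags, vmin in cases:
    assert has_level_label_compound(flags, vmin) == has_level_label_sequential(flags, vmin)
```

The sequential form trades a single expression for three short branches, which makes each edge case visible on its own line.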

Speedup by input category, based on the generated tests below:

  • Single-element arrays (roughly 50-120% faster for integer dtypes): These benefit most from avoiding repeated size reads and using .item()
  • Multi-element arrays (17-36% faster): These benefit from the streamlined branching logic
  • Single-element arrays with unusual vmin values such as negative, very large, NaN, or inf (roughly 50-96% faster): These take the same single-element path, so they see similar gains
  • Empty arrays (8-19% faster): These return at the first size check

Every generated test ran faster, but the size of the gain depends on the input. The largest improvements are in single-element arrays, the only inputs that reach the element access and the vmin check, where the original code's redundant operations had the most impact.
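To check the single-element claim locally, a small `timeit` micro-benchmark along these lines could compare the two access styles. This is a sketch under stated assumptions: the helper names `via_index` and `via_item` are hypothetical, and the absolute numbers will differ by machine and NumPy version, so no expected output is given.

```python
import timeit

import numpy as np

# A one-element array, the case where the element access actually runs.
arr = np.array([0], dtype=np.intp)


def via_index():
    # Indexing returns a NumPy scalar, which is then compared with 0.
    return arr[0] == 0


def via_item():
    # .item() returns a plain Python int, which is then compared with 0.
    return arr.item() == 0


index_s = timeit.timeit(via_index, number=100_000)
item_s = timeit.timeit(via_item, number=100_000)
print(f"[0]: {index_s:.4f}s  .item(): {item_s:.4f}s  (100,000 calls each)")
```

Repeating the measurement several times and taking the best run, as the report above does, reduces noise from other processes.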

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 64 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
# imports
import numpy as np
import pytest
from pandas.plotting._matplotlib.converter import has_level_label

# unit tests

# -------------------------------
# Basic Test Cases
# -------------------------------

def test_single_label_present():
    # Basic: Single element, label present, vmin integer
    arr = np.array([1], dtype=int)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 1.94μs -> 879ns (120% faster)

def test_multiple_labels_present():
    # Basic: Multiple elements, at least one label present
    arr = np.array([0, 1, 0, 0], dtype=int)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 788ns -> 627ns (25.7% faster)

def test_all_labels_present():
    # Basic: All elements are labels
    arr = np.array([1, 1, 1], dtype=int)
    vmin = 2.0
    codeflash_output = has_level_label(arr, vmin) # 801ns -> 613ns (30.7% faster)

def test_single_label_absent_vmin_integer():
    # Basic: Single element, label absent, vmin integer
    arr = np.array([0], dtype=int)
    vmin = 1.0
    codeflash_output = has_level_label(arr, vmin) # 2.82μs -> 1.46μs (93.7% faster)

def test_single_label_absent_vmin_noninteger():
    # Basic: Single element, label absent, vmin not integer
    arr = np.array([0], dtype=int)
    vmin = 1.5
    codeflash_output = has_level_label(arr, vmin) # 2.31μs -> 1.32μs (75.0% faster)

def test_empty_array():
    # Basic: Empty array should always return False
    arr = np.array([], dtype=int)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 642ns -> 595ns (7.90% faster)

# -------------------------------
# Edge Test Cases
# -------------------------------

def test_vmin_exactly_integer_with_single_zero():
    # Edge: vmin is exactly integer, single zero label
    arr = np.array([0], dtype=int)
    vmin = 2.0
    codeflash_output = has_level_label(arr, vmin) # 2.68μs -> 1.47μs (82.0% faster)

def test_vmin_very_close_to_integer_but_not_integer():
    # Edge: vmin is almost integer but not quite
    arr = np.array([0], dtype=int)
    vmin = 3.00000000001
    codeflash_output = has_level_label(arr, vmin) # 2.27μs -> 1.32μs (72.3% faster)

def test_vmin_very_close_to_next_integer_from_below():
    # Edge: vmin is just below an integer
    arr = np.array([0], dtype=int)
    vmin = 2.99999999999
    codeflash_output = has_level_label(arr, vmin) # 2.30μs -> 1.35μs (70.8% faster)

def test_vmin_negative_noninteger():
    # Edge: vmin is negative and not integer
    arr = np.array([0], dtype=int)
    vmin = -1.25
    codeflash_output = has_level_label(arr, vmin) # 2.25μs -> 1.31μs (71.5% faster)

def test_vmin_negative_integer():
    # Edge: vmin is negative integer
    arr = np.array([0], dtype=int)
    vmin = -2.0
    codeflash_output = has_level_label(arr, vmin) # 2.51μs -> 1.46μs (71.4% faster)

def test_label_flags_with_negative_values():
    # Edge: label_flags contains negative values (should be treated as present)
    arr = np.array([0, -1, 0], dtype=int)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 825ns -> 609ns (35.5% faster)

def test_label_flags_all_zero_multiple():
    # Edge: Multiple zeros, vmin non-integer
    arr = np.zeros(5, dtype=int)
    vmin = 0.5
    codeflash_output = has_level_label(arr, vmin) # 788ns -> 645ns (22.2% faster)

def test_label_flags_all_zero_multiple_vmin_integer():
    # Edge: Multiple zeros, vmin integer
    arr = np.zeros(5, dtype=int)
    vmin = 2.0
    codeflash_output = has_level_label(arr, vmin) # 800ns -> 656ns (22.0% faster)

def test_label_flags_dtype_float():
    # Edge: label_flags as float dtype
    arr = np.array([0.0, 1.0], dtype=float)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 794ns -> 597ns (33.0% faster)

def test_label_flags_dtype_bool():
    # Edge: label_flags as bool dtype
    arr = np.array([False, True], dtype=bool)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 777ns -> 634ns (22.6% faster)

def test_label_flags_nan():
    # Edge: label_flags contains np.nan
    arr = np.array([0, np.nan], dtype=float)
    vmin = 0.0
    # Two elements, so the single-zero special case does not apply; expect True
    codeflash_output = has_level_label(arr, vmin) # 781ns -> 639ns (22.2% faster)

def test_label_flags_large_negative_vmin():
    # Edge: vmin is a large negative non-integer
    arr = np.array([0], dtype=int)
    vmin = -9999.123
    codeflash_output = has_level_label(arr, vmin) # 3.13μs -> 1.60μs (95.8% faster)

def test_label_flags_large_positive_vmin():
    # Edge: vmin is a large positive non-integer
    arr = np.array([0], dtype=int)
    vmin = 1e6 + 0.5
    codeflash_output = has_level_label(arr, vmin) # 2.24μs -> 1.42μs (58.6% faster)

def test_label_flags_large_positive_vmin_integer():
    # Edge: vmin is a large positive integer
    arr = np.array([0], dtype=int)
    vmin = 1e6
    codeflash_output = has_level_label(arr, vmin) # 2.27μs -> 1.49μs (52.6% faster)

# -------------------------------
# Large Scale Test Cases
# -------------------------------

def test_large_array_all_labels():
    # Large: All elements are labels
    arr = np.ones(1000, dtype=int)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 788ns -> 672ns (17.3% faster)

def test_large_array_no_labels_vmin_integer():
    # Large: All elements are zero, vmin integer
    arr = np.zeros(1000, dtype=int)
    vmin = 10.0
    codeflash_output = has_level_label(arr, vmin) # 824ns -> 643ns (28.1% faster)

def test_large_array_no_labels_vmin_noninteger():
    # Large: All elements are zero, vmin non-integer
    arr = np.zeros(1000, dtype=int)
    vmin = 10.5
    codeflash_output = has_level_label(arr, vmin) # 839ns -> 686ns (22.3% faster)

def test_large_array_one_label_present():
    # Large: Only one label present in large array
    arr = np.zeros(1000, dtype=int)
    arr[500] = 1
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 794ns -> 653ns (21.6% faster)

def test_large_array_single_element_zero_vmin_noninteger():
    # Boundary of large scale: single element (size 1), zero, vmin non-integer
    arr = np.zeros(1, dtype=int)
    vmin = 0.1
    codeflash_output = has_level_label(arr, vmin) # 3.07μs -> 1.51μs (104% faster)

def test_large_array_single_element_zero_vmin_integer():
    # Boundary of large scale: single element (size 1), zero, vmin integer
    arr = np.zeros(1, dtype=int)
    vmin = 1000.0
    codeflash_output = has_level_label(arr, vmin) # 2.54μs -> 1.51μs (68.8% faster)

def test_large_array_empty():
    # Large: Empty array
    arr = np.array([], dtype=int)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 661ns -> 595ns (11.1% faster)

def test_large_array_random_labels():
    # Large: Random 0/1 labels, at least one label present
    rng = np.random.default_rng(42)
    arr = rng.integers(0, 2, size=1000)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 841ns -> 677ns (24.2% faster)

# -------------------------------
# Additional Robustness Tests
# -------------------------------

def test_non_integer_label_flags():
    # Edge: label_flags contains non-integer values
    arr = np.array([0.0, 2.5, 0.0], dtype=float)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 819ns -> 628ns (30.4% faster)

def test_label_flags_object_dtype():
    # Edge: label_flags of object dtype
    arr = np.array([0, 1, 0], dtype=object)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 817ns -> 653ns (25.1% faster)

def test_label_flags_all_zero_object_dtype_single():
    # Edge: object dtype, single zero, vmin non-integer
    arr = np.array([0], dtype=object)
    vmin = 0.1
    codeflash_output = has_level_label(arr, vmin) # 2.13μs -> 1.65μs (29.4% faster)

def test_label_flags_with_large_integers():
    # Edge: label_flags with large integer values
    arr = np.array([0, 0, 999999999], dtype=int)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 797ns -> 636ns (25.3% faster)

def test_label_flags_with_inf():
    # Edge: label_flags contains np.inf
    arr = np.array([0, np.inf], dtype=float)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 799ns -> 655ns (22.0% faster)

def test_label_flags_with_minus_inf():
    # Edge: label_flags contains -np.inf
    arr = np.array([0, -np.inf], dtype=float)
    vmin = 0.0
    codeflash_output = has_level_label(arr, vmin) # 773ns -> 655ns (18.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
# imports
import numpy as np
import pytest
from pandas.plotting._matplotlib.converter import has_level_label

# unit tests

# -------------------- BASIC TEST CASES --------------------

def test_basic_nonzero_label_flags():
    # At least one label present, should return True
    arr = np.array([1, 0, 0], dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.0) # 978ns -> 717ns (36.4% faster)

def test_basic_zero_label_flags_with_integer_vmin():
    # Only one flag, it's zero, vmin is integer, should return True
    arr = np.array([0], dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.0) # 3.10μs -> 1.62μs (92.1% faster)

def test_basic_zero_label_flags_with_noninteger_vmin():
    # Only one flag, it's zero, vmin is not integer, should return False
    arr = np.array([0], dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.5) # 2.43μs -> 1.36μs (78.1% faster)

def test_basic_empty_array():
    # Empty array, should return False
    arr = np.array([], dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.0) # 673ns -> 574ns (17.2% faster)

def test_basic_all_nonzero_flags():
    # All flags are nonzero, should return True
    arr = np.array([1, 1, 1], dtype=np.intp)
    codeflash_output = has_level_label(arr, 1.0) # 798ns -> 615ns (29.8% faster)

# -------------------- EDGE TEST CASES --------------------

def test_edge_single_zero_flag_vmin_exactly_integer():
    # Single zero flag, vmin is exactly integer, should return True
    arr = np.array([0], dtype=np.intp)
    codeflash_output = has_level_label(arr, 2.0) # 2.87μs -> 1.55μs (85.1% faster)

def test_edge_single_zero_flag_vmin_very_close_to_integer():
    # Single zero flag, vmin is very close to integer but not exactly
    arr = np.array([0], dtype=np.intp)
    codeflash_output = has_level_label(arr, 2.0000000001) # 2.43μs -> 1.35μs (79.3% faster)

def test_edge_single_zero_flag_vmin_very_close_below_integer():
    # Single zero flag, vmin is just below an integer
    arr = np.array([0], dtype=np.intp)
    codeflash_output = has_level_label(arr, 2.9999999999) # 2.37μs -> 1.31μs (81.1% faster)

def test_edge_single_nonzero_flag_vmin_noninteger():
    # Single nonzero flag, vmin is not integer, should return True
    arr = np.array([1], dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.7) # 1.95μs -> 943ns (107% faster)

def test_edge_multiple_flags_all_zero():
    # Multiple flags, all zero, should return True (not the special case)
    arr = np.array([0, 0, 0], dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.5) # 788ns -> 583ns (35.2% faster)

def test_edge_large_vmin_negative():
    # Negative vmin, single zero flag, should return True if vmin is integer
    arr = np.array([0], dtype=np.intp)
    codeflash_output = has_level_label(arr, -3.0) # 2.73μs -> 1.50μs (81.5% faster)

def test_edge_large_vmin_negative_noninteger():
    # Negative vmin, single zero flag, should return False if vmin is not integer
    arr = np.array([0], dtype=np.intp)
    codeflash_output = has_level_label(arr, -3.5) # 2.28μs -> 1.31μs (73.4% faster)

def test_edge_label_flags_with_negative_values():
    # Negative values in label_flags (although not expected), should still work
    arr = np.array([-1, 0, 1], dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.0) # 773ns -> 620ns (24.7% faster)

def test_edge_label_flags_dtype_other_than_intp():
    # label_flags with dtype int32 (should still work)
    arr = np.array([0], dtype=np.int32)
    codeflash_output = has_level_label(arr, 0.5) # 2.65μs -> 1.43μs (86.1% faster)

def test_edge_vmin_is_nan():
    # vmin is NaN; vmin % 1 should be NaN, so condition is never True, should return True
    arr = np.array([0], dtype=np.intp)
    codeflash_output = has_level_label(arr, float('nan')) # 2.43μs -> 1.34μs (81.6% faster)

def test_edge_vmin_is_inf():
    # vmin is inf; vmin % 1 is nan, so condition is never True, should return True
    arr = np.array([0], dtype=np.intp)
    codeflash_output = has_level_label(arr, float('inf')) # 2.52μs -> 1.51μs (67.6% faster)

def test_edge_label_flags_large_positive_and_negative():
    # Large positive and negative values in label_flags
    arr = np.array([2**30, -2**30], dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.0) # 778ns -> 605ns (28.6% faster)

# -------------------- LARGE SCALE TEST CASES --------------------

def test_large_scale_all_zeros():
    # Large array of zeros, should return True (not the special case)
    arr = np.zeros(1000, dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.5) # 891ns -> 654ns (36.2% faster)

def test_large_scale_one_nonzero():
    # Large array, only one nonzero value
    arr = np.zeros(1000, dtype=np.intp)
    arr[500] = 1
    codeflash_output = has_level_label(arr, 0.5) # 818ns -> 689ns (18.7% faster)

def test_large_scale_empty():
    # Large scale: empty array (edge of large scale)
    arr = np.array([], dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.0) # 744ns -> 625ns (19.0% faster)

def test_large_scale_single_zero_noninteger_vmin():
    # Single zero, non-integer vmin, should return False (special case)
    arr = np.zeros(1, dtype=np.intp)
    codeflash_output = has_level_label(arr, 123.456) # 3.08μs -> 1.62μs (89.3% faster)

def test_large_scale_performance_mixed():
    # Large array with a mix of zeros and ones
    arr = np.zeros(1000, dtype=np.intp)
    arr[::2] = 1  # Set every other element to 1
    codeflash_output = has_level_label(arr, 0.0) # 847ns -> 672ns (26.0% faster)

def test_large_scale_performance_all_nonzero():
    # Large array, all elements are nonzero
    arr = np.ones(1000, dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.0) # 825ns -> 692ns (19.2% faster)

def test_large_scale_performance_all_zero_single_element_noninteger_vmin():
    # Single element, zero, non-integer vmin
    arr = np.zeros(1, dtype=np.intp)
    codeflash_output = has_level_label(arr, 0.9999) # 2.95μs -> 1.51μs (95.6% faster)

def test_large_scale_performance_all_zero_single_element_integer_vmin():
    # Single element, zero, integer vmin
    arr = np.zeros(1, dtype=np.intp)
    codeflash_output = has_level_label(arr, 10.0) # 2.45μs -> 1.43μs (72.1% faster)

# -------------------- ADDITIONAL EDGE CASES --------------------

def test_edge_label_flags_object_dtype():
    # label_flags with dtype object (should still work if elements are ints)
    arr = np.array([0], dtype=object)
    codeflash_output = has_level_label(arr, 0.5) # 1.78μs -> 1.39μs (28.7% faster)

def test_edge_label_flags_bool_dtype():
    # label_flags with dtype bool (should treat True as 1, False as 0)
    arr = np.array([False], dtype=bool)
    codeflash_output = has_level_label(arr, 0.5) # 11.1μs -> 1.44μs (673% faster)
    arr = np.array([True], dtype=bool)
    codeflash_output = has_level_label(arr, 0.5) # 2.24μs -> 452ns (396% faster)

def test_edge_label_flags_float_dtype():
    # label_flags with dtype float (should treat 0.0 as 0, 1.0 as 1)
    arr = np.array([0.0], dtype=float)
    codeflash_output = has_level_label(arr, 0.5) # 2.54μs -> 1.44μs (76.2% faster)
    arr = np.array([1.0], dtype=float)
    codeflash_output = has_level_label(arr, 0.5) # 604ns -> 495ns (22.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
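Several of the edge-case comments above depend on how Python's float modulo treats negative, NaN, and infinite values. The snippet below is a small illustration of that behaviour; `is_fractional` is a hypothetical helper that assumes the function's check has the form `vmin % 1 > 0.0`, which the PR text implies but does not show.

```python
import math


def is_fractional(vmin: float) -> bool:
    # Hypothetical helper mirroring an assumed `vmin % 1 > 0.0` check.
    return vmin % 1 > 0.0


assert is_fractional(0.5)
assert not is_fractional(2.0)
# Python's % takes the sign of the divisor, so -3.5 % 1 is 0.5.
assert is_fractional(-3.5)
# Values very close to an integer still count as non-integer.
assert is_fractional(2.0000000001)
# nan % 1 is nan, and any ordering comparison with nan is False.
assert not is_fractional(float("nan"))
# inf % 1 is also nan, so the check is False here as well.
assert math.isnan(float("inf") % 1)
assert not is_fractional(float("inf"))
```

Under that assumption, NaN and infinite `vmin` values behave like integer `vmin` values, which matches the "should return True" comments on the NaN and inf tests.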

To edit these changes, run `git checkout codeflash/optimize-has_level_label-mhbpk08s` and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 08:02
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" and "🎯 Quality: High" labels Oct 29, 2025