Skip to content

Conversation

@codeflash-ai
Copy link

@codeflash-ai codeflash-ai bot commented Oct 29, 2025

📄 11% (0.11x) speedup for _is_metadata_of in pandas/io/pytables.py

⏱️ Runtime : 796 microseconds 719 microseconds (best of 5 runs)

📝 Explanation and details

The optimized code achieves a 10% speedup through one key structural change: early termination when the parent matches but the name is wrong.

What changed:

  • Split the compound condition if parent == parent_group and current._v_name == "meta" into nested conditions
  • Added an explicit return False when parent == parent_group but current._v_name != "meta"

Why this improves performance:
In the original code, when a matching parent is found but the node name isn't "meta", the function continues traversing up the tree unnecessarily. The optimized version immediately returns False in this case, avoiding further tree traversal.

Performance characteristics from test results:

  • Best gains (20-50% faster): Cases where nodes have the correct parent relationship but wrong names (e.g., "data", "notmeta", etc.)
  • Neutral/slight regression (1-15% slower): Cases where the node name is "meta" and should return True, due to the additional nested condition check
  • Good gains on large scale (1-12% faster): Deep tree traversals where early termination saves significant work

The optimization is particularly effective for workloads with many non-metadata nodes that share parent relationships with the target parent group, as it eliminates unnecessary tree walking once the parent match is confirmed but the name check fails.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 1656 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

# imports
import pytest
from pandas.io.pytables import _is_metadata_of


class Node:
    """Minimal mock of tables.Node for unit testing."""
    def __init__(self, name, depth, parent=None):
        self._v_name = name
        self._v_depth = depth
        self._v_parent = parent
from pandas.io.pytables import _is_metadata_of

# unit tests

# --- Basic Test Cases ---

def test_basic_true_case():
    # group is a direct child of parent_group, named "meta"
    parent = Node("parent", 1)
    group = Node("meta", 2, parent)
    codeflash_output = _is_metadata_of(group, parent) # 882ns -> 1.03μs (14.6% slower)

def test_basic_false_wrong_name():
    # group is a direct child but not named "meta"
    parent = Node("parent", 1)
    group = Node("notmeta", 2, parent)
    codeflash_output = _is_metadata_of(group, parent) # 1.00μs -> 846ns (18.8% faster)

def test_basic_false_not_child():
    # group is not a child of parent_group
    parent = Node("parent", 1)
    other_parent = Node("other", 1)
    group = Node("meta", 2, other_parent)
    codeflash_output = _is_metadata_of(group, parent) # 949ns -> 1.16μs (18.0% slower)

def test_basic_false_same_depth():
    # group and parent_group have same depth
    parent = Node("parent", 2)
    group = Node("meta", 2, parent)
    codeflash_output = _is_metadata_of(group, parent) # 495ns -> 562ns (11.9% slower)

def test_basic_true_grandchild_meta():
    # group is a grandchild, but "meta" node is directly under parent_group
    parent = Node("parent", 1)
    meta = Node("meta", 2, parent)
    group = Node("something", 3, meta)
    # Should be False, as group itself is not "meta"
    codeflash_output = _is_metadata_of(group, parent) # 888ns -> 1.05μs (15.5% slower)

# --- Edge Test Cases ---

def test_edge_depth_less_than_parent():
    # group has lower depth than parent_group
    parent = Node("parent", 2)
    group = Node("meta", 1, parent)
    codeflash_output = _is_metadata_of(group, parent) # 571ns -> 600ns (4.83% slower)

def test_edge_group_is_root():
    # group is at depth 1 (root)
    root = Node("root", 1)
    codeflash_output = _is_metadata_of(root, root) # 453ns -> 540ns (16.1% slower)


def test_edge_group_is_none():
    # group is None
    parent = Node("parent", 1)
    with pytest.raises(AttributeError):
        _is_metadata_of(None, parent) # 1.51μs -> 1.49μs (1.75% faster)

def test_edge_circular_reference():
    # group and parent_group reference each other (cycle)
    parent = Node("parent", 1)
    group = Node("meta", 2, parent)
    parent._v_parent = group  # create a cycle
    codeflash_output = _is_metadata_of(group, parent) # 1.03μs -> 1.08μs (4.90% slower)

def test_edge_meta_name_case_sensitive():
    # "Meta" vs "meta" should be case-sensitive
    parent = Node("parent", 1)
    group = Node("Meta", 2, parent)
    codeflash_output = _is_metadata_of(group, parent) # 1.07μs -> 934ns (14.7% faster)

def test_edge_multiple_meta_levels():
    # Multiple "meta" nodes in ancestry, only direct child matters
    parent = Node("parent", 1)
    meta1 = Node("meta", 2, parent)
    meta2 = Node("meta", 3, meta1)
    codeflash_output = _is_metadata_of(meta2, parent) # 1.13μs -> 1.19μs (4.64% slower)
    codeflash_output = _is_metadata_of(meta1, parent) # 431ns -> 424ns (1.65% faster)

def test_edge_parent_is_self():
    # parent_group is the same as group
    group = Node("meta", 2)
    codeflash_output = _is_metadata_of(group, group) # 577ns -> 587ns (1.70% slower)

def test_edge_parent_is_grandparent():
    # group is child of child of parent_group, but not direct meta
    grandparent = Node("grandparent", 1)
    parent = Node("parent", 2, grandparent)
    group = Node("meta", 3, parent)
    codeflash_output = _is_metadata_of(group, grandparent) # 1.22μs -> 1.10μs (11.3% faster)

def test_edge_name_is_empty_string():
    # group named "" (empty string)
    parent = Node("parent", 1)
    group = Node("", 2, parent)
    codeflash_output = _is_metadata_of(group, parent) # 953ns -> 937ns (1.71% faster)

# --- Large Scale Test Cases ---

def test_large_scale_many_meta_nodes():
    # Create a chain of 1000 nodes, only one is "meta" and direct child of parent
    parent = Node("parent", 1)
    prev = parent
    meta_node = None
    for i in range(2, 1002):
        name = "meta" if i == 500 else f"node{i}"
        node = Node(name, i, prev)
        prev = node
        if name == "meta":
            meta_node = node
    # Only the meta_node should return True
    codeflash_output = _is_metadata_of(meta_node, parent) # 25.3μs -> 25.2μs (0.480% faster)
    # A non-meta node should return False
    codeflash_output = _is_metadata_of(prev, parent) # 48.8μs -> 48.3μs (1.08% faster)

def test_large_scale_all_non_meta_nodes():
    # Chain of 1000 nodes, none named "meta"
    parent = Node("parent", 1)
    prev = parent
    for i in range(2, 1002):
        node = Node(f"node{i}", i, prev)
        prev = node
    codeflash_output = _is_metadata_of(prev, parent) # 48.8μs -> 45.7μs (6.83% faster)

def test_large_scale_meta_at_end():
    # "meta" node at the deepest level, but parent is not direct parent
    parent = Node("parent", 1)
    prev = parent
    for i in range(2, 1000):
        node = Node(f"node{i}", i, prev)
        prev = node
    meta_node = Node("meta", 1000, prev)
    # meta_node's parent is not parent_group
    codeflash_output = _is_metadata_of(meta_node, parent) # 50.5μs -> 48.6μs (3.80% faster)

def test_large_scale_multiple_parents():
    # Multiple parent nodes, only correct parent should match
    parents = [Node(f"parent{i}", 1) for i in range(10)]
    meta_nodes = [Node("meta", 2, p) for p in parents]
    for i, meta_node in enumerate(meta_nodes):
        for j, parent in enumerate(parents):
            expected = i == j
            codeflash_output = _is_metadata_of(meta_node, parent)

def test_large_scale_deep_tree_meta():
    # Deep tree, meta node at depth 999, parent at depth 998
    parent = Node("parent", 998)
    meta = Node("meta", 999, parent)
    codeflash_output = _is_metadata_of(meta, parent) # 750ns -> 857ns (12.5% slower)

def test_large_scale_deep_tree_non_meta():
    # Deep tree, node at depth 999, not named "meta"
    parent = Node("parent", 998)
    node = Node("notmeta", 999, parent)
    codeflash_output = _is_metadata_of(node, parent)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

# imports
import pytest
from pandas.io.pytables import _is_metadata_of


class Node:
    """Minimal mock of tables.Node for testing purposes."""
    def __init__(self, name, depth, parent=None):
        self._v_name = name
        self._v_depth = depth
        self._v_parent = parent
from pandas.io.pytables import _is_metadata_of

# unit tests

# --- Basic Test Cases ---

def test_basic_true_direct_child_meta():
    # group is a direct child of parent_group named "meta"
    parent = Node("parent", 1)
    meta = Node("meta", 2, parent)
    codeflash_output = _is_metadata_of(meta, parent) # 1.29μs -> 1.28μs (1.17% faster)

def test_basic_false_direct_child_not_meta():
    # group is a direct child of parent_group but not named "meta"
    parent = Node("parent", 1)
    child = Node("data", 2, parent)
    codeflash_output = _is_metadata_of(child, parent) # 1.17μs -> 968ns (20.6% faster)

def test_basic_true_nested_meta():
    # group is a nested child of parent_group named "meta"
    parent = Node("parent", 1)
    intermediate = Node("intermediate", 2, parent)
    meta = Node("meta", 3, intermediate)
    codeflash_output = _is_metadata_of(meta, intermediate) # 885ns -> 914ns (3.17% slower)

def test_basic_false_nested_not_meta():
    # group is a nested child but not named "meta"
    parent = Node("parent", 1)
    intermediate = Node("intermediate", 2, parent)
    child = Node("data", 3, intermediate)
    codeflash_output = _is_metadata_of(child, intermediate) # 1.22μs -> 863ns (41.3% faster)

def test_basic_false_wrong_parent():
    # group is named "meta" but parent_group is not its parent
    parent1 = Node("parent1", 1)
    parent2 = Node("parent2", 1)
    meta = Node("meta", 2, parent1)
    codeflash_output = _is_metadata_of(meta, parent2) # 895ns -> 1.06μs (15.4% slower)

# --- Edge Test Cases ---

def test_edge_same_depth():
    # group and parent_group have the same depth
    parent = Node("parent", 2)
    group = Node("meta", 2, parent)
    codeflash_output = _is_metadata_of(group, parent) # 609ns -> 608ns (0.164% faster)

def test_edge_group_depth_less_than_parent():
    # group is at a shallower depth than parent_group
    parent = Node("parent", 2)
    group = Node("meta", 1, parent)
    codeflash_output = _is_metadata_of(group, parent) # 632ns -> 604ns (4.64% faster)

def test_edge_group_is_root():
    # group is the root node (depth 1)
    root = Node("root", 1)
    codeflash_output = _is_metadata_of(root, root) # 580ns -> 593ns (2.19% slower)

def test_edge_parent_is_root_meta_child():
    # parent_group is root, group is a meta child
    root = Node("root", 1)
    meta = Node("meta", 2, root)
    codeflash_output = _is_metadata_of(meta, root) # 821ns -> 890ns (7.75% slower)



def test_edge_meta_name_case_sensitive():
    # Name "Meta" (capitalized) should not match
    parent = Node("parent", 1)
    meta = Node("Meta", 2, parent)
    codeflash_output = _is_metadata_of(meta, parent) # 1.20μs -> 1.19μs (0.927% faster)

def test_edge_meta_name_with_spaces():
    # Name "meta " (with space) should not match
    parent = Node("parent", 1)
    meta = Node("meta ", 2, parent)
    codeflash_output = _is_metadata_of(meta, parent) # 1.04μs -> 953ns (9.23% faster)

def test_edge_meta_name_empty_string():
    # Name "" should not match
    parent = Node("parent", 1)
    meta = Node("", 2, parent)
    codeflash_output = _is_metadata_of(meta, parent) # 1.05μs -> 952ns (10.3% faster)

def test_edge_deeply_nested_meta():
    # meta group is several levels deep, but parent_group is not its parent
    root = Node("root", 1)
    a = Node("a", 2, root)
    b = Node("b", 3, a)
    meta = Node("meta", 4, b)
    # parent_group is 'a', but meta's parent is 'b', so should be False
    codeflash_output = _is_metadata_of(meta, a) # 1.31μs -> 1.12μs (17.1% faster)

def test_edge_meta_is_not_direct_child():
    # meta group is a grandchild, but not a direct child of parent_group
    parent = Node("parent", 1)
    intermediate = Node("intermediate", 2, parent)
    meta = Node("meta", 3, intermediate)
    # Should be True only if parent_group is intermediate
    codeflash_output = _is_metadata_of(meta, parent) # 1.08μs -> 1.19μs (8.60% slower)
    codeflash_output = _is_metadata_of(meta, intermediate) # 522ns -> 542ns (3.69% slower)

def test_edge_meta_with_multiple_parents():
    # Simulate two meta nodes with different parents
    parent1 = Node("parent1", 1)
    parent2 = Node("parent2", 1)
    meta1 = Node("meta", 2, parent1)
    meta2 = Node("meta", 2, parent2)
    codeflash_output = _is_metadata_of(meta1, parent1) # 733ns -> 879ns (16.6% slower)
    codeflash_output = _is_metadata_of(meta2, parent1) # 602ns -> 579ns (3.97% faster)

# --- Large Scale Test Cases ---

def test_large_scale_many_levels_true():
    # Create a chain of 1000 nodes, last is "meta" and direct child of node 999
    nodes = [Node(str(i), i+1) for i in range(999)]
    parent = Node("parent", 1)
    nodes[0]._v_parent = parent
    for i in range(1, 999):
        nodes[i]._v_parent = nodes[i-1]
    meta = Node("meta", 1001, nodes[-1])
    # Should be True for nodes[-1] as parent
    codeflash_output = _is_metadata_of(meta, nodes[-1]) # 823ns -> 1.09μs (24.5% slower)
    # Should be False for parent as parent_group (not direct parent)
    codeflash_output = _is_metadata_of(meta, parent) # 49.1μs -> 48.5μs (1.28% faster)

def test_large_scale_many_levels_false():
    # Create a chain of 1000 nodes, last is "not_meta"
    nodes = [Node(str(i), i+1) for i in range(999)]
    parent = Node("parent", 1)
    nodes[0]._v_parent = parent
    for i in range(1, 999):
        nodes[i]._v_parent = nodes[i-1]
    not_meta = Node("not_meta", 1001, nodes[-1])
    codeflash_output = _is_metadata_of(not_meta, nodes[-1]) # 52.0μs -> 1.01μs (5050% faster)

def test_large_scale_many_meta_nodes():
    # Create 500 meta nodes, each as direct child of a different parent
    parents = [Node(f"parent{i}", 1) for i in range(500)]
    metas = [Node("meta", 2, parents[i]) for i in range(500)]
    for i in range(500):
        codeflash_output = _is_metadata_of(metas[i], parents[i]) # 145μs -> 142μs (2.31% faster)
        # Should not match other parents
        if i > 0:
            codeflash_output = _is_metadata_of(metas[i], parents[i-1])

def test_large_scale_all_false():
    # Create 500 nodes, none named "meta"
    parents = [Node(f"parent{i}", 1) for i in range(500)]
    children = [Node(f"child{i}", 2, parents[i]) for i in range(500)]
    for i in range(500):
        codeflash_output = _is_metadata_of(children[i], parents[i]) # 155μs -> 138μs (12.1% faster)

def test_large_scale_meta_at_various_depths():
    # Create meta nodes at varying depths, only direct meta child should match
    root = Node("root", 1)
    meta2 = Node("meta", 2, root)
    meta3 = Node("meta", 3, meta2)
    meta4 = Node("meta", 4, meta3)
    # Only meta2 is direct child of root
    codeflash_output = _is_metadata_of(meta2, root) # 707ns -> 904ns (21.8% slower)
    codeflash_output = _is_metadata_of(meta3, root) # 593ns -> 608ns (2.47% slower)
    codeflash_output = _is_metadata_of(meta4, root) # 477ns -> 486ns (1.85% slower)

def test_large_scale_meta_with_circular_parent_reference():
    # Create a circular reference, should not infinite loop
    parent = Node("parent", 1)
    meta = Node("meta", 2, parent)
    # Make meta its own parent (cycle)
    meta._v_parent = meta
    # Should not match, and should not infinite loop
    codeflash_output = _is_metadata_of(meta, parent)

def test_large_scale_meta_with_deep_tree():
    # Build a tree with multiple branches, only direct meta child matches
    root = Node("root", 1)
    branches = [Node(f"branch{i}", 2, root) for i in range(10)]
    metas = [Node("meta", 3, branch) for branch in branches]
    for i in range(10):
        codeflash_output = _is_metadata_of(metas[i], branches[i]) # 3.42μs -> 3.55μs (3.60% slower)
        codeflash_output = _is_metadata_of(metas[i], root)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-_is_metadata_of-mhc1huqt and push.

Codeflash

The optimized code achieves a 10% speedup through one key structural change: **early termination when the parent matches but the name is wrong**.

**What changed:**
- Split the compound condition `if parent == parent_group and current._v_name == "meta"` into nested conditions
- Added an explicit `return False` when `parent == parent_group` but `current._v_name != "meta"`

**Why this improves performance:**
In the original code, when a matching parent is found but the node name isn't "meta", the function continues traversing up the tree unnecessarily. The optimized version immediately returns `False` in this case, avoiding further tree traversal.

**Performance characteristics from test results:**
- **Best gains** (20-50% faster): Cases where nodes have the correct parent relationship but wrong names (e.g., "data", "notmeta", etc.)
- **Neutral/slight regression** (1-15% slower): Cases where the node name is "meta" and should return `True`, due to the additional nested condition check
- **Good gains on large scale** (1-12% faster): Deep tree traversals where early termination saves significant work

The optimization is particularly effective for workloads with many non-metadata nodes that share parent relationships with the target parent group, as it eliminates unnecessary tree walking once the parent match is confirmed but the name check fails.
@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 29, 2025 13:37
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant