Skip to content

⚡️ Speed up function find_last_node by 13,663% #11

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

codeflash-ai[bot]
Copy link

@codeflash-ai codeflash-ai bot commented May 15, 2025

📄 13,663% (136.63x) speedup for find_last_node in src/dsa/nodes.py

⏱️ Runtime : 157 milliseconds 1.14 millisecond (best of 412 runs)

📝 Explanation and details

Here is a faster version of the given function.
Optimization:

  • Instead of checking for each node if any edge has it as a source (O(N * E)), we first collect all the used sources in a set (O(E)), then check each node's id against this set (O(N)).
  • This reduces the typical time complexity from O(N * E) to O(E + N).

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 43 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests Details
import pytest  # used for our unit tests
from src.dsa.nodes import find_last_node

# unit tests

# -------------------------------
# Basic Test Cases
# -------------------------------

def test_single_node_no_edges():
    # One node, no edges; node should be returned as last node
    nodes = [{"id": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_two_nodes_one_edge():
    # Two nodes, one edge from A to B; B should be last node
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_three_nodes_linear_chain():
    # Three nodes, linear chain A->B->C; C should be last node
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_multiple_last_nodes_returns_first():
    # Two nodes, no edges; both are last nodes, should return first in list
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_branching_graph():
    # A->B, A->C; both B and C are last nodes, should return first in list
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "A", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

# -------------------------------
# Edge Test Cases
# -------------------------------

def test_empty_nodes_and_edges():
    # No nodes, no edges; should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_edges_but_no_nodes():
    # Edges exist but nodes list is empty; should return None
    nodes = []
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_edges_with_unknown_nodes():
    # Edges refer to nodes not in the nodes list; should return None
    nodes = []
    edges = [{"source": "X", "target": "Y"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_self_loop():
    # Node with a self-loop; node should not be considered last
    nodes = [{"id": "A"}]
    edges = [{"source": "A", "target": "A"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_circular_graph():
    # A->B, B->C, C->A; no last node exists
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}, {"source": "C", "target": "A"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_multiple_edges_to_last_node():
    # A->C, B->C; C is last node
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "C"}, {"source": "B", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_node_with_no_incoming_or_outgoing_edges():
    # Node D is isolated, should be returned as last node (first in list)
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_duplicate_node_ids():
    # Duplicated node ids; function should return the first occurrence
    nodes = [{"id": "A"}, {"id": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_edge_with_nonexistent_source():
    # Edge with source not in nodes; should not affect result
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "Z", "target": "A"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output
    
def test_edge_with_nonexistent_target():
    # Edge with target not in nodes; should not affect result
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "Z"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_nodes_with_additional_fields():
    # Nodes have extra fields; function should ignore them
    nodes = [{"id": "A", "value": 1}, {"id": "B", "value": 2}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

# -------------------------------
# Large Scale Test Cases
# -------------------------------

def test_large_linear_chain():
    # 1000 nodes in a linear chain; last node should be last in list
    N = 1000
    nodes = [{"id": str(i)} for i in range(N)]
    edges = [{"source": str(i), "target": str(i+1)} for i in range(N-1)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_large_branching_graph():
    # 500 nodes, each with an edge to a unique last node (500 more nodes)
    N = 500
    nodes = [{"id": f"A{i}"} for i in range(N)] + [{"id": f"B{i}"} for i in range(N)]
    edges = [{"source": f"A{i}", "target": f"B{i}"} for i in range(N)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_large_graph_with_isolated_node():
    # 999 nodes in a chain, 1 isolated node at start
    N = 1000
    nodes = [{"id": "X"}] + [{"id": str(i)} for i in range(N-1)]
    edges = [{"source": str(i), "target": str(i+1)} for i in range(1, N-2)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_large_graph_all_connected():
    # 1000 nodes, each with outgoing edge to next (cycle); no last node
    N = 1000
    nodes = [{"id": str(i)} for i in range(N)]
    edges = [{"source": str(i), "target": str((i+1)%N)} for i in range(N)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_large_graph_multiple_last_nodes():
    # 500 nodes, each with no outgoing edges (no edges at all)
    N = 500
    nodes = [{"id": str(i)} for i in range(N)]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

# -------------------------------
# Mutation Testing Guards
# -------------------------------

def test_mutation_guard_returns_none_if_no_last_node():
    # If all nodes have outgoing edges, must return None
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "A"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_mutation_guard_returns_first_last_node():
    # If multiple last nodes, must return first in list
    nodes = [{"id": "X"}, {"id": "Y"}, {"id": "Z"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest  # used for our unit tests
from src.dsa.nodes import find_last_node

# unit tests

# -------------------------
# Basic Test Cases
# -------------------------

def test_single_node_no_edges():
    # One node, no edges: should return the node itself
    nodes = [{"id": "A"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_two_nodes_one_edge():
    # Two nodes, one edge from A to B: should return B (no outgoing edges)
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_three_nodes_linear_chain():
    # A -> B -> C, should return C
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [{"source": "A", "target": "B"}, {"source": "B", "target": "C"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_multiple_terminal_nodes():
    # A -> B, C is isolated, D is isolated: should return first found last node (C or D)
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output
    # Actually, B, C, D all have no outgoing edges. But since nodes are checked in order,
    # the first one found is B (since A has outgoing edge, B doesn't, C doesn't, D doesn't).
    # So the function will return B.

def test_disconnected_nodes():
    # No edges, multiple nodes: should return first node in list
    nodes = [{"id": "X"}, {"id": "Y"}, {"id": "Z"}]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

# -------------------------
# Edge Test Cases
# -------------------------

def test_empty_nodes_and_edges():
    # No nodes, no edges: should return None
    nodes = []
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_edges_but_no_nodes():
    # Edges present, but no nodes: should return None
    nodes = []
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_cycle_graph():
    # A -> B -> C -> A (cycle): all nodes have outgoing edges, should return None
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "B", "target": "C"},
        {"source": "C", "target": "A"}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_self_loop_node():
    # Node with a self-loop should not be considered last node
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "A"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_multiple_edges_from_one_node():
    # A -> B, A -> C, B -> D; D should be last node
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}, {"id": "D"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "A", "target": "C"},
        {"source": "B", "target": "D"}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_node_with_incoming_but_no_outgoing_edges():
    # A -> B, C -> B; B has only incoming, should be returned
    nodes = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    edges = [
        {"source": "A", "target": "B"},
        {"source": "C", "target": "B"}
    ]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_edges_with_nonexistent_nodes():
    # Edges refer to nodes not in nodes list; should ignore those edges for missing nodes
    nodes = [{"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "X"}, {"source": "Y", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_duplicate_node_ids():
    # Duplicate node IDs: should return first node with no outgoing edges
    nodes = [{"id": "A"}, {"id": "A"}, {"id": "B"}]
    edges = [{"source": "A", "target": "B"}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_node_with_non_string_id():
    # Node IDs are not strings
    nodes = [{"id": 1}, {"id": 2}]
    edges = [{"source": 1, "target": 2}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_edge_case_large_ids():
    # Node IDs are large numbers
    nodes = [{"id": 9999999}, {"id": 8888888}]
    edges = [{"source": 9999999, "target": 8888888}]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

# -------------------------
# Large Scale Test Cases
# -------------------------

def test_large_linear_chain():
    # 1000 nodes in a chain: 0->1->2->...->999, should return node 999
    nodes = [{"id": str(i)} for i in range(1000)]
    edges = [{"source": str(i), "target": str(i+1)} for i in range(999)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_large_star_graph():
    # Node 0 points to all others, all others have no outgoing edges
    nodes = [{"id": str(i)} for i in range(1000)]
    edges = [{"source": "0", "target": str(i)} for i in range(1, 1000)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_large_fully_connected_graph():
    # Every node points to every other node (no last node)
    nodes = [{"id": str(i)} for i in range(100)]
    edges = []
    for i in range(100):
        for j in range(100):
            if i != j:
                edges.append({"source": str(i), "target": str(j)})
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_large_disconnected_nodes():
    # 1000 nodes, no edges: should return first node
    nodes = [{"id": str(i)} for i in range(1000)]
    edges = []
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output

def test_large_graph_multiple_last_nodes():
    # 1000 nodes, 500 edges from 0-499 to 500-999 (each i->i+500)
    nodes = [{"id": str(i)} for i in range(1000)]
    edges = [{"source": str(i), "target": str(i+500)} for i in range(500)]
    codeflash_output = find_last_node(nodes, edges); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-find_last_node-map37u02 and push.

Codeflash

Here is a faster version of the given function.  
Optimization:  
- Instead of checking for each node if any edge has it as a source (O(N * E)), we first collect all the used sources in a set (O(E)), then check each node's id against this set (O(N)).  
- This reduces the typical time complexity from O(N * E) to O(E + N).
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label May 15, 2025
@codeflash-ai codeflash-ai bot requested a review from KRRT7 May 15, 2025 08:08
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-find_last_node-map37u02 branch May 20, 2025 05:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⚡️ codeflash Optimization PR opened by Codeflash AI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant