GFQL named matcher collision causes runtime KeyError

# GFQL Named Matcher Collision Causes Runtime KeyError

## Summary
GFQL currently assumes matcher names are unique per entity type. When two node matchers (or two edge matchers) reuse the same `name=...`, execution reaches `combine_steps()` where pandas/cuDF suffix duplicate columns (`dup_x`, `dup_y`). The code then looks up the original name (`out_df[op._name]`) and raises a raw `KeyError`. The same happens if the graph already has a column with that name before the matcher runs.

## Minimal Repros
Run with `PYTHONPATH=.` so the local tree is used.

```python
import pandas as pd
from graphistry.tests.test_compute import CGFull
from graphistry import n, e_forward

g = (
    CGFull()
    .nodes(pd.DataFrame({'node': [0, 1, 2, 3]}))
    .edges(pd.DataFrame({'source': [0, 1, 2], 'target': [1, 2, 3]}))
    .bind(node='node', source='source', destination='target')
)

# 1. Node/node duplicate -> KeyError
chain_node_node = [
    n({'node': 0}, name='dup'),
    e_forward(),
    n({'node': 1}, name='dup'),
]
g.gfql(chain_node_node)

# 2. Edge/edge duplicate -> KeyError
chain_edge_edge = [
    n({'node': 0}),
    e_forward(name='dup'),
    n({'node': 1}),
    e_forward(name='dup'),
    n({'node': 2}),
]
g.gfql(chain_edge_edge)

# 3. Existing column + matcher name -> KeyError
chain_existing = [
    n({'node': 0}),
    e_forward(),
    n({'node': 1}, name='node'),  # graph already has “node” column
]
g.gfql(chain_existing)

# FYI node vs edge reuse is fine (different DataFrames)
chain_node_edge = [
    n({'node': 0}, name='dup'),
    e_forward(name='dup'),
    n({'node': 1}),
]
res = g.gfql(chain_node_edge)
print(res._nodes.columns)  # ['node', 'dup']
print(res._edges.columns)  # ['dup', 'source', 'target']
```

Stack trace excerpt:
```
File ".../graphistry/compute/chain.py", line 221, in combine_steps
    s = out_df[op._name]
KeyError: 'dup'
```

## Expected vs Actual
- **Expected**: Duplicate names should be handled deterministically—either accepted via an explicit policy (e.g. OR, suffix) or rejected during validation with a descriptive error. Pre-existing column collisions should follow the same rules.
- **Actual**: pandas/cuDF renames duplicates, `combine_steps()` still looks up the original name, and we get an unhelpful `KeyError` at runtime.

## Proposed Direction
Based on recent exploration:
1. Add a `name_conflicts` (aka `conflict_policy`) parameter to `gfql()` / `chain()` (and forward it from `mark()`). Suggested options:
   - `'any'` (default): OR together duplicate matchers of the same name.
   - `'error'`: fail fast before execution with a `GFQLSchemaError` pointing to the conflicting operations / existing columns.
   - `'suffix'`: allow duplicates by auto-suffixing (`dup`, `dup_1`, ...), similar to DataFrame behavior but controlled and logged.
2. The default `'any'` aligns with analyst expectations (“if any matcher fires, mark it”). `'error'` keeps curated templates strict, while `'suffix'` supports reporting workflows that want separate columns.
3. Even with the policy in place, continue validating pre-existing column collisions up front so the error is actionable.

## Minimum Fix
Regardless of the policy chosen, guard against duplicate names before `combine_steps()` tries to access `out_df[op._name]`. Catching the collision early (or auto-resolving according to policy) avoids the raw KeyError and makes the behavior predictable.

Handling the pre-existing column case is important—users often reuse names like `node` or `flag`. Without this policy, even a single named matcher can crash when that column already exists.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

GFQL named matcher collision causes runtime KeyError #818

GFQL Named Matcher Collision Causes Runtime KeyError

Summary

Minimal Repros

Expected vs Actual

Proposed Direction

Minimum Fix

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

GFQL named matcher collision causes runtime KeyError #818

Description

GFQL Named Matcher Collision Causes Runtime KeyError

Summary

Minimal Repros

Expected vs Actual

Proposed Direction

Minimum Fix

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions