Mark API – Name-First Spec (Proposal)
Motivation
Earlier GFQL execution assumed matcher names were unique per entity type. Reusing the same `name=` caused pandas/cuDF to auto-suffix columns, so the engine later looked up the original name and raised `KeyError`. Issue #818 tracks the underlying bug, and this proposal builds on its resolution to deliver a simpler, name-driven mark experience.
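The failure mode can be reproduced with plain pandas. The sketch below assumes the collision arises when two per-step frames carrying a flag column of the same name are merged; the frame and column names are illustrative and not taken from the GFQL executor.

```python
import pandas as pd

# Two per-step results that both tag nodes under the same matcher name.
step_a = pd.DataFrame({"id": [1, 2, 3], "vip": [True, False, False]})
step_b = pd.DataFrame({"id": [1, 2, 3], "vip": [False, False, True]})

# Merging on the key leaves two non-key columns called "vip",
# so pandas disambiguates them with its default suffixes.
merged = step_a.merge(step_b, on="id")
merged_columns = list(merged.columns)  # ['id', 'vip_x', 'vip_y']

# Code that still expects the original name now fails.
try:
    merged["vip"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

After the merge, neither suffixed column matches the name the caller asked for, which is why a lookup by the original name raises `KeyError`.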
Summary
- Name-first defaults: If your GFQL chain includes `name=` annotations, `g.mark()` returns columns for those names automatically. Duplicate names resolve according to a conflict policy (`any`, `error`, `suffix`).
- Non-named chains: A new `mode` parameter controls what gets marked when no names are provided (auto/project/first/last/all).
- Projection remains: `project` still narrows the return columns (with lists or structured entries) after the chosen mode generates them.
- Helper utilities: `validate_mark_names()` and `list_available_marks()` help template tooling lint or discover names before execution.
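As a rough illustration of how projection could narrow the generated columns, here is a minimal sketch. `narrow_marks` is a hypothetical helper, not part of the proposal. It assumes that a list applies to both nodes and edges, that a dict uses `'nodes'`/`'edges'` keys, and that a kind omitted from the dict keeps no columns; the proposal does not settle any of these details.

```python
from typing import Dict, List, Optional, Union

Projection = Union[List[str], Dict[str, List[str]]]


def narrow_marks(
    generated: Dict[str, List[str]],
    project: Optional[Projection],
) -> Dict[str, List[str]]:
    """Keep only the requested mark columns (hypothetical helper).

    `generated` maps 'nodes'/'edges' to the columns the chosen mode produced.
    """
    if project is None:
        return {kind: list(cols) for kind, cols in generated.items()}

    if isinstance(project, dict):
        # Assumption: a kind missing from the dict keeps nothing.
        wanted = {kind: list(project.get(kind, [])) for kind in generated}
    else:
        # Assumption: a flat list applies to nodes and edges alike.
        wanted = {kind: list(project) for kind in generated}

    requested = {name for names in wanted.values() for name in names}
    available = {col for cols in generated.values() for col in cols}
    missing = sorted(requested - available)
    if missing:
        raise ValueError(f"unknown mark columns: {missing}")

    return {
        kind: [col for col in cols if col in wanted[kind]]
        for kind, cols in generated.items()
    }


generated = {
    "nodes": ["supplier_seed", "supplier_hub"],
    "edges": ["invoice_edge", "refund_edge"],
}
structured = narrow_marks(
    generated, {"nodes": ["supplier_seed"], "edges": ["invoice_edge"]}
)
flat = narrow_marks(generated, ["supplier_hub", "refund_edge"])
```

Because the narrowing runs after generation, it only ever removes columns; it never changes which entities a remaining column marks.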
API Highlights
- `g.mark(...)` accepts:
  - `gfql=list/Chain`: GFQL operations (`n()`, `e_forward()`, ...).
  - `mode='auto'|'project'|'first'|'last'|'all'`: default `'auto'` uses names/project; otherwise returns both node+edge flags. `'project'` is stricter (fails if nothing is named/projected). `'first'`/`'last'` mark only the first/last hops (nodes and their adjacent edges). `'all'` marks the entire matched graph.
  - `project`: optional dict/list to keep only specific columns once mode has generated them.
  - `name_conflicts` (a.k.a. `conflict_policy`): `'any'` (default), `'error'`, `'suffix'`. Exposed on `gfql()`/`chain()`/`gfql_remote()` and forwarded by `mark()`.
  - Standard knobs (engine overrides, audit metadata) remain but aren’t required for normal use.
- Helper functions:
  - `validate_mark_names(chain, policy)`: lint name collisions before execution.
  - `list_available_marks(chain)`: list the names a chain would emit.
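To make the helper contract concrete, here is a minimal sketch of both functions over a simplified stand-in for a chain. The `Step` dataclass, the return shapes, and the choice to count duplicates per entity type are all assumptions made for illustration; real GFQL chains are built from `n()`/`e_forward()` objects, and the proposal does not specify return types.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for one GFQL matcher."""
    kind: str                   # "node" or "edge"
    name: Optional[str] = None  # value of the name= annotation, if any


def list_available_marks(chain: List[Step]) -> Dict[str, List[str]]:
    """List the distinct names a chain carries, split by entity type."""
    marks: Dict[str, List[str]] = {"nodes": [], "edges": []}
    for step in chain:
        if step.name is None:
            continue
        bucket = marks["nodes" if step.kind == "node" else "edges"]
        if step.name not in bucket:
            bucket.append(step.name)
    return marks


def validate_mark_names(chain: List[Step], policy: str = "any") -> List[str]:
    """Return names reused within one entity type; raise under 'error'."""
    if policy not in ("any", "error", "suffix"):
        raise ValueError(f"unknown policy: {policy!r}")
    counts = Counter(
        (step.kind, step.name) for step in chain if step.name is not None
    )
    duplicates = sorted({name for (_, name), n in counts.items() if n > 1})
    if duplicates and policy == "error":
        raise ValueError(f"duplicate matcher names: {duplicates}")
    return duplicates


vip_steps = [
    Step("node", "vip_originator"),
    Step("edge"),
    Step("node", "vip_originator"),
]
available = list_available_marks(vip_steps)
collisions = validate_mark_names(vip_steps, policy="any")
```

A template linter could call `validate_mark_names(chain, 'error')` to reject ambiguous chains up front, while `list_available_marks` gives tooling the names to offer for `project`.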
Implementation Notes
- `name_conflicts` logic lives in the GFQL executor (Issue #818, "GFQL named matcher collision causes runtime KeyError"). `mark()` simply forwards the policy.
- `mode` defaults to `'auto'`. With no names/project, `'auto'` returns `mark_nodes` and `mark_edges`. `'project'` ensures you never accidentally mark everything without intent.
- `mode='first'` and `'last'` require wavefront bookkeeping (marking the first/last hop). These are lower priority and can ship later.
- Features depending on named matchers (conflict policy, name-driven projection) are blocked until Issue #818 is resolved. Boolean projection still works without names.
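The mode rules above can be read as a small decision function. The following is a sketch only: `resolve_mark_columns` is a hypothetical name, it decides intent from names alone (handling of the `project` argument is left out for brevity), and it treats `'first'`/`'last'` as not yet available, in line with their lower priority.

```python
from typing import Dict, List

DEFAULT_MARKS = {"nodes": ["mark_nodes"], "edges": ["mark_edges"]}
MODES = ("auto", "project", "first", "last", "all")


def resolve_mark_columns(
    mode: str,
    node_names: List[str],
    edge_names: List[str],
) -> Dict[str, List[str]]:
    """Decide which mark columns a call would emit (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode in ("first", "last"):
        # Needs per-hop wavefront bookkeeping, which this sketch omits.
        raise NotImplementedError(f"mode={mode!r} is not covered here")
    if mode == "all":
        # Marks the entire matched graph regardless of names.
        return {kind: list(cols) for kind, cols in DEFAULT_MARKS.items()}

    if node_names or edge_names:
        # 'auto' and 'project' both follow the names when any exist.
        return {"nodes": list(node_names), "edges": list(edge_names)}
    if mode == "project":
        raise ValueError("mode='project' needs at least one named matcher")
    # 'auto' with no names falls back to the default flags.
    return {kind: list(cols) for kind, cols in DEFAULT_MARKS.items()}


unnamed_auto = resolve_mark_columns("auto", [], [])
named_auto = resolve_mark_columns("auto", ["vip_originator"], [])
```

The only difference between `'auto'` and `'project'` in this reading is the fallback: `'auto'` quietly marks everything, while `'project'` refuses.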
Examples
- Name-driven defaults

  ```python
  vip_chain = [
      n({'account_type': 'VIP'}, name='vip_originator'),
      e_forward({'amount': {'$gte': 10_000}}),
      n({'country': 'Offshore'}, name='vip_originator'),
  ]
  g.mark(gfql=vip_chain)  # yields vip_originator column (OR semantics)
  ```

- Suffix duplicates

  ```python
  breach_chain = [...]
  g.mark(gfql=breach_chain, name_conflicts='suffix')  # breach_nodes, breach_nodes_1, ...
  ```

- Project specific outputs

  ```python
  g.mark(gfql=invoice_chain, project={'nodes': ['supplier_seed'], 'edges': ['invoice_edge']})
  ```

- No names (mark everything)

  ```python
  g.mark(gfql=anonymous_chain, mode='all')  # returns mark_nodes + mark_edges columns
  ```

- Seed-only

  ```python
  g.mark(gfql=follow_chain, mode='first')  # marks only the starting nodes/edges
  ```
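To show what the conflict policies could mean at the column level, here is a minimal pandas sketch. `combine_named_masks` is a hypothetical helper that takes one boolean mask per named matcher; it is not the executor's implementation, and it assumes `'any'` means a row-wise OR and that `'suffix'` numbers repeats as `name_1`, `name_2`, and so on, matching the comments in the examples above.

```python
from typing import Dict, List, Tuple

import pandas as pd


def combine_named_masks(
    masks: List[Tuple[str, pd.Series]],
    policy: str = "any",
) -> pd.DataFrame:
    """Build mark columns from (name, boolean mask) pairs."""
    if policy not in ("any", "error", "suffix"):
        raise ValueError(f"unknown policy: {policy!r}")

    columns: Dict[str, pd.Series] = {}
    repeats: Dict[str, int] = {}
    for name, mask in masks:
        mask = mask.astype(bool)
        if name not in repeats:
            repeats[name] = 0
            columns[name] = mask
        elif policy == "any":
            # One column per name: a row is marked if any matcher hit it.
            columns[name] = columns[name] | mask
        elif policy == "suffix":
            # One column per occurrence: name, name_1, name_2, ...
            repeats[name] += 1
            columns[f"{name}_{repeats[name]}"] = mask
        else:
            raise ValueError(f"duplicate matcher name: {name!r}")
    return pd.DataFrame(columns)


is_vip = pd.Series([True, False, False])
is_offshore = pd.Series([False, False, True])
pairs = [("vip_originator", is_vip), ("vip_originator", is_offshore)]

any_marks = combine_named_masks(pairs, policy="any")
suffix_marks = combine_named_masks(pairs, policy="suffix")
```

Under `'any'` the two matchers collapse into a single `vip_originator` column, while `'suffix'` keeps them apart so callers can tell which hop produced each mark.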
Priorities
- Implement `'auto'`/`'project'` and boolean projection first (no name semantics required).
- Once Issue #818 (GFQL named matcher collision causes runtime KeyError) lands, enable `'suffix'` conflicts and name-driven projection.
- `'first'`/`'last'` modes may come later due to additional bookkeeping.
References
- Naming conflict handling: GFQL named matcher collision causes runtime KeyError #818
- Working spec and examples: `plans/feat-755-mark-mode-codex/final_recommendations.md` & `representative_tasks.md` in repo.