
Proposal: Name-first mark() API #820

@lmeyerov


Mark API – Name-First Spec (Proposal)

Motivation

Earlier GFQL execution assumed matcher names were unique per entity type. Reusing the same name= caused pandas/cuDF to auto-suffix the colliding columns, so the engine's later lookup by the original name raised a KeyError. Issue #818 tracks the underlying bug; this proposal builds on its resolution to deliver a simpler, name-driven mark() experience.
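The collision is easy to reproduce in plain pandas. The sketch below is illustrative only: it assumes the duplicate surfaces through a merge of two intermediate frames that both carry a column named after the matcher, which may not be the exact GFQL code path.

```python
import pandas as pd

# Assumption: two intermediate hop results each carry a column for the same matcher name.
hop_a = pd.DataFrame({"id": [1, 2, 3], "vip_originator": [True, False, False]})
hop_b = pd.DataFrame({"id": [1, 2, 3], "vip_originator": [False, False, True]})

# pandas disambiguates overlapping non-key columns with its default suffixes ('_x', '_y').
merged = hop_a.merge(hop_b, on="id")
print(list(merged.columns))  # ['id', 'vip_originator_x', 'vip_originator_y']

# A later lookup by the original matcher name no longer finds a column.
try:
    merged["vip_originator"]
except KeyError as err:
    print("KeyError:", err)
```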

Summary

  1. Name-first defaults: If your GFQL chain includes name= annotations, g.mark() returns columns for those names automatically. Duplicate names resolve according to a conflict policy (any, error, suffix).
  2. Unnamed chains: A new mode parameter (auto/project/first/last/all) controls what gets marked when no names are provided.
  3. Projection remains: project still narrows the returned columns, given either as a list of names or as a structured {'nodes': [...], 'edges': [...]} entry, after the chosen mode generates them.
  4. Helper utilities: validate_mark_names() and list_available_marks() help template tooling lint or discover names before execution.
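Items 1 to 3 amount to a small decision procedure. A minimal sketch of it, working on column names only and assuming names were already resolved by the conflict policy. plan_mark_columns is a hypothetical helper; reusing mark_nodes/mark_edges for 'first'/'last', and ignoring names under 'all', are assumptions the spec does not pin down.

```python
from typing import Dict, List, Optional, Union

Columns = Dict[str, List[str]]  # {'nodes': [...], 'edges': [...]}
Projection = Optional[Union[List[str], Dict[str, List[str]]]]

# Fallback flag columns named in this proposal.
DEFAULT_FLAGS: Columns = {"nodes": ["mark_nodes"], "edges": ["mark_edges"]}
MODES = ("auto", "project", "first", "last", "all")
KINDS = ("nodes", "edges")


def plan_mark_columns(names: Columns, mode: str = "auto",
                      project: Projection = None) -> Columns:
    """Hypothetical sketch: which columns g.mark() would return, per entity type."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    has_names = any(names.get(kind) for kind in KINDS)
    if mode == "project" and not (has_names or project):
        # 'project' refuses to fall back to marking everything.
        raise ValueError("mode='project' requires name= annotations or project=")
    if mode in ("auto", "project") and has_names:
        # Name-first default: one column per (already conflict-resolved) name.
        generated = {kind: list(names.get(kind, [])) for kind in KINDS}
    else:
        # No names, or explicit 'all'/'first'/'last' (assumed): node + edge flags.
        generated = {kind: list(DEFAULT_FLAGS[kind]) for kind in KINDS}
    if project is None:
        return generated
    # Projection narrows whatever the mode generated; a plain list applies to both types.
    if isinstance(project, dict):
        keep = {kind: set(project.get(kind, [])) for kind in KINDS}
    else:
        keep = {kind: set(project) for kind in KINDS}
    return {kind: [c for c in generated[kind] if c in keep[kind]] for kind in KINDS}
```

Keeping projection as a final filter means it composes with every mode instead of being a mode of its own, which is how item 3 describes it.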

API Highlights

  • g.mark(...) accepts:

    • gfql=list/Chain: GFQL operations (n(), e_forward(), ...).
    • mode='auto'|'project'|'first'|'last'|'all': the default, 'auto', uses names or project when present and otherwise returns both node and edge flags (mark_nodes, mark_edges). 'project' is stricter: it fails if nothing is named or projected. 'first'/'last' mark only the first/last hop (its nodes and their adjacent edges). 'all' marks the entire matched graph.
    • project: optional dict/list to keep only specific columns once mode has generated them.
    • name_conflicts (a.k.a. conflict_policy): 'any' (default), 'error', 'suffix'. Exposed on gfql()/chain()/gfql_remote() and forwarded by mark().
    • Standard knobs (engine overrides, audit metadata) remain but aren’t required for normal use.
  • Helper functions:

    • validate_mark_names(chain, policy) – lint name collisions before execution.
    • list_available_marks(chain) – list the names a chain would emit.
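Both helpers can be prototyped without running a query, since they only need each step's entity type and name=. A hedged sketch: Matcher is a hypothetical stand-in for n()/e_forward() steps, and the optional policy argument on list_available_marks is an assumption beyond the signature listed above. Names are checked per entity type, matching the uniqueness scope described under Motivation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

KINDS = ("nodes", "edges")
POLICIES = ("any", "error", "suffix")


@dataclass
class Matcher:
    """Hypothetical stand-in for an n()/e_forward() step and its name=."""
    kind: str                   # 'nodes' or 'edges'
    name: Optional[str] = None


def validate_mark_names(chain: List[Matcher], policy: str = "any") -> Dict[str, List[str]]:
    """Report names reused within one entity type; raise under policy='error'."""
    if policy not in POLICIES:
        raise ValueError(f"unknown name_conflicts policy: {policy!r}")
    dupes: Dict[str, List[str]] = {}
    for kind in KINDS:
        counts = Counter(m.name for m in chain if m.kind == kind and m.name)
        repeated = [name for name, c in counts.items() if c > 1]
        if repeated:
            dupes[kind] = repeated
    if dupes and policy == "error":
        raise ValueError(f"duplicate matcher names: {dupes}")
    return dupes


def list_available_marks(chain: List[Matcher], policy: str = "any") -> Dict[str, List[str]]:
    """List mark columns a chain would emit. Hypothetical: 'policy' is not in the proposed signature."""
    validate_mark_names(chain, policy)
    out: Dict[str, List[str]] = {kind: [] for kind in KINDS}
    seen: Counter = Counter()
    for m in chain:
        if not m.name:
            continue  # unnamed steps emit no named column
        n_prior = seen[(m.kind, m.name)]
        seen[(m.kind, m.name)] += 1
        if n_prior == 0:
            out[m.kind].append(m.name)
        elif policy == "suffix":
            out[m.kind].append(f"{m.name}_{n_prior}")  # name, name_1, name_2, ...
        # policy 'any': later duplicates fold into the first column
    return out
```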

Implementation Notes

  • name_conflicts logic lives in the GFQL executor (Issue #818, "GFQL named matcher collision causes runtime KeyError"). mark() simply forwards the policy.
  • mode defaults to 'auto'. With no names or project, 'auto' returns mark_nodes and mark_edges; 'project' fails instead, so you never mark everything by accident.
  • mode='first' and 'last' require wavefront bookkeeping (marking the first/last hop). These are lower priority and can ship later.
  • Features that depend on named matchers (conflict policy, name-driven projection) are blocked until Issue #818 ("GFQL named matcher collision causes runtime KeyError") is resolved. Boolean projection still works without names.
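Inside the executor, the three policies differ only in how same-named boolean hit masks are combined. A minimal pandas sketch, assuming each matcher yields one boolean Series aligned on a shared index; apply_name_conflicts is a hypothetical name and not the actual #818 fix.

```python
from typing import Dict, List, Tuple

import pandas as pd

POLICIES = ("any", "error", "suffix")


def apply_name_conflicts(hits: List[Tuple[str, pd.Series]],
                         policy: str = "any") -> pd.DataFrame:
    """Hypothetical: combine per-matcher boolean masks (one entity type) into named columns."""
    if policy not in POLICIES:
        raise ValueError(f"unknown name_conflicts policy: {policy!r}")
    out: Dict[str, pd.Series] = {}
    seen: Dict[str, int] = {}
    for name, mask in hits:
        count = seen.get(name, 0)
        seen[name] = count + 1
        mask = mask.astype(bool)
        if count == 0:
            out[name] = mask
        elif policy == "any":
            out[name] = out[name] | mask  # OR semantics: hit by any same-named matcher
        elif policy == "suffix":
            out[f"{name}_{count}"] = mask  # name, name_1, name_2, ...
        else:  # policy == "error"
            raise ValueError(f"duplicate matcher name: {name!r}")
    return pd.DataFrame(out)
```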

Examples

  1. Name-driven defaults

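    from graphistry import n, e_forward, ge

    vip_chain = [
        n({'account_type': 'VIP'}, name='vip_originator'),
        e_forward({'amount': ge(10_000)}),
        n({'country': 'Offshore'}, name='vip_originator')  # same name reused on purpose
    ]
    g.mark(gfql=vip_chain)  # default name_conflicts='any': one vip_originator column, True where either matcher hit (OR semantics)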
  2. Suffix duplicates

    breach_chain = [...]
    g.mark(gfql=breach_chain, name_conflicts='suffix')  # breach_nodes, breach_nodes_1, ...
  3. Project specific outputs

    g.mark(gfql=invoice_chain,
           project={'nodes': ['supplier_seed'], 'edges': ['invoice_edge']})
  4. No names (mark everything)

    g.mark(gfql=anonymous_chain, mode='all')  # returns mark_nodes + mark_edges columns
  5. Seed-only

    g.mark(gfql=follow_chain, mode='first')  # marks only the starting nodes/edges
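Examples 4 and 5 differ only in which hops feed the flags. A sketch of the wavefront bookkeeping mentioned in the implementation notes, assuming the executor keeps, per hop, the node and edge ids that survive the full match. mark_by_mode, the node_waves/edge_waves inputs, and the edge_id column are all hypothetical.

```python
from typing import List, Set

import pandas as pd


def mark_by_mode(nodes: pd.DataFrame, edges: pd.DataFrame,
                 node_waves: List[Set[int]], edge_waves: List[Set[int]],
                 mode: str, node_col: str = "id", edge_col: str = "edge_id"):
    """Hypothetical: add mark_nodes / mark_edges flags for mode='all', 'first' or 'last'.

    node_waves[i]: node ids kept by the i-th node matcher after the chain matched.
    edge_waves[i]: edge ids kept between node_waves[i] and node_waves[i + 1].
    """
    if mode == "all":
        node_ids = set().union(*node_waves)   # every hop of the matched graph
        edge_ids = set().union(*edge_waves)
    elif mode == "first":
        node_ids = set(node_waves[0]) if node_waves else set()   # seed nodes
        edge_ids = set(edge_waves[0]) if edge_waves else set()   # edges leaving them
    elif mode == "last":
        node_ids = set(node_waves[-1]) if node_waves else set()  # final nodes
        edge_ids = set(edge_waves[-1]) if edge_waves else set()  # edges entering them
    else:
        raise ValueError(f"mark_by_mode handles 'all', 'first', 'last', not {mode!r}")
    return (nodes.assign(mark_nodes=nodes[node_col].isin(node_ids)),
            edges.assign(mark_edges=edges[edge_col].isin(edge_ids)))
```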

Priorities

References
