-
Notifications
You must be signed in to change notification settings - Fork 217
feat(gfql): Implement mark() operation for pattern annotation (#755) #815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
+1,307
−7
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
lmeyerov
added a commit
that referenced
this pull request
Oct 19, 2025
- Add type annotations for id_col union type - Handle None cases for _node, _source, _destination bindings - Add type narrowing assertions for node vs edge cases - Use type: ignore for runtime-added gfql method Fixes python-lint-types CI failures on PR #815. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Add mark mode for GFQL that annotates nodes/edges with boolean columns without filtering, enabling multi-stage pattern detection and visualization of matches. Implementation: - Add mark() method to ComputeMixin with full validation - Register mark in call_safelist.py for call('mark') support - Handle GFQL execution (Chain/list/JSON) with proper type coercion - Create boolean columns: True for matches, False for non-matches - Support both node and edge marking inferred from GFQL pattern - Comprehensive parameter validation and error messages Design: - User-provided column names (error on collision) - Marks accumulate across let() bindings - Validated against internal column name conflicts Files: - graphistry/compute/mark.py: Core implementation - graphistry/compute/ComputeMixin.py: Method wrapper - graphistry/compute/gfql/call_safelist.py: Safelist registration - docs/design/mark-mode-design.md: Full design document Example usage: g.mark(gfql=[n({'type': 'person'})], name='is_person') g.gfql(call('mark', {'gfql': [n()], 'name': 'hit'})) Related: #755
Add 24 unit tests covering all mark() functionality: - Safelist registration and parameter validation - Basic node and edge marking - Boolean column creation (True/False values) - All-match and no-match scenarios - Parameter validation (type checking, empty values, collisions) - Internal column name protection - Multiple mark accumulation - call('mark') execution - JSON GFQL deserialization for remote execution - let() DAG composition with marks - Chain integration - Edge cases (no nodes, no edges) Fixes for test failures: - Added empty gfql validation before chain access - Convert ValueError to GFQLSchemaError for internal column conflicts - Updated test assertions to match actual error messages All tests passing: 24/24 Related: #755
Add more comprehensive assertions to call() tests: - Verify exact boolean values for specific nodes (not just counts) - Verify other columns are preserved - Verify edges are unchanged (pd.testing.assert_frame_equal) - Compare JSON vs list GFQL forms produce identical results This addresses testing gap where call() tests were less rigorous than direct mark() tests. Now we have high confidence that call('mark') returns the exact expected graph structure. All tests still passing: 24/24
Add comprehensive documentation for the new mark() call operation: - Parameter table with types and requirements - Five detailed examples showing various use cases - Use cases for fraud detection, security, social network analysis - Comparison table: mark vs filter operations - Schema effects and implementation notes - Integration with let(), visualization, and multi-mark workflows Placed in 'Filtering and Transformation Methods' section after filter_edges_by_dict and before hop. Related: #755
- Add type annotations for id_col union type - Handle None cases for _node, _source, _destination bindings - Add type narrowing assertions for node vs edge cases - Use type: ignore for runtime-added gfql method Fixes python-lint-types CI failures on PR #815. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Use noqa: E712 comments for pandas boolean comparisons where == True/False is necessary for testing exact values (numpy bool != Python bool identity). Also simplified some assertions to use truthiness where appropriate. Fixes CI python-lint-types failures. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Move docs/design/mark-mode-design.md to plans/feat-755-mark-mode/design.md - Design docs should be in plans/, not committed to master
- Keep essential intro, parameters, and schema info - Reduce from 5 examples to 2 focused examples - Simplify use cases to single line - Remove verbose comparison table - Target: ~50-70 lines (achieved: 58 lines)
- gfql: Validate list contains dicts with 'type' key (AST objects) - gfql: Validate dict contains 'chain' key (Chain JSON) - engine: Validate value is in valid engine set (pandas/cudf/dask/dask_cudf/auto) - Prevents malformed parameters from reaching execution
e788de3
to
8464b81
Compare
- Change name parameter from required to optional (defaults to None) - Generate default names: 'is_matched_node' or 'is_matched_edge' - Update safelist to accept list of AST objects or Chain JSON - Update tests to verify default name generation - Update documentation to reflect optional parameter Addresses user feedback: 'let's not make 'name' required, there's probably some reasonable default name we can use'
- Add is_engine_param() validator accepting both EngineAbstract enum and strings - Update all operations with engine parameters to use new validator: * get_degrees, mark, materialize_nodes, hop, fa2_layout * group_in_a_box_layout, hypergraph - Maintains backward compatibility with string literals - Note: umap 'engine' param uses different values (umap backend), left as-is Addresses user feedback: 'engine isn't just string, we probably have the same issue in others too? should accept EngineAbstract or str. Audit other operations for the same issue.'
- Fix node matching: Convert matched IDs to list for cross-engine isin() - Fix edge matching: Replace tuple-based with merge-based approach - Add index preservation: __row_idx__ + sort_values + reset_index - Add test_mark_cudf.py with 9 comprehensive GPU tests - Tests pass: 26 pandas + 9 cuDF = 35 total - Verified with NVIDIA GeForce RTX 3080 Ti via Docker Related: #755, PR #815
Plans directory is gitignored and should not be tracked. File remains in local plans/ directory but will not be pushed to remote. This fixes the mistake from commit where the file was incorrectly added to git tracking despite plans/ being in .gitignore.
revisit later, see #820 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Implements mark mode for GFQL that annotates nodes/edges with boolean columns without filtering, enabling multi-stage pattern detection and visualization of match vs non-match entities.
Closes #755
Implementation
Core Features
g.mark(gfql=[n({'type': 'person'})], name='is_person')
g.gfql(call('mark', {'gfql': [n()], 'name': 'hit'}))
Files Changed
graphistry/compute/mark.py
: Core implementation with full validationgraphistry/compute/ComputeMixin.py
: Method wrapper integrationgraphistry/compute/gfql/call_safelist.py
: Safelist registration forcall('mark')
docs/source/gfql/builtin_calls.rst
: Comprehensive documentationdocs/design/mark-mode-design.md
: Full design documentgraphistry/tests/compute/test_mark.py
: 24 comprehensive testsExamples
Basic Marking
Accumulating Marks
In let() DAG
With Visualization
Testing
24 comprehensive unit tests covering:
call('mark')
execution with exact graph verificationlet()
DAG composition with marksAll tests passing: 24/24 ✅
Enhanced test rigor for
call('mark')
:pd.testing.assert_frame_equal
)Design Decisions
Documentation
docs/design/mark-mode-design.md
docs/source/gfql/builtin_calls.rst
Commits
49b6617e
- feat(gfql): Implement mark() operatione99cc957
- test(gfql): Add comprehensive tests for mark()4d2babac
- test(gfql): Enhance call('mark') test rigor987c2d05
- docs(gfql): Add mark() to builtin calls referenceChecklist