Fix spanner creds issues: fixes #634, #644 (#646)
Merged
lmeyerov approved these changes on Jan 25, 2025
thanks!
lgtm -- merge & push tag?
lmeyerov added a commit that referenced this pull request on Jul 17, 2025
* feat(skrub): upgrade from dirty_cat
* fix(umap): transform drop y from X
* infra(ci): gha does not support py3.14
* infra(ci): remove py3.13 as sklearn gha not ready
* infra(gha py): remove 3.12 bc torch < 2
* infra(ci): typecheck use appropriate py version
* infra(py): unpin torch for py3.12
* garden(featurize): types
* refactor(FastEncoder): non-null y
* wip(transform): match batch X y to train X y
* fix(feat): backout incorrect feat standardizations
* infra(typecheck): handle py x.y.z version formats
* infra(umap testers): full pytest overriding
* refactor(feat test): split
* refactor(transform param names): trained -> fit
* fix(featurize): ydf
* feat(transform): skrub preconditioning
* feat(tests): do not ignore warnings
* fix(feat utils): python 3.9 typing
* fix(feat utils): python 3.9 typing
* fix(umap): scipy 1.15 breakage?
* fix(umap): scipy 1.14 breakage?
* infra(scikit): increase minimum version for umap learn
* fix(umap): require higher scipy
* infra(umap): py3.10+
* fix(ci): remove false constraint
* fix(tests): numpy 2
* fix(feat): edge cases
* fix(feat): edge cases
* infra(ci): try reenabling py 3.9 for umap
* infra(ci): try reenabling py3.8 for umap, ai
* infra(deps): umaps use of skrub means python 3.9+
* fix(test): cuml umap engine arg pos
* infra(gpu ci): wip
* infra(setup.py): rapids
* infra(ci): swap dc with skrub
* refactor(print): to logger
* fix(test): plugin tests conditional on deps
* fix(dgl): pass gpu tests
* refactor(ModelDict): move to models
* refactor(graph kind): external model
* refactor(umap models): factor out
* refactor(interfaces): umap field names, typed interfaces
* garden(deadcode): remove
* fix(feat): gpu support
* fix(umap): gpu
* refactor(dbscan): new interfaces
* fix(dbscan): gpu mode
* refactor(umap tests): decouple
* fix(gpu mode): more ai paths
* infra(dgl): bin tester
* infra(ci): dgl
* fix(gpu runner): do not suppress feat tests
* infra(ai tester): py3.10
* infra(gpu ci): update - wip
* security(gpu ci): require admin
* docs(table_to_graph_three_ways): add tutorial
* docs(changelog)
* fix(docs): tutorial typo
* docs(plausible): privacy-preserving docs analytics
* wip(plausible)
* wip(plausible)
* wip(plausible)
* fix(ipynb): warnings
* docs(changelog)
* add support for google spanner graph (#622)
* (feature): new support for Google Spanner Graph
* renamed spannergraph file
* changed dir for spannergraph.py, added module in setup.py
* added lazy imports
* removed import from Plotterbase.py
* fix pydocs
* remove database_clients dir, moved to plugins/
* various changes to pass in spanner config to register()
* added debugging
* added demo notebook, debug output
* minor changes
* various changes for error handling, imports and pydocs
* fixed register and uncomment gcloud
* fixed typo in register
* added spanner_query_to_df and other fixes from PR comments
* updated notebook with more examples
* fix lint issues
* fix linting errors
* fix linting errors
* fix more lint issue
* fix more lint issue
* fix more lint issue
* fix more lint issue
* updated notebook with CTA and other docs
* fix for readthedocs markdown
* updates from PR comments
* removed None assignment for _spannergraph - per PR comments
* changes to pass Plottable dynamically to SpannerGraph.gql_to_g
* various PR review changes
* fix lint error and add plot output back to notebook
* fix lint error and add plot output back to notebook
* fix lint error about blank line at end of file
* updated Changelog.md, notebook plugins for readthedocs, removed copyright (#641)
* fix notebook html and protocol for plot()
* fix html for Spanner notebook (#642)
* docs: html, image links, markdown in notebook
* docs: sql codeblocks breaking sphinx
* docs: notebook plot, switching to https for hub
* docs: update CHANGELOG.md
* Fix spanner creds issues: fixes: #634, #644 (#646)
* fix: credential_file logic when not defined
* fix: spanner_config get logic and error when not defined
* chore: update to v0.35.10 in CHANGELOG.md w/ comments
* fix(cudf): more skrub upgrade fixes * fix(cugraph): breaking - handle 26.10 * docs(changelog) * feat(plottable): add "upload_url_opts" option to plot * feat(plottable): add "upload_url_opts" option to upload * Dev/fix gfql abstraction (#657) * docs(claude): starting point * infra(xdist): add * fix(typing): update tqdm type stubs reference and configuration * refactor(compute): centralize SeriesT type definition for consistent typing * test(compute): add tests for column name conflicts in hop pattern matching * feat(compute): add support for node id column having same name as edge src/dst column This enhancement resolves the NotImplementedError that was raised when a node id column had the same name as the edge source or destination column. It uses a temporary column name mechanism to avoid conflicts during merge operations while preserving the original column names in the result. * docs(changelog): add GFQL hop pattern matching column name conflict enhancement * test(compute): add tests for column name conflicts in chain pattern matching * infra(CLAUDE.md): add * fix(compute): fix Python 3.8 type checking errors in hop.py * perf(test): add automatic parallelization with pytest-xdist when no args provided * garden(mypy.ini): remove unnecessary comment * perf(compute): optimize GFQL hop.py column name conflict handling - Remove redundant str() calls on TEMP_SRC_COL and TEMP_DST_COL variables - Reduce memory usage by avoiding unnecessary dataframe copying - Use pandas' functional programming style for cleaner operations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(CLAUDE.md): add performance guidelines section - Add concise performance guidelines for functional programming - Include DataFrame efficiency best practices - Add GFQL and engine optimization tips - Help prevent common performance issues 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * 
refactor(compute): centralize column conflict resolution in hop.py - Add prepare_merge_dataframe helper function to centralize logic - Refactor forward/reverse column conflict handling to use common function - Address PR review feedback to centralize similar code - Maintain proper type hinting for mypy compatibility 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * refactor(compute): reduce redundancy in hop.py target_wave_front handling - Hoist intermediate_target_wave_front calculation to be done once per iteration - Use the same calculated value for both forward and reverse directions - Maintain the same conditional logic structure for backward compatibility - All tests pass with the refactored implementation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * refactor(compute): extract common hop direction processing logic in hop.py - Create process_hop_direction helper function to handle both forward and reverse directions - Simplify hop() function by using the helper to process each direction - Preserve existing behavior and all tests pass - Significantly reduces code duplication and improves maintainability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(changelog): update with recent hop.py performance improvements - Add performance section mentioning hop operation optimizations - Expand CLAUDE.md entry to include performance guidelines 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(CLAUDE.md): add tip about removing Claude's comments - Add concise guideline for post-processing Claude-generated code - Emphasize removing explanatory comments in a separate step 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(logging): replace f-strings with proper logger interpolation in 
hop.py - Fix F541 f-string missing placeholders linting errors - Use proper logger interpolation to avoid unnecessary string formatting - Add logging performance guidance to CLAUDE.md 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(changelog) --------- Co-authored-by: Claude <noreply@anthropic.com> * Fix/issue 660 logging handler (#661) * fix(embed_utils): prevent global logging.StreamHandler.terminator modification Previously embed_utils.py was modifying the global logging.StreamHandler.terminator which affected ALL StreamHandlers across the entire Python process, breaking logging in other libraries. Now uses a custom _NoTerminatorHandler class that only affects the embed_utils logger without impacting global logging configuration. Fixes #660 * fix(feature_utils): handle missing encoder error properly Fixed _transform method that was returning None implicitly when encoder was not initialized. Now properly raises ValueError with clear message. Also fixed transform method to raise ValueError for invalid kind parameter instead of just logging and continuing. * fix(umap_utils): handle None values in transform_umap Properly handle cases where _y is None by creating empty DataFrame. Added assertions to ensure transform always returns non-None values as expected by type hints. * fix(text_utils): add type assertions for transform return values Add assertions to ensure transform() returns tuple type when return_graph=False, addressing mypy type checking issues. * chore(types): fix circular imports and add TYPE_CHECKING guards Add TYPE_CHECKING conditional imports to avoid circular dependencies in cluster.py, conditional.py, networks.py, outliers.py, and graphviz.py. Remove circular import from ModelDict.py. * revert(compute/collapse): remove unnecessary import changes Revert import changes that were not needed for fixing circular imports. 
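The `fix(embed_utils)` commit above replaces a global change to `logging.StreamHandler.terminator` with a handler subclass scoped to one logger. A minimal sketch of that pattern follows, using only the standard library. The class name `NoTerminatorHandler`, the helper `make_progress_logger`, and the logger name are illustrative assumptions, not the code from the commit.

```python
import io
import logging


class NoTerminatorHandler(logging.StreamHandler):
    """StreamHandler variant that writes records without a trailing newline."""

    # Overriding the attribute on the subclass leaves
    # logging.StreamHandler.terminator ("\n") unchanged for everyone else.
    terminator = ""


def make_progress_logger(name, stream):
    """Hypothetical helper: build a logger whose only handler omits newlines."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from root handlers
    logger.addHandler(NoTerminatorHandler(stream))
    return logger


# Write to an in-memory stream so the result can be inspected.
buf = io.StringIO()
progress = make_progress_logger("demo.embed_progress", buf)
progress.info("epoch 1 ")
progress.info("epoch 2")
```

Because the override lives on the subclass, other libraries that attach a plain `StreamHandler` still get newline-terminated output.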
* docs(changelog): add entry for logging handler fix Document fix for issue #660 where embed_utils.py was modifying global logging.StreamHandler.terminator. * chore(gitignore): add AI_PROGRESS directory Add AI_PROGRESS/ to gitignore for AI assistant working directories. * docs(claude): add conventional commits note and AI prompt templates Add note about using conventional commits for commit messages. Add comprehensive AI assistant prompt templates for development workflows: - Conventional commits template with safer git operations - Lint and type checking templates - Other development workflow templates * docs(changelog): update entries with commit hashes Add commit hashes to changelog entries for traceability. Correct description of logger changes (setup_logger utility, not TYPE_CHECKING). * chore(gitignore): add PLAN.md Add PLAN.md to gitignore for temporary AI planning files. * docs(changelog): format with proper GitHub links Add proper GitHub issue and commit links. Include PLAN.md in .gitignore entry. * docs(claude): simplify to point to ai_code_notes README Replace full guide content with single line pointing to the actual AI development guide location. * docs(changelog): add entry for CLAUDE.md simplification * fix(feature_utils): correct transform return type annotation The transform method always returns a tuple of DataFrames when return_graph=False. The second DataFrame may be empty but is never None. * docs(ai): update AI assistant documentation with Docker-first testing Emphasize containerized testing approach to avoid local environment setup issues. Updates include: - Add Docker quick start commands in README.md - Include containerized lint/typecheck commands in LINT_TYPES_CHECK.md - Clarify when direct script execution requires local environment - Add WITH_TEST=0 option for faster lint/typecheck only runs This helps AI assistants avoid common environment setup pitfalls and provides faster iteration cycles during development. 
* fix(ai_utils): handle empty DataFrames in infer_graph Add check for empty DataFrame before concatenation to prevent pandas errors when y is an empty DataFrame. The condition now checks both that y is not None and not empty before attempting to concatenate. This prevents runtime errors in graph inference when working with edge cases involving empty target DataFrames. * fix(feature_utils): add empty DataFrame checks in multiple functions Add defensive checks for empty DataFrames to prevent errors during feature processing: - features_without_target: Early return for empty y DataFrames - get_numeric_transformers: Check y is not empty before processing - process_dirty_dataframes: Verify y has data before encoding - FeatureMixin._featurize: Add empty check for cudf DataFrames These changes prevent AttributeError and concat errors when working with empty target DataFrames in feature engineering pipelines. * fix(umap_utils): prevent None errors with empty DataFrames Add defensive programming to handle empty DataFrames safely: - make_safe_umap_gpu_dataframes: Check for empty y before module check - _umap_fit_transform: Safe dtype logging when y is empty - _umap: Ensure y_safe is never None when passed to _infer_edges These changes prevent AttributeError when accessing properties of potentially empty DataFrames during UMAP embedding operations. * docs(changelog): update with recent commits Add entries for: - Empty DataFrame handling fixes across multiple modules - AI documentation updates for Docker-first testing Group related None/empty value handling fixes for better readability. 
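The empty-DataFrame fixes above share one guard: treat `y` as absent when it is `None` or has no data before concatenating. A minimal sketch of that check is below, assuming pandas is installed. The function `concat_features_and_target` is a hypothetical stand-in, not the library's `infer_graph` code.

```python
from typing import Optional

import pandas as pd


def concat_features_and_target(X: pd.DataFrame, y: Optional[pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical helper: append target columns only when y carries data."""
    # Check both conditions: y may be None, or a DataFrame with no rows/columns.
    if y is not None and not y.empty:
        return pd.concat([X, y], axis=1)
    return X


X = pd.DataFrame({"a": [1, 2]})
empty_y = pd.DataFrame()
labeled_y = pd.DataFrame({"label": [0, 1]})

combined_none = concat_features_and_target(X, None)
combined_empty = concat_features_and_target(X, empty_y)
combined_full = concat_features_and_target(X, labeled_y)
```

Returning `X` unchanged for both the `None` and the empty case keeps downstream code from having to tell the two apart.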
* docs(changelog)
* Add Kusto DB and kql support (#659)
* Add Kusto DB and kql support
* tenant id
* tagging numpy temp
* Plugins: Give more flexibility to the user how to init client
* fix: errors in kustograph
* Plugin runtime fixes and respond to comments
* Plugins, open close session, kusto unwrap heuristic,
* fix column types
* Kusto query and query_graph doc strings
* dynamic type handle
* more dynamic type handling
* remove pinned numpy
* lint
* mypy fixes
* mypy fix
* more mypy fixes
* mypy fixes
* mypy fix
* update readme
* Version and date to be added later --------- Co-authored-by: Alex Warren <exrhizo@gmail.com>
* feat(ai): support modern sentence transformer model namespaces (#664)
  - Add support for organization-prefixed model names (e.g., mixedbread-ai/mxbai-embed-large-v1)
  - Maintain backwards compatibility with legacy model names
  - Preserve existing behavior for local model paths
  - Add comprehensive tests for all model name formats
  BREAKING CHANGE: None - full backwards compatibility maintained 🤖 Generated with Claude Code Co-authored-by: Claude <noreply@anthropic.com>
* fix: Update PyPI publish workflow to use Trusted Publishing (#666)
* fix: update PyPI publish workflow to use Trusted Publishing
  - Replace deprecated repository_url with repository-url
  - Remove password authentication for both PyPI and TestPyPI
  - Enable attestations for supply chain security
  - Use OIDC authentication via Trusted Publishing
* fix: add id-token write permission for OIDC authentication
* fix: disable attestations for PyPI to avoid conflict with TestPyPI attestations
* fix: explicitly disable attestations for PyPI publish
* docs(changelog): add PyPI Trusted Publishing workflow update
* docs(changelog): add PyPI Trusted Publishing update to unreleased section
* Switch Plottable to a Protocol
* update changelog, comment, docs Hide the inherited members for mixins
* add overloads for transform fix lint fix lint
* feat: New redteam50k dataset to be used in Microsoft Kusto
(#668) * feat: New redteam50k dataset to be used in Microsoft Kusto and other demos * feat: updated the file to account for datawrangling done on the dataset in UMAP-demo. * fix: resolve mypy 1.8.0 overload errors with keyword-only arguments - Make return_graph, scaled, verbose keyword-only in transform methods - Resolves overload-overlap errors without suppressing type checking - Improves API design by making intent clearer for boolean flags - All existing call sites already use keyword arguments The overload pattern with Literal[True]/Literal[False] now works correctly because keyword-only arguments eliminate the overlap that mypy detected. Breaking change: transform() and transform_umap() now require some parameters to be keyword-only, but existing code already follows this pattern. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docs): resolve Sphinx build errors and warnings (#677) - Remove inherited members from PyTorch nn.Module in RGCN docs to avoid docstring conflicts - Fix toctree references to use correct file names without extensions - Fix docstring formatting in PlotterBase for kusto/spanner methods: - Correct indentation and spacing issues - Fix parameter type field syntax - Properly format code examples in docstrings - Replace special bullet characters with standard asterisks 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com> * feat(docs): add notebook validation to docs build (#676) * feat(docs): add notebook validation to docs build - Add structure validation for notebooks during docs build - Validate temporal_predicates.ipynb and layout_tree.ipynb - Enable validation in CI with VALIDATE_NOTEBOOK_EXECUTION env var - Update build script to use bash for array support * build(docs): add ipykernel dependency for notebook execution - Add ipykernel==6.29.5 to docs dependencies - Register Python kernel in Dockerfile build step * fix(docs): remove 
temporal_predicates.ipynb from validation list - Keep only layout_tree.ipynb in validation - temporal_predicates.ipynb will be added in separate PR * feat(ci): enable notebook validation in GitHub Actions - Set VALIDATE_NOTEBOOK_EXECUTION=1 in ci.yml to enable notebook execution testing - Currently validates layout_tree.ipynb with 600s timeout - Helps ensure documentation notebooks remain executable 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs: update CHANGELOG for notebook validation CI 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com> * feat(docs): enable notebook validation by default with better UX (#679) * feat(docs): enable notebook validation by default - Set VALIDATE_NOTEBOOK_EXECUTION=1 as default in ci.sh - Move notebook validation to run after doc build - Add temporal_predicates.ipynb to validation list 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docs): remove temporal_predicates.ipynb from validation This notebook doesn't exist on master yet - it's only in PR #678 * fix(docs): ensure graphistry is importable during notebook validation - Fix Dockerfile to copy graphistry source before pip install -e - Add minimal CHANGELOG and README updates - Add debug check to verify graphistry import The issue was that pip install -e needs the source files present, but we were copying them after installation. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docs): add minimal test notebook for CI validation - Add test_graphistry_import.ipynb as a minimal validation notebook - Disable layout_tree.ipynb which requires networkx (not in docs deps) - Successfully validates that graphistry is importable in docs env 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docs): move test notebook to separate directory - Move test_graphistry_import.ipynb to docs/test_notebooks/ - Update Dockerfile to copy test_notebooks separately - Ensures test notebook doesn't appear in public documentation - Validation still works correctly 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com> * feat(gfql): add temporal predicates and type system - Add temporal value classes (DateTime, Date, Time) with timezone support - Extend comparison predicates (GT, LT, GE, LE, EQ, NE, Between) for temporal values - Enhance IsIn predicate to support temporal values with type validation - Add comprehensive GFQL type system with guards and coercions - Include temporal value serialization and wire format support - Add comprehensive test coverage for temporal operations Enables date/time comparisons and filtering in GFQL queries. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(gfql): add TypeGuard annotations and TemporalValue to type unions - Add TypeGuard import with proper Python version handling - Add TypeGuard annotation to is_basic_scalar for type narrowing - Include TemporalValue in ComparisonInput and IsInElementInput unions - Enables proper type checking for temporal predicates 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(predicates): remove type errors in is_in predicate - Remove unnecessary type: ignore comment - Add proper handling for TemporalValue in _normalize_value - Type narrowing now works correctly with TypeGuard 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * refactor(predicates): inline temporal comparison logic into operators - Remove monolithic _temporal_comparison method with operator dispatch - Add focused helper methods for series preparation and value extraction - Inline comparison logic directly in each operator (GT, LT, GE, LE, EQ, NE) - Eliminate all type: ignore and cast statements - Each operator now clearly shows both numeric and temporal comparison logic 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs: update CHANGELOG.md with GFQL temporal predicates feature 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs: list supported temporal comparison operators in CHANGELOG 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs: add is_in to list of temporal operators in CHANGELOG 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(types): support TypeGuard across Python 3.8-3.12 - Use version check for TypeGuard import only during TYPE_CHECKING - Remove 
pd.Series[T] generic syntax not supported in Python 3.8 - Ensures mypy passes on all supported Python versions 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(is_in): move dynamic imports to module level - Move Any import from method-level to module-level - Move DateTimeWire, DateWire, TimeWire imports to module-level - Fixes dynamic import issue flagged during code review 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix: merge CHANGELOG Added sections after rebase * docs: add temporal predicates documentation - Add temporal_predicates.ipynb demo notebook with TOC and examples - Add datetime_filtering.md with imports, standards, and duration notes - Update wire_protocol_examples.md with clearer structure - Documentation for GFQL temporal predicate features Co-Authored-By: Claude <noreply@anthropic.com> * docs: consolidate temporal predicates documentation - Add comprehensive datetime filtering guide - Add temporal predicates notebook with examples - Add wire protocol examples for temporal values - Update GFQL overview with temporal examples - Update predicates quick reference with temporal operators - Add temporal modules to API documentation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(gfql): update CHANGELOG with temporal predicates documentation - Add temporal value classes to changelog - Add wire protocol support mention - Add documentation references (datetime guide, wire protocol, notebook) 📤 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docs): update temporal predicates notebook with correct relative links - Add datetime_filtering and wire_protocol_examples to GFQL toctree - Use relative HTML links that work across all hosting environments - Links go up two levels from demos/gfql/ to reach gfql/ documentation 🤖 
Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docs): correct temporal values module path in API docs - Change from non-existent temporal_values module to ast_temporal - Fixes autodoc import error during documentation build 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(gfql): improve temporal predicates documentation clarity - Streamline datetime filtering guide with concise examples - Add clear links to Python/Pandas datetime documentation - Improve notebook imports with explanatory comments - Fix broken chain.rst link to use correct path - Make documentation more actionable with direct type links 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(gfql): remove recommendation language for direct facts - Remove "recommended" verbiage in favor of direct documentation - Keep examples factual without good/bad judgments - Let ordering and examples guide usage naturally 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(gfql): add wire protocol dict support for ISO strings - Document that wire protocol dicts are a supported input type - Show how to use ISO strings via wire protocol dicts - Clarify that only raw strings raise ValueError 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(gfql): remove good/bad language from wire protocol examples - Replace bad_filter/good_filter with factual variable names - Keep error handling examples factual without judgments - Align with direct documentation style 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(docs): fix temporal_predicates notebook structure - Add missing execution_count field to all code cells - Remove invalid outputs field from markdown cell * feat(docs): add 
temporal_predicates notebook to validation list * fix(docs): replace Unicode characters in datetime_filtering.md for LaTeX compatibility * chore(release): update CHANGELOG for v0.38.1 * fix(imports): replace relative imports with absolute imports in GFQL modules (#681) * fix(imports): replace relative imports with absolute imports in GFQL modules - Replace all '..' relative imports with absolute 'graphistry.' imports - Fixes pip install issues caused by relative imports - Add lint check to prevent future relative imports The relative imports were breaking module resolution when installed via pip. All imports now use absolute paths from the graphistry package root. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(changelog): add v0.38.3 entry for import fixes - Document fix for relative imports in GFQL modules - Note addition of lint check for relative imports 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(models): add missing __init__.py files for gfql modules - Add empty __init__.py files to graphistry.models.gfql directories - Fixes ModuleNotFoundError when importing after pip install - Add docker test scripts to verify pip install works - Add CI job to test pip install across Python versions The missing __init__.py files prevented Python from recognizing the gfql directories as packages, causing import errors after pip install. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * chore: remove temporary minimal test script * refactor(ci): move pip install test to step in test-minimal-python - Remove separate test-pip-install job - Add Docker pip install test as step after typecheck - Ensures packaging is tested after lint/typecheck pass - Reduces CI complexity while maintaining test coverage 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(changelog): update PR reference and details * style: fix import order to follow project conventions - Typing imports first - Then stdlib/third-party imports - One empty line - Then package global imports (from graphistry...) - Then local imports (from .) - Two empty lines before class/function definitions 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * style: alphabetize imports within sections - Sort imports alphabetically within each import section - Maintains proper import order: typing, stdlib/third-party, package, local - Improves code consistency and readability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com> * feat(kusto): make directory for azure demos and add first notebook demonstrating graphistry and kusto * implemented changes from code review. 
* graphistry.client() & Iterate on kusto and spanner mixin & +types (#670) * Iterate on kusto and spanner mixin & Add more types Get pygraphistry as a singleton, remove static methods type, lint, test fixes fix lint Fix tests backwards compatibility with _config proxy and _is_authenticated * Rework kusto spanner configuration Docs and ongoing work on tests pygraphistry can return Plotter iterate towards working tests * fix tests and update the spanner notebook * ignore kusto and spanner tests in test-minimal * test skips kusto and spanner imports if not * Add set_client to pygraphistry and be backwards compatible for ArrowUploader * update docs * kusto / spanner dont close if same client * type check for PlotterBase.privacy * Add single_table option to kql * export PyGraphistryClient * More louie needed functionality * Rename PyGraphistryClient to GraphistryClient * update changelog also * Update docs and small type fix * fix (sso): fix missing sso state * Update docs and code comments * More docs improvements * Improve docs * code formatting in sphinx * wait until ArrowUploader change to set version --------- Co-authored-by: vaimdevs <vaimdevs@v-aim.com> * Improve global Arrow Cache Performance (#684) * Improve ArrowFileUploader cache behavior * Deep hash strategy * revert interfaces * Improve tests and fix cache usage * update changelog to follow on graphistry client() chng * Fix layout and add more instruction resources * Minor notebook fixes and changing directory stucture * Add minor tweaks on markdown * Fix missing } in persistent graph query * Minor fixes and save some outputs into the notebook * Fix typos and versioning * fix(kusto_graph) - add missing ) from graph_query (#691) * fix(kusto_graph) - add missing ) from graph_query * docs(kusto): add Azure Data Explorer demo to documentation TOC - Add Kusto demo notebook to plugins.connectors.rst documentation - Update CHANGELOG.md with KQL bugfix and docs entries 🤖 Generated with [Claude 
Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Leo Meyerovich <leo@graphistry.com> Co-authored-by: Claude <noreply@anthropic.com> * kusto notebook fixes for GRAPH_NAME and register links fixes (#692) * bug: updated broken register link * bug: fixed GRAPH_NAME graph_name inconsistencies breaking notebook * Revert "bug: updated broken register link" This reverts commit ffff331. * fixed graph_name GRAPH_NAME variable, added timestamp to snapshot name, changed to default register cell * bug: updated broken register link * docs(kusto ipynb): update intro (#693) * docs(kusto ipynb): update intro * docs(changelog) * docs(changelog): fix version --------- Co-authored-by: lmeyerov <leo@graphistry.com> * chore: update copyright year from 2024 to 2025 (#694) * chore: update copyright year from 2024 to 2025 - Update copyright in docs/source/conf.py - This will regenerate all documentation with the new year 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * chore: update LICENSE.txt copyright year to 2025 - Update copyright year from 2023 to 2025 - Keep consistent with documentation copyright 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs: add copyright update to CHANGELOG - Document copyright year update in Dev section - No version bump needed for documentation-only change 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com> * docs(gfql): add GFQL specification documentation (#698) * docs(gfql): add specification documentation - Add complete GFQL specification documentation - language.md: Core language specification with grammar and operations - wire_protocol.md: JSON serialization format for client-server communication - cypher_mapping.md: Cypher to GFQL translation with Python and wire protocol - 
python_embedding.md: Python-specific implementation details - index.md: Specification overview and navigation - Update main gfql/index.rst to include Developer Resources section with spec link - Add ai_code_notes/gfql/README.md with GFQL quick reference for AI assistants This establishes the documentation foundation for GFQL specifications, supporting both human developers and AI-assisted code generation. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(gfql): use headers for Core Concepts to enable TOC navigation - Convert Core Concepts from numbered list to headers (h4) - This allows each concept to appear in the table of contents - Makes it easier to navigate directly to specific concepts like Graph Model, Chains, Operations, etc. * docs(gfql): remove incomplete JSON Schema from wire protocol spec The JSON Schema was incomplete (missing FilterDict and predicate definitions) and not used by the actual implementation. Removing it to avoid confusion. * docs(gfql): remove Protocol Extensions section from wire protocol spec Remove speculative content about future extensions to keep documentation focused on current implementation. * docs(gfql): remove incorrect Error Handling section from wire protocol The documented error response format does not match the implementation. The actual implementation uses HTTP status codes for remote errors and Python exceptions for local validation, not structured JSON error objects. * docs(gfql): fix missing closing backticks in cypher_mapping.md Add missing closing triple backticks for JSON code block before Pattern Translations section to fix HTML rendering. * docs(gfql): fix code block formatting in cypher_mapping.md Ensure all JSON code blocks have proper closing backticks to prevent markdown rendering issues. 
* docs: Add GFQL specification documentation to changelog - Added entry for PR #698 in Dev section - Listed key documentation improvements and fixes 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com> * feat(gfql): add GFQL validation framework Add comprehensive validation framework for GFQL queries including syntax and schema validation. ## Python Code - `graphistry/compute/gfql/validate.py` - Core validation module with syntax and schema validators - `graphistry/compute/gfql/exceptions.py` - GFQLValidationError exception class - `graphistry/compute/chain_validate.py` - Chain function with validation support - `graphistry/validate/` - General validation utilities - Tests for all validation functionality ## Documentation - `docs/source/gfql/validation/` - Comprehensive validation guide - fundamentals.rst - Basic validation concepts and examples - advanced.rst - Complex query validation patterns - llm.rst - LLM integration patterns - production.rst - Production deployment patterns - API documentation for validation modules - Updated references in spec and main docs ## Notebook - `demos/gfql/gfql_validation_fundamentals.ipynb` - Interactive tutorial This provides a complete framework for validating GFQL queries at both syntax and schema levels, with helpful error messages to guide users. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * feat(gfql): complete GFQL validation framework implementation - Added structured error system with E1xx-E3xx error codes - Updated all AST classes to use new validation pattern - Fixed ASTPredicate validate() method conflicts - Added schema validation to filter_by_dict - Created pre-execution validation capability - Updated all documentation and created migration guide Co-authored-by: Claude <noreply@anthropic.com> * fix: clean up linting and type issues in GFQL validation - Remove trailing whitespace - Fix unused imports - Add newlines at end of files - Fix monkey-patch syntax for validate_schema - Format code with black Co-authored-by: Claude <noreply@anthropic.com> * fix: resolve remaining linting issues - Fix E402 by moving imports before deprecation warning in validate.py - Add missing newlines at end of test files Co-authored-by: Claude <noreply@anthropic.com> * fix: resolve mypy type checking issues in chain.py - Change _get_child_validators return type to List[ASTSerializable] - Add cast for ops list in from_json to fix variance issue Co-authored-by: Claude <noreply@anthropic.com> * fix: update test_from_json to handle new validation errors - Update test to expect GFQLValidationError instead of AssertionError - Fix test for missing 'keep' parameter (now valid with default) Co-authored-by: Claude <noreply@anthropic.com> * fix: consolidate validation notebooks and remove duplicate - Replace original gfql_validation_fundamentals.ipynb with updated content - Remove duplicate gfql_validation_fundamentals_updated.ipynb - Update notebook title to remove '(Updated)' suffix The original notebook was using the old external validation module, while the updated version demonstrates the new built-in validation system. 
Co-authored-by: Claude <noreply@anthropic.com> * feat: add convenience scripts for ruff and mypy - bin/ruff: wrapper that runs flake8 (PyGraphistry's linter) - bin/mypy: wrapper that runs typecheck.sh These provide familiar command names while using the project's actual linting and type checking tools. Co-authored-by: Claude <noreply@anthropic.com> * remove: delete unnecessary convenience scripts The bin/ruff and bin/mypy wrapper scripts were not needed. Use the existing bin/lint.sh and bin/typecheck.sh instead. Co-authored-by: Claude <noreply@anthropic.com> * feat: enable schema validation by default in chain() Change validate_schema from False to True by default for better UX: - Fail fast with clear error messages - Prevent wasted computation on invalid queries - Consistent with automatic syntax validation - Users can opt out with validate_schema=False if needed This provides dual-layer protection: 1. Pre-execution validation (fast, clear errors) 2. Runtime validation in filter_by_dict (safety net) Co-authored-by: Claude <noreply@anthropic.com> * docs(gfql): update all validation .rst files to use built-in validation system - Update fundamentals.rst to show automatic validation during chain construction - Update advanced.rst with complex validation patterns using collect-all mode - Update llm.rst with LLM integration patterns using structured error codes - Update production.rst with production-ready patterns and security considerations - Add deprecation notice to api/gfql/validate.rst pointing to new system - All docs now reflect validate_schema=True default and structured error codes 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(gfql): fix title underline length in advanced.rst Fix RST syntax error where "Pre-execution Validation" title underline was too short. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs(gfql): remove Unicode emoji characters from validation .rst files Remove ✅, ❌,⚠️ , and 💡 emoji characters that cause LaTeX compilation errors in PDF documentation builds. Replace with plain text equivalents. Fixes LaTeX errors: - Unicode character ✅ (U+2705) not set up for use with LaTeX - Unicode character ❌ (U+274C) not set up for use with LaTeX - Unicode character⚠️ (U+26A0) not set up for use with LaTeX - Unicode character 💡 (U+1F4A1) not set up for use with LaTeX 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(gfql): remove Unicode emoji characters from validation notebook Remove ✅, ❌, and 💡 emoji characters from gfql_validation_fundamentals.ipynb that were causing LaTeX compilation errors in PDF documentation builds. Replace with plain text equivalents to maintain readability while fixing: - Unicode character ✅ (U+2705) not set up for use with LaTeX - Unicode character ❌ (U+274C) not set up for use with LaTeX - Unicode character 💡 (U+1F4A1) not set up for use with LaTeX 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * fix(engine): use dynamic import for Plottable instanceof check to avoid Jinja dependency Update resolve_engine() to use dynamic import of graphistry.plotter.Plotter instead of direct import of graphistry.Plottable to avoid triggering Jinja dependency issues from pandas df.style getter during isinstance checks. Changes: - Try importing graphistry.plotter.Plotter first - Fallback to graphistry.Plottable import if plotter module not available - Use intermediate variable to store isinstance result for cleaner code This prevents the dynamic dependency on Jinja that was causing import failures when pandas dataframes with style getters were processed. 
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * refactor(gfql): use canonical graphistry.edges() and graphistry.nodes() in validation notebook Update gfql_validation_fundamentals.ipynb to use the canonical API pattern: - Remove imports of edges and nodes functions - Use graphistry.edges() instead of edges() - Use graphistry.nodes() instead of nodes() - Add comment explaining the canonical usage pattern This follows the recommended PyGraphistry API usage pattern and maintains consistency with documentation examples. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * style: revert superficial quote changes to maintain code history - Reverted single to double quote changes in ast.py and chain.py - These changes don't improve functionality and muddy git history - Kept functional changes for validation system intact * docs: remove misleading advanced validation guide - Removed advanced.rst entirely as most content was incorrect - Named operations are unrelated to validation - Temporal predicates like 'after' don't exist - Nested predicates (and_, or_) don't exist - Custom validation not supported in built-in system - No warnings, only exceptions in new validation - Error collection already covered in fundamentals - Schema evolution was just a restatement of schema validation The minimal valid content (pre-execution validation, bounded traversals) is already covered or can be added to fundamentals if needed. 
* docs: improve validation documentation based on feedback - Added temporal comparison examples using gt() with pd.Timestamp - Emphasized default automatic validation behavior - Made pre-execution validation clearly marked as advanced use - Marked chain_with_validation as deprecated (uses old system) - Removed references to non-existent features * cleanup: remove dead code chain_with_validation - Removed chain_validate.py and its test entirely - This used the old validation system which is deprecated - New validation is built into chain() by default * docs: remove reference to non-existent and_/or_ predicates in LLM guide * docs: fix LLM guide to use actual available error context fields - Removed references to non-existent valid_range and available_columns fields - Updated to use actual available context like column_type - Made fix suggestions work with actual error message content * docs: clarify schema validation in LLM JSON serialization example - Show both methods: separate validation and automatic via g.chain() - Make clear that validate_chain_schema needs a graph instance - Note that g.chain() executes if valid * docs: add JSON to Chain conversion examples for LLM integration - Show how to parse JSON from LLM using Chain.from_json() - Add chain_to_json() for converting to LLM examples - Include complete round-trip example with error handling - Document expected JSON format * docs: split LLM JSON example into clear subsections - JSON Format: Show expected structure - JSON Conversion: Simple conversion functions - Error Serialization: Error to dict conversion - Validation Examples: Complete workflow Each section is now focused and easier to understand * docs: reorganize LLM validation workflow into clear steps - Parse Chain from JSON: Shows parsing and error handling - Validate Chain Syntax: Syntax validation only - Validate Against Schema: Schema validation with graph - Combined Validation: Complete pipeline function Each step builds on the previous, making 
the flow clearer * docs(gfql): update LLM docs to use 'query objects' instead of 'Chain objects' for future compatibility * docs(gfql): consolidate redundant JSON sections in LLM guide * docs(gfql): remove redundant JSON parsing sections in LLM guide * docs(gfql): remove less useful Error Categorization section from LLM guide * docs(gfql): remove redundant sections from LLM guide - keep only essential content * docs(gfql): remove premature sections from production guide - Plottable Integration, GitHub Actions, Pre-commit Hooks, Monitoring & Logging * docs(gfql): rewrite Security Considerations to focus on GFQL's safe-by-design approach - no code execution, JSON generation with validation * docs(gfql): fix notebook links in validation docs to use .html extension * docs(gfql): remove 'Building Queries Incrementally' section from validation notebook * docs(gfql): add schema validation examples to Quick Reference section * fix: revert unnecessary syntactic changes in ast.py and chain.py - Restore original import formatting (multi-line where appropriate) - Revert quote style changes (double to single) in non-error contexts - Fix dictionary formatting (restore multi-line format) - Remove unnecessary whitespace changes This reduces diff noise and makes the PR easier to review * style: clean up spurious formatting changes in validation code - Remove extra blank lines before Edge classes - Restore trailing spaces on blank lines to match master - Fix double blank line issues - Add missing blank line after Chain class declaration These changes reduce noise in the diff and maintain consistency with the existing codebase style. * style: remove trailing commas in function signatures Remove trailing commas after Engine parameters to maintain consistency with the codebase style conventions. 
* fix: restore missing commas in function signatures Fixed syntax errors from overzealous comma removal: - hyper_dask.py: Added commas after engine: Engine parameters - layout_non_bulk.py: Added comma after engine: Engine parameter 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> * docs: merge master CHANGELOG updates Incorporate Engine fix entry from master branch 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Thomas Cook <tcook@graphistry.com> Co-authored-by: Manfred Cheung <mj3cheun@gmail.com> Co-authored-by: Percy Camilo Triveño Aucahuasi <aucahuasi@users.noreply.github.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Tanmoy <tanmoyf2@gmail.com> Co-authored-by: Alex Warren <exrhizo@gmail.com> Co-authored-by: Sindre Breda <sindre.breda@gmail.com> Co-authored-by: Sindre Breda <sbreda@graphistry.com> Co-authored-by: vaimdevs <vaimdevs@v-aim.com>
Changes to address:
[BUG] spanner errors out when spanner_config does not include credentials file #643
[BUG] spanner queries give KeyError if spanner_config not set via register() #644
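
Both reports come down to how the Spanner settings are looked up. The sketch below is not the code from this PR; it only illustrates the kind of guard that avoids the two failures. The helper name `resolve_spanner_config`, the plain `settings` dict, and the `credentials_file` and `database_id` key names are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


def resolve_spanner_config(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return the Spanner settings or fail with a clear message."""
    # .get() avoids a bare KeyError when no spanner_config was ever stored
    config: Optional[Dict[str, Any]] = settings.get("spanner_config")
    if config is None:
        raise ValueError(
            "spanner_config is not set; pass it to register() before querying"
        )
    # Assumption: the credentials file is optional, so a missing key becomes None
    # and the caller can fall back to whatever default credentials it supports
    return {**config, "credentials_file": config.get("credentials_file")}


# Usage: a config without a credentials file no longer errors out
resolved = resolve_spanner_config({"spanner_config": {"database_id": "db"}})
print(resolved["credentials_file"])
```

Raising a `ValueError` with an actionable message, rather than letting the `KeyError` surface, is one possible design choice; the merged change may handle it differently.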