Add Snowflake loader support with full-stack integration#380
Add Snowflake loader support with full-stack integration#380
Conversation
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
…ction vulnerability Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
|
Important Review skippedBot user detected. To trigger a single review, invoke the You can disable this status message by setting the Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Comment |
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.OpenSSF Scorecard
Scanned Files
|
There was a problem hiding this comment.
Pull request overview
This PR adds comprehensive support for Snowflake database connections to QueryWeaver, enabling schema extraction and Text2SQL functionality for Snowflake databases. The implementation follows the existing loader pattern established by MySQL and PostgreSQL loaders.
Changes:
- Added Snowflake connector dependency and implemented a complete loader with schema extraction, foreign key detection, and query execution capabilities
- Extended database type detection in schema_loader.py and text2sql.py to recognize Snowflake URLs
- Created comprehensive test suite with 15 unit tests covering URL parsing, value serialization, and schema operations
- Added detailed documentation explaining Snowflake connection format, features, and troubleshooting
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| api/loaders/snowflake_loader.py | Core Snowflake loader implementation with schema extraction, query execution, and connection URL parsing |
| api/core/schema_loader.py | Added Snowflake URL detection and loader registration |
| api/core/text2sql.py | Added Snowflake database type detection for query execution |
| tests/test_snowflake_loader.py | Comprehensive unit tests for Snowflake loader functionality |
| docs/snowflake_loader.md | Complete documentation for Snowflake loader usage and troubleshooting |
| Pipfile | Added snowflake-connector-python dependency |
| cursor.execute(f""" | ||
| SELECT TABLE_NAME, COMMENT | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLES | ||
| WHERE TABLE_SCHEMA = '{schema_name}' | ||
| AND TABLE_TYPE = 'BASE TABLE' | ||
| ORDER BY TABLE_NAME; | ||
| """) | ||
|
|
There was a problem hiding this comment.
SQL injection vulnerability in query construction. The schema_name parameter is directly interpolated into the SQL query without parameterization. If user input can influence schema_name, this could allow SQL injection. Use parameterized queries or properly escape the schema_name value.
| cursor.execute(f""" | |
| SELECT TABLE_NAME, COMMENT | |
| FROM {db_name}.INFORMATION_SCHEMA.TABLES | |
| WHERE TABLE_SCHEMA = '{schema_name}' | |
| AND TABLE_TYPE = 'BASE TABLE' | |
| ORDER BY TABLE_NAME; | |
| """) | |
| # Validate database and schema names to prevent SQL injection via identifiers | |
| if not re.fullmatch(r"[A-Za-z0-9_]+", db_name): | |
| raise ValueError(f"Invalid Snowflake database name: {db_name!r}") | |
| if not re.fullmatch(r"[A-Za-z0-9_]+", schema_name): | |
| raise ValueError(f"Invalid Snowflake schema name: {schema_name!r}") | |
| query = f""" | |
| SELECT TABLE_NAME, COMMENT | |
| FROM "{db_name}".INFORMATION_SCHEMA.TABLES | |
| WHERE TABLE_SCHEMA = %s | |
| AND TABLE_TYPE = 'BASE TABLE' | |
| ORDER BY TABLE_NAME; | |
| """ | |
| cursor.execute(query, (schema_name,)) |
There was a problem hiding this comment.
Fixed in commit eb9efa0. Added _validate_identifier() to validate all identifiers and converted to parameterized queries using %s for schema_name while safely quoting db_name.
| SELECT | ||
| COLUMN_NAME, | ||
| DATA_TYPE, | ||
| IS_NULLABLE, | ||
| COLUMN_DEFAULT, | ||
| COMMENT | ||
| FROM {db_name}.INFORMATION_SCHEMA.COLUMNS | ||
| WHERE TABLE_SCHEMA = '{schema_name}' | ||
| AND TABLE_NAME = '{table_name}' | ||
| ORDER BY ORDINAL_POSITION; |
There was a problem hiding this comment.
SQL injection vulnerability in query construction. The db_name, schema_name, and table_name parameters are directly interpolated into the SQL query without parameterization. Use parameterized queries or properly escape these values.
There was a problem hiding this comment.
Fixed in commit eb9efa0. All queries now use parameterized queries for schema_name and table_name, with db_name validated and safely quoted.
| SELECT COLUMN_NAME | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | ||
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | ||
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | ||
| AND tc.TABLE_NAME = kcu.TABLE_NAME | ||
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | ||
| AND tc.TABLE_NAME = '{table_name}' | ||
| AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'; |
There was a problem hiding this comment.
SQL injection vulnerability in query construction. The db_name, schema_name, and table_name parameters are directly interpolated without parameterization. This pattern appears multiple times in the file and should be addressed consistently.
There was a problem hiding this comment.
Fixed in commit eb9efa0. Consistent pattern applied across all SQL queries with identifier validation and parameterization.
| query = f""" | ||
| SELECT DISTINCT "{col_name}" | ||
| FROM "{table_name}" | ||
| WHERE "{col_name}" IS NOT NULL | ||
| SAMPLE ({sample_size * 10} ROWS) | ||
| LIMIT {sample_size}; | ||
| """ | ||
| cursor.execute(query) | ||
|
|
There was a problem hiding this comment.
SQL injection vulnerability. The table_name and col_name parameters are directly interpolated into the query without parameterization. Even though they have double quotes, this doesn't fully protect against SQL injection if the values contain double quotes themselves.
| query = f""" | |
| SELECT DISTINCT "{col_name}" | |
| FROM "{table_name}" | |
| WHERE "{col_name}" IS NOT NULL | |
| SAMPLE ({sample_size * 10} ROWS) | |
| LIMIT {sample_size}; | |
| """ | |
| cursor.execute(query) | |
| # Validate identifiers to prevent SQL injection via table/column names. | |
| # Allow only alphanumeric characters, underscore, dollar sign, and dot. | |
| identifier_pattern = re.compile(r'^[A-Za-z0-9_\$\.]+$') | |
| if not identifier_pattern.match(table_name): | |
| raise ValueError(f"Invalid table name: {table_name!r}") | |
| if not identifier_pattern.match(col_name): | |
| raise ValueError(f"Invalid column name: {col_name!r}") | |
| # Ensure sample_size is a positive integer. | |
| if not isinstance(sample_size, int) or sample_size <= 0: | |
| raise ValueError(f"sample_size must be a positive integer, got {sample_size!r}") | |
| sample_rows = sample_size * 10 | |
| query = """ | |
| SELECT DISTINCT "{col_name}" | |
| FROM "{table_name}" | |
| WHERE "{col_name}" IS NOT NULL | |
| SAMPLE (%s ROWS) | |
| LIMIT %s; | |
| """.format(col_name=col_name, table_name=table_name) | |
| cursor.execute(query, (sample_rows, sample_size)) |
There was a problem hiding this comment.
Fixed in commit eb9efa0. Added identifier validation and safe quoting with _quote_identifier() that escapes double quotes, plus parameterized sample_size values.
|
|
||
| # Execute query | ||
| url = "snowflake://user:pass@account/testdb/PUBLIC?warehouse=COMPUTE_WH" | ||
| result = SnowflakeLoader.execute_sql_query("INSERT INTO users VALUES (1, 'test')", url) |
There was a problem hiding this comment.
The test doesn't verify that the connection commits the transaction or that the query was actually executed with the correct parameters. Consider adding assertions to verify mock_cursor.execute was called with the expected SQL query.
api/loaders/snowflake_loader.py
Outdated
| cursor.execute(f""" | ||
| SELECT COLUMN_NAME | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | ||
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | ||
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | ||
| AND tc.TABLE_NAME = kcu.TABLE_NAME | ||
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | ||
| AND tc.TABLE_NAME = '{table_name}' | ||
| AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'; | ||
| """) | ||
| primary_keys = {row['COLUMN_NAME'] for row in cursor.fetchall()} | ||
|
|
||
| # Get foreign key information | ||
| cursor.execute(f""" | ||
| SELECT COLUMN_NAME | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | ||
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | ||
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | ||
| AND tc.TABLE_NAME = kcu.TABLE_NAME | ||
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | ||
| AND tc.TABLE_NAME = '{table_name}' | ||
| AND tc.CONSTRAINT_TYPE = 'FOREIGN KEY'; | ||
| """) |
There was a problem hiding this comment.
SQL injection vulnerability in foreign key query. Similar to other queries in this file, the db_name, schema_name, and table_name are interpolated without parameterization.
| cursor.execute(f""" | |
| SELECT COLUMN_NAME | |
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | |
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | |
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | |
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | |
| AND tc.TABLE_NAME = kcu.TABLE_NAME | |
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | |
| AND tc.TABLE_NAME = '{table_name}' | |
| AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'; | |
| """) | |
| primary_keys = {row['COLUMN_NAME'] for row in cursor.fetchall()} | |
| # Get foreign key information | |
| cursor.execute(f""" | |
| SELECT COLUMN_NAME | |
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | |
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | |
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | |
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | |
| AND tc.TABLE_NAME = kcu.TABLE_NAME | |
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | |
| AND tc.TABLE_NAME = '{table_name}' | |
| AND tc.CONSTRAINT_TYPE = 'FOREIGN KEY'; | |
| """) | |
| safe_db_name = db_name.replace('"', '""') | |
| pk_query = f""" | |
| SELECT COLUMN_NAME | |
| FROM "{safe_db_name}".INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | |
| JOIN "{safe_db_name}".INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | |
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | |
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | |
| AND tc.TABLE_NAME = kcu.TABLE_NAME | |
| WHERE tc.TABLE_SCHEMA = %s | |
| AND tc.TABLE_NAME = %s | |
| AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'; | |
| """ | |
| cursor.execute(pk_query, (schema_name, table_name)) | |
| primary_keys = {row['COLUMN_NAME'] for row in cursor.fetchall()} | |
| # Get foreign key information | |
| fk_query = f""" | |
| SELECT COLUMN_NAME | |
| FROM "{safe_db_name}".INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | |
| JOIN "{safe_db_name}".INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | |
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | |
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | |
| AND tc.TABLE_NAME = kcu.TABLE_NAME | |
| WHERE tc.TABLE_SCHEMA = %s | |
| AND tc.TABLE_NAME = %s | |
| AND tc.CONSTRAINT_TYPE = 'FOREIGN KEY'; | |
| """ | |
| cursor.execute(fk_query, (schema_name, table_name)) |
There was a problem hiding this comment.
Fixed in commit eb9efa0. Applied parameterization for schema_name and table_name with validated and quoted db_name.
| SELECT | ||
| tc.CONSTRAINT_NAME, | ||
| kcu.COLUMN_NAME, | ||
| ccu.TABLE_NAME AS REFERENCED_TABLE_NAME, | ||
| ccu.COLUMN_NAME AS REFERENCED_COLUMN_NAME | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | ||
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | ||
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | ||
| AND tc.TABLE_NAME = kcu.TABLE_NAME | ||
| JOIN {db_name}.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu | ||
| ON tc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = ccu.TABLE_SCHEMA | ||
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | ||
| AND tc.TABLE_NAME = '{table_name}' | ||
| AND tc.CONSTRAINT_TYPE = 'FOREIGN KEY' | ||
| ORDER BY tc.CONSTRAINT_NAME, kcu.ORDINAL_POSITION; |
There was a problem hiding this comment.
SQL injection vulnerability in extract_foreign_keys method. The db_name, schema_name, and table_name parameters are directly interpolated without parameterization.
There was a problem hiding this comment.
Fixed in commit eb9efa0. Parameterized schema_name and table_name with validated db_name.
| SELECT | ||
| tc.TABLE_NAME, | ||
| tc.CONSTRAINT_NAME, | ||
| kcu.COLUMN_NAME, | ||
| ccu.TABLE_NAME AS REFERENCED_TABLE_NAME, | ||
| ccu.COLUMN_NAME AS REFERENCED_COLUMN_NAME | ||
| FROM {db_name}.INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc | ||
| JOIN {db_name}.INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu | ||
| ON tc.CONSTRAINT_NAME = kcu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = kcu.TABLE_SCHEMA | ||
| AND tc.TABLE_NAME = kcu.TABLE_NAME | ||
| JOIN {db_name}.INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu | ||
| ON tc.CONSTRAINT_NAME = ccu.CONSTRAINT_NAME | ||
| AND tc.TABLE_SCHEMA = ccu.TABLE_SCHEMA | ||
| WHERE tc.TABLE_SCHEMA = '{schema_name}' | ||
| AND tc.CONSTRAINT_TYPE = 'FOREIGN KEY' | ||
| ORDER BY tc.TABLE_NAME, tc.CONSTRAINT_NAME; |
There was a problem hiding this comment.
SQL injection vulnerability in extract_relationships method. The db_name and schema_name parameters are directly interpolated without parameterization.
There was a problem hiding this comment.
Fixed in commit eb9efa0. Parameterized schema_name with validated db_name.
|
@copilot please handle the spell checker |
…n and parameterization - Add _validate_identifier() to validate all database/schema/table/column names - Add _quote_identifier() to safely quote identifiers with double-quote escaping - Update all SQL queries to use parameterized queries for user-controlled values - Add comprehensive validation tests - Add Snowflake to spellcheck wordlist Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
|
@copilot The PR is lack of the frontend side, there is no option to choose snow flake in app/src/components/modals/DatabaseModal.tsx |
- Add Snowflake as a database type option in the dropdown (with cyan color indicator) - Update connection URL placeholder to include Snowflake format example - Add Snowflake-specific fields for manual entry: account, schema, warehouse - Update validation logic to handle Snowflake's different required fields - Update URL building logic to construct proper Snowflake connection URLs - Update dialog description to mention Snowflake support Co-authored-by: galshubeli <124919062+galshubeli@users.noreply.github.com>
Added Snowflake support to the frontend in commit 7e96588. Changes made to DatabaseModal.tsx:
The UI now provides a complete interface for connecting to Snowflake databases with both URL and manual entry options. |
Implementation Plan for Snowflake Loader
Summary
Successfully implemented complete end-to-end Snowflake support for QueryWeaver with comprehensive security and full-stack integration:
snowflake://username:password@account/database/schema?warehouse=warehouse_nameSecurity Fixes Applied
All SQL injection vulnerabilities identified in code review have been fixed:
_validate_identifier()method to validate all identifiers against alphanumeric pattern_quote_identifier()method to safely escape double quotes in identifiersFrontend Changes
Added complete Snowflake integration to the DatabaseModal component (
app/src/components/modals/DatabaseModal.tsx):Users can now connect to Snowflake databases through the UI using either direct connection URLs or the guided manual entry form.
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.