feat: Add QueryWeaver Python SDK for serverless Text2SQL #384
DvirDukhan wants to merge 8 commits into staging
Conversation
This PR was not deployed automatically, as @DvirDukhan does not have access to the Railway project. To get automatic PR deploys, please add @DvirDukhan to your workspace on Railway.
✅ "Code Review" workflow completed successfully.
Dependency Review: the following issues were found:
- License issues: pyproject.toml
- OpenSSF Scorecard
- Scanned files
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI. This status message can be disabled in the CodeRabbit configuration.
📝 Walkthrough

This PR introduces a comprehensive Python SDK for QueryWeaver, providing synchronous and asynchronous interfaces for text-to-SQL query generation. It includes new backend modules for sync operations, a complete queryweaver_sdk package with client and model classes, Docker-based test infrastructure, and updated CI/CD configuration with uv/pipenv support.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant QueryWeaver
    participant Cache as Relevancy/Memory
    participant Analyzer as Analysis Agent
    participant Executor as SQL Executor
    participant Healer as Healer Agent
    participant Formatter as Response Formatter
    participant Database
    User->>QueryWeaver: query_database_sync(user_id, graph_id, chat_data)
    QueryWeaver->>QueryWeaver: Validate & initialize context
    QueryWeaver->>Cache: Check relevancy & find tables
    Cache-->>QueryWeaver: Relevant tables identified
    QueryWeaver->>Analyzer: Analyze natural language → SQL
    Analyzer-->>QueryWeaver: SQL + confidence + validity
    QueryWeaver->>Executor: Execute SQL
    rect rgba(200, 100, 100, 0.5)
        alt SQL Execution Fails
            Executor-->>QueryWeaver: Error
            QueryWeaver->>Healer: Heal SQL
            Healer-->>QueryWeaver: Fixed SQL
            QueryWeaver->>Executor: Re-execute
        end
    end
    Executor->>Database: Run query
    Database-->>Executor: Results
    Executor-->>QueryWeaver: Results + execution_time
    QueryWeaver->>Formatter: Format AI response
    Formatter-->>QueryWeaver: Polished response
    QueryWeaver-->>User: QueryResult (SQL, results, analysis, metadata)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 3 passed
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
tests/test_sdk/test_queryweaver.py (outdated)

```python
assert conn_result.success
```

```python
# First query
result1 = await qw.query(
```
api/core/text2sql.py (outdated)

```python
find_task.cancel()
try:
    await find_task
except asyncio.CancelledError:
```
- Add detailed assertions for query results (customer names, counts, etc.)
- Add tests for filter queries, count aggregation, and joins
- Validate SQL query structure and result data
- Add session-scoped event loop to fix pytest-asyncio issues
- Handle async event loop cleanup errors gracefully with skip
- Expand model serialization tests
Disable warnings that are intentional architectural choices:
- C0415: import-outside-toplevel (lazy imports for SDK)
- W0718: broad-exception-caught (error handling)
- R0902: too-many-instance-attributes (dataclasses)
- R0903: too-few-public-methods
- R0911: too-many-return-statements
- R0913/R0917: too-many-arguments (SDK API design)
- C0302: too-many-lines
- Extract SDK sync functions to new api/core/text2sql_sync.py module
- Split QueryResult into composition: QueryResult + QueryMetadata + QueryAnalysis
- Reduce local variables in query_database_sync with helper functions
- Fix broad exception handling: use specific Redis/Connection/OS errors
- Refactor query method to accept Union[str, QueryRequest]
- Add compatibility properties to QueryResult for backwards compatibility
- Document lazy imports in client.py module docstring

Pylint score improved from 9.81/10 to 9.91/10. Remaining E0401 errors are missing dependencies in the venv, not code issues.
@coderabbitai review
✅ Actions performed: review triggered.
Actionable comments posted: 18
🤖 Fix all issues with AI agents
In @.github/workflows/tests.yml:
- Around line 130-134: The CI .env creation step ("Create test environment
file") currently only writes FASTAPI_SECRET_KEY; update that step to also append
FALKORDB_URL with the proper test URL (use the same FALKORDB_URL value for the
SDK job and ensure the unit-tests step that generates the CI .env uses the
identical value) so both workflow steps produce a .env containing
FASTAPI_SECRET_KEY and FALKORDB_URL; keep the rest of the workflow (Python 3.12,
pipenv sync --dev, starting FalkorDB for tests, Playwright browser install)
unchanged.
In `@api/core/schema_loader.py`:
- Around line 230-237: The except block currently embeds raw exception text
(str(e)) into the DatabaseConnection.message which may leak credentials;
instead, update the except handler that catches (RedisError, ConnectionError,
OSError) to log the full exception (logging.exception already does this) but
return a generic error message in the DatabaseConnection (e.g., "Error
connecting to database") and preserve success=False, tables_loaded=0; reference
the DatabaseConnection construction in this except block and the exception
variable e when making the change.
- Around line 212-221: The returned DatabaseConnection currently sets
database_id to just the extracted db_name (from url.split(...)) which omits the
required user namespace; in the load_database_sync function build the namespaced
graph_id by prepending the available user_id (e.g., f"{user_id}_{db_name}") and
return that as database_id in the DatabaseConnection so callers receive the same
namespaced graph_id produced by the loaders' refresh_graph_schema methods;
update the DatabaseConnection(...) call to use the constructed namespaced id
instead of raw db_name, keeping tables_loaded, success and message the same.
In `@api/core/text2sql_sync.py`:
- Around line 199-202: The current error handling in the block checking
healing_result uses "raise exec_error", which loses the original traceback;
update the handler in the function where healing_result and exec_error are
defined (the block that currently reads if not healing_result.get("success"):
raise exec_error) to use a bare "raise" so the original exception traceback is
preserved when re-raising the caught error.
- Around line 338-345: The fire-and-forget asyncio.create_task call that invokes
ctx.memory_tool.save_query_memory (using ctx.chat.queries_history[-1] and
final_sql) can drop exceptions; update this to capture the created Task (e.g.,
task = asyncio.create_task(...)) and either await it at an appropriate point or
wrap the coroutine body in a try/except that logs exceptions via the existing
logger (or attach a done callback that logs task.exception()). Ensure you
reference the asyncio.create_task invocation and the
ctx.memory_tool.save_query_memory call when implementing the change so failures
are surfaced instead of being silently lost.
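The done-callback option described in that prompt can be sketched as follows. This is a minimal standalone example, not the SDK's code: `save_query_memory` here is a stand-in for `ctx.memory_tool.save_query_memory`, and errors are collected in a list where real code would use the module logger.

```python
import asyncio

captured_errors: list[BaseException] = []  # real code would log via the module logger


def log_task_exception(task: asyncio.Task) -> None:
    """Done callback that surfaces exceptions from fire-and-forget tasks."""
    if task.cancelled():
        return
    exc = task.exception()  # retrieving it also suppresses the "never retrieved" warning
    if exc is not None:
        captured_errors.append(exc)


async def save_query_memory() -> None:
    """Stand-in for ctx.memory_tool.save_query_memory, which may fail."""
    raise RuntimeError("memory save failed")


async def main() -> None:
    # Fire-and-forget, but with a done callback so failures are visible.
    task = asyncio.create_task(save_query_memory())
    task.add_done_callback(log_task_exception)
    await asyncio.sleep(0.01)  # the request continues; the task fails in the background


asyncio.run(main())
```

Holding a reference to the task (instead of discarding the result of `asyncio.create_task`) also prevents the task from being garbage-collected mid-flight, which is a separate hazard of the fire-and-forget pattern.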
- Around line 58-60: The _graph_name function currently always prefixes graph_id
with user_id; change it to match the original behavior by returning graph_id
unchanged when it already starts with GENERAL_PREFIX, otherwise return the
namespaced f"{user_id}_{graph_id}". Update the logic in function _graph_name to
check for GENERAL_PREFIX (the same constant used in api/core/text2sql.py) and
apply the prefixing only when the graph_id does not start with that prefix.
In `@Makefile`:
- Line 71: The Makefile currently silences pylint failures with "|| true"
causing lint errors to be ignored; update the rule that runs "$(RUN_CMD) pylint
$(shell git ls-files '*.py') || true" to remove the "|| true" so pylint's
non-zero exit status fails the make target (or alternatively capture and re-exit
with the pylint status), ensuring the lint step using RUN_CMD and pylint
actually propagates failures in CI.
- Line 55: The test-unit Makefile target currently runs pytest with -k "not e2e"
but still includes SDK integration tests; update the pytest invocation in the
rule that uses $(RUN_CMD) python -m pytest tests/ to also exclude the SDK
integration directory (e.g., add an additional -k filter like 'and not test_sdk'
or use -k "not e2e and not test_sdk") so tests/test_sdk are skipped when running
the unit target; ensure you update the same command using the $(RUN_CMD)
invocation so test-unit no longer runs SDK integration tests.
In `@pyproject.toml`:
- Line 47: Replace the moving-target git ref for the graphiti-core dependency
with a specific commit SHA to ensure reproducible installs: locate the
dependency line that currently reads "graphiti-core @
git+https://github.com/FalkorDB/graphiti.git@staging" in pyproject.toml and
change the branch ref to a full commit hash (format:
git+https://...@<commit-sha>), committing the updated pyproject.toml so future
installs use that exact commit; update the SHA deliberately when you want to
pull upstream changes.
In `@queryweaver_sdk/client.py`:
- Around line 76-83: _setup_connection currently writes the FalkorDB connection
into module-global api.extensions.db which will be overwritten when multiple
QueryWeaver instances exist; change this by removing the direct assignment and
instead either (a) add an instance-level accessor on QueryWeaver that other
components call to get the connection, (b) add a registration API on
api.extensions (e.g. api.extensions.register_db(instance_id, db) and use a
per-instance key) or (c) use a context/local registry to hold the connection for
the current instance, and update callers to obtain the connection via that new
accessor/registry; if you opt not to change behavior, add documentation to the
QueryWeaver class and _setup_connection noting that only a single SDK instance
is supported and that it mutates api.extensions.db.
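Option (b) above, a registration API keyed per instance, might look like the sketch below. The names (`DBRegistry`, `register_db`, the dict-valued connections) are hypothetical illustrations, not the current `api.extensions` API.

```python
from typing import Any, Dict


class DBRegistry:
    """Per-instance connection registry, replacing a single module-global db.

    Each QueryWeaver instance registers its FalkorDB connection under its own
    key, so multiple SDK instances no longer overwrite one another.
    """

    def __init__(self) -> None:
        self._connections: Dict[str, Any] = {}

    def register_db(self, instance_id: str, db: Any) -> None:
        self._connections[instance_id] = db

    def get_db(self, instance_id: str) -> Any:
        return self._connections[instance_id]

    def unregister_db(self, instance_id: str) -> None:
        self._connections.pop(instance_id, None)


# Two SDK instances coexist without clobbering each other's connection.
registry = DBRegistry()
registry.register_db("qw-1", {"url": "redis://a"})
registry.register_db("qw-2", {"url": "redis://b"})
```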
- Around line 99-101: The current code truncates graph_id via
graph_id.strip()[:200] then only checks for emptiness but errors claim "must be
non-empty and less than 200 characters"; fix by validating length before
truncation and return a consistent error: compute clean = graph_id.strip(), if
not clean or len(clean) > 200 then raise the appropriate error (either raise
ValueError with message "Invalid graph_id, must be non-empty and less than 200
characters." or, to match api/core/text2sql.py:_graph_name, raise
GraphNotFoundError with "Invalid graph_id, must be less than 200 characters.");
only truncate after passing validation if truncation is needed, and update the
raised exception type/message to match the chosen behavior.
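A sketch of the validate-before-truncate ordering the prompt asks for, using the `ValueError` variant of the two suggested behaviors (the function name is illustrative):

```python
def validate_graph_id(graph_id: str) -> str:
    """Reject empty or over-long ids before any truncation occurs.

    Validating first keeps the error message honest: an id longer than 200
    characters is rejected rather than silently truncated to a valid one.
    """
    clean = graph_id.strip()
    if not clean or len(clean) > 200:
        raise ValueError(
            "Invalid graph_id, must be non-empty and less than 200 characters."
        )
    return clean
```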
In `@queryweaver_sdk/connection.py`:
- Around line 111-116: The close() method must also close internal non-pooled
FalkorDB Redis connections: before setting self._db = None, detect when
self._pool is None and self._db exists, access the internal connection via
self._db.connection and await its aclose() to release the underlying
redis.asyncio.Redis client; keep the existing pool disconnect logic for when
self._pool is not None and ensure both branches null out self._db afterwards.
In `@tests/test_sdk/conftest.py`:
- Around line 109-144: The fixture currently reads TEST_MYSQL_URL into the
variable url but then ignores it and hardcodes credentials when creating the
pymysql connection; update the fixture to parse TEST_MYSQL_URL (falling back to
the existing default) and extract host, port, user, password, and database, then
pass those parsed values into pymysql.connect instead of the hardcoded
localhost/root/root/testdb; look for the variable url and the
pymysql.connect(...) call in conftest.py and replace the hardcoded args with the
parsed components (use urllib.parse.urlparse or similar) so the fixture respects
the env var and avoids hardcoded credentials.
- Around line 147-153: The queryweaver fixture yields a QueryWeaver instance but
never closes it, leaking connections; wrap the yield in a try/finally and call
the instance cleanup method (e.g., qw.close() or await qw.aclose() if async) in
the finally block so the QueryWeaver created in the fixture (symbol:
QueryWeaver, variable: qw, fixture name: queryweaver) is properly closed after
tests complete.
- Around line 21-26: Remove the custom session-scoped event_loop fixture (the
function named event_loop) from conftest.py; this redefinition is
deprecated/removed in pytest-asyncio. Delete the event_loop fixture and instead
mark async tests with pytest.mark.asyncio(scope="session") (or
loop_scope="session" for 0.24+) or set asyncio_default_fixture_loop_scope =
"session" in pytest configuration so tests get a session-scoped loop without
redefining event_loop.
In `@tests/test_sdk/test_queryweaver.py`:
- Line 224: The assert in the test (the assertion comparing "Bob Jones" against
customer_names) uses an unnecessary f-string; change the assertion message in
the line containing assert "Bob Jones" not in customer_names to use a plain
string (remove the leading f from the message) so it reads: assert "Bob Jones"
not in customer_names, "'Bob Jones' should not be in NYC results".
- Around line 391-412: The test is calling QueryResult with flattened fields
that don't exist; instead build the nested QueryMetadata and QueryAnalysis
objects and pass them into QueryResult (e.g., create a QueryMetadata instance
for sql_query and results, and a QueryAnalysis instance for ai_response,
confidence, is_destructive, requires_confirmation, execution_time), then call
QueryResult(..., metadata=that_metadata, analysis=that_analysis) and update
assertions to read from d["metadata"] and d["analysis"] (or the dict keys
produced by QueryResult.to_dict()) to match the model's structure.
- Around line 447-465: The test calls QueryResult with a non-existent confidence
parameter; instead instantiate a QueryMetadata with the confidence value and
pass it via the QueryResult.metadata field. Update the
test_query_result_default_values to import QueryMetadata (from
queryweaver_sdk.models) and create metadata = QueryMetadata(confidence=0.8) then
construct QueryResult(sql_query="SELECT 1", results=[], ai_response="Test",
metadata=metadata) and keep the same assertions for default optional fields on
the QueryResult instance.
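To illustrate the composed construction both of these prompts describe, here is a minimal sketch with stand-in dataclasses. The field lists are assumptions for illustration and will differ from the real `queryweaver_sdk.models` definitions.

```python
from dataclasses import dataclass, field


@dataclass
class QueryMetadata:
    """Stand-in: execution-related fields split out of QueryResult."""
    confidence: float = 0.0
    execution_time: float = 0.0


@dataclass
class QueryAnalysis:
    """Stand-in: analysis-related fields split out of QueryResult."""
    explanation: str = ""


@dataclass
class QueryResult:
    sql_query: str
    results: list
    ai_response: str
    metadata: QueryMetadata = field(default_factory=QueryMetadata)
    analysis: QueryAnalysis = field(default_factory=QueryAnalysis)


# Build the nested pieces first, then pass them in -- rather than flat
# kwargs like QueryResult(confidence=0.8, ...), which has no such field.
result = QueryResult(
    sql_query="SELECT 1",
    results=[],
    ai_response="Test",
    metadata=QueryMetadata(confidence=0.8),
    analysis=QueryAnalysis(explanation="trivial query"),
)
```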
🧹 Nitpick comments (7)
tests/test_sdk/test_queryweaver.py (3)
85-86: Use a more specific exception type instead of bare `Exception`. Catching `Exception` is too broad and may mask unrelated failures. Based on the `InvalidArgumentError` raised by the SDK for invalid URLs (per `api/core/schema_loader.py`), use that specific exception.

♻️ Proposed fix

```diff
 @pytest.mark.asyncio
 async def test_connect_invalid_url(self, queryweaver):
     """Test connecting with invalid URL format."""
-    with pytest.raises(Exception):  # Should raise InvalidArgumentError
+    from api.core.errors import InvalidArgumentError
+    with pytest.raises(InvalidArgumentError):
         await queryweaver.connect_database("invalid://url")
```
266-269: Rename unused loop variable `key` to `_key`. The loop variable is not used within the loop body.

♻️ Proposed fix

```diff
-    for key, val in first_result.items():
+    for _key, val in first_result.items():
         if isinstance(val, int):
             count_value = val
             break
```
51: Unused `has_llm_key` fixture parameter. The `has_llm_key` fixture is injected but never used in these test methods. If this is intentional (to ensure LLM key presence before running), consider adding a brief comment or using `pytest.mark.usefixtures("has_llm_key")` as a class decorator instead.

Also applies to: 68, 94, 143, 185, 234, 285, 336, 369
queryweaver_sdk/__init__.py (1)
40-51: Consider sorting `__all__` alphabetically for consistency. Static analysis suggests sorting the exports. This is optional but improves maintainability.

♻️ Proposed fix

```diff
 __all__ = [
+    "ChatMessage",
+    "DatabaseConnection",
+    "FalkorDBConnection",
+    "QueryAnalysis",
+    "QueryMetadata",
+    "QueryRequest",
+    "QueryResult",
     "QueryWeaver",
-    "QueryResult",
-    "QueryMetadata",
-    "QueryAnalysis",
-    "SchemaResult",
-    "DatabaseConnection",
     "RefreshResult",
-    "QueryRequest",
-    "ChatMessage",
-    "FalkorDBConnection",
+    "SchemaResult",
 ]
```

queryweaver_sdk/models.py (1)
1-209: Consider adding a factory method for backward-compatible construction. The pipeline failure shows tests using `QueryResult(confidence=0.95, ...)`, which doesn't work with the current signature. While fixing the tests is the right approach, you could also add a `@classmethod` factory for convenience if flat-kwarg construction is a common pattern.

♻️ Optional factory method

```python
@classmethod
def from_flat(
    cls,
    sql_query: str,
    results: list[dict[str, Any]],
    ai_response: str,
    confidence: float = 0.0,
    execution_time: float = 0.0,
    is_valid: bool = True,
    is_destructive: bool = False,
    requires_confirmation: bool = False,
    missing_information: str = "",
    ambiguities: str = "",
    explanation: str = "",
) -> "QueryResult":
    """Create QueryResult from flat keyword arguments."""
    return cls(
        sql_query=sql_query,
        results=results,
        ai_response=ai_response,
        metadata=QueryMetadata(
            confidence=confidence,
            execution_time=execution_time,
            is_valid=is_valid,
            is_destructive=is_destructive,
            requires_confirmation=requires_confirmation,
        ),
        analysis=QueryAnalysis(
            missing_information=missing_information,
            ambiguities=ambiguities,
            explanation=explanation,
        ),
    )
```

api/core/text2sql_sync.py (2)
472-477: Use `logging.exception` for automatic traceback inclusion. `logging.exception` automatically includes the traceback when called from an exception handler.

♻️ Proposed fix

```diff
 except (RedisError, ConnectionError, OSError) as e:
-    logging.error("Error executing SQL query: %s", str(e))
+    logging.exception("Error executing SQL query")
     return _build_query_result(
         sql_query=analysis.sql_query,
         results=[],
-        ai_response=f"Error executing SQL query: {str(e)}",
+        ai_response=f"Error executing SQL query: {e!s}",
```

Apply the same pattern at lines 576 and 631.
181-183: Move the success return outside the try block. Per Ruff TRY300, returning inside `try` can mask exceptions raised during the return statement itself.

♻️ Proposed fix

```diff
 try:
     query_results = context.loader_class.execute_sql_query(sql_query, context.db_url)
-    return sql_query, query_results
 except (RedisError, ConnectionError, OSError) as exec_error:
     # healing logic...
+else:
+    return sql_query, query_results
```
```yaml
- name: Create test environment file
  run: |
    cp .env.example .env
    echo "FASTAPI_SECRET_KEY=test-secret-key" >> .env
```
Add FALKORDB_URL to the CI .env.
Please include the FalkorDB URL in the generated .env for the SDK job (and keep the unit-tests step in sync) so the CI environment file matches the workflow guidelines.
🔧 Proposed fix

```diff
 - name: Create test environment file
   run: |
     cp .env.example .env
     echo "FASTAPI_SECRET_KEY=test-secret-key" >> .env
+    echo "FALKORDB_URL=redis://localhost:6379" >> .env
```

As per coding guidelines: GitHub Actions workflows should set up Python 3.12, use pipenv (sync --dev), create a CI .env using FALKORDB_URL, start FalkorDB for tests, and install Playwright browsers for E2E.
```python
# Extract database name from the message or URL
# The loader typically returns the graph_id in the final message
db_name = url.split("/")[-1].split("?")[0]  # Extract DB name from URL

return DatabaseConnection(
    database_id=db_name,
    success=True,
    tables_loaded=tables_loaded,
    message="Database connected and schema loaded successfully",
)
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Inspect loader.load implementations for graph_id/database_id derivation
rg -n -C3 "async def load\(" api/loaders -g '*.py'
```

Repository: FalkorDB/QueryWeaver
Length of output: 1426

🏁 Script executed:

```shell
#!/bin/bash
# Search for graph_id and database_id usage patterns
rg -n -C3 "graph_id|database_id" api/loaders api/core queryweaver_sdk -g '*.py'
```

Repository: FalkorDB/QueryWeaver
Length of output: 28477

🏁 Script executed:

```shell
#!/bin/bash
# Check DatabaseConnection usage in SDK
rg -n -C3 "DatabaseConnection" queryweaver_sdk -g '*.py'
```

Repository: FalkorDB/QueryWeaver
Length of output: 2361

🏁 Script executed:

```shell
#!/bin/bash
# Also check the schema_loader.py file to understand context
cat -n api/core/schema_loader.py | head -230 | tail -50
```

Repository: FalkorDB/QueryWeaver
Length of output: 2160
Return the actual namespaced graph_id instead of extracting database name from URL.
The loader creates a namespaced graph_id in the format {user_id}_{database_name} (as shown in the loaders' refresh_graph_schema methods), but load_database_sync returns only the database name extracted from the URL. This mismatch causes SDK calls using the returned database_id to fail. Since user_id is already available in the function, construct and return the namespaced graph_id: database_id=f"{user_id}_{db_name}".
```python
except (RedisError, ConnectionError, OSError) as e:
    logging.exception("Error loading database: %s", str(e))
    return DatabaseConnection(
        database_id="",
        success=False,
        tables_loaded=0,
        message=f"Error connecting to database: {str(e)}",
    )
```
Avoid returning raw exception text (may include credentials).
str(e) can include connection strings or user info. Prefer a generic message and keep the detailed error only in logs.
🔧 Proposed fix

```diff
 return DatabaseConnection(
     database_id="",
     success=False,
     tables_loaded=0,
-    message=f"Error connecting to database: {str(e)}",
+    message="Error connecting to database",
 )
```
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 231-231: Redundant exception object included in logging.exception call
(TRY401)
[warning] 236-236: Use explicit conversion flag
Replace with conversion flag
(RUF010)
```python
def _graph_name(user_id: str, graph_id: str) -> str:
    """Generate namespaced graph name."""
    return f"{user_id}_{graph_id}"
```
_graph_name doesn't handle GENERAL_PREFIX like the original in api/core/text2sql.py.
The original _graph_name in api/core/text2sql.py (lines 99-108) checks for GENERAL_PREFIX and returns the graph_id unchanged if it starts with that prefix. This implementation always prefixes with user_id_.
🔧 Proposed fix to align with original behavior

```diff
 def _graph_name(user_id: str, graph_id: str) -> str:
     """Generate namespaced graph name."""
+    graph_id = graph_id.strip()[:200]
+    if not graph_id:
+        raise InvalidArgumentError("Invalid graph_id, must be non-empty")
+
+    if GENERAL_PREFIX and graph_id.startswith(GENERAL_PREFIX):
+        return graph_id
+
     return f"{user_id}_{graph_id}"
```
```python
if not healing_result.get("success"):
    raise exec_error

return healing_result["sql_query"], healing_result["query_results"]
```
Use bare raise to preserve the original traceback.
Re-raising with `raise exec_error` appends a redundant traceback frame pointing at the re-raise line; a bare `raise` re-raises the caught exception with its traceback unchanged.
🔧 Proposed fix

```diff
 if not healing_result.get("success"):
-    raise exec_error
+    raise
```

🧰 Tools
+ raise🧰 Tools
🪛 Ruff (0.14.14)
[warning] 200-200: Use raise without specifying exception name
Remove exception name
(TRY201)
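The difference between the two re-raise forms can be demonstrated standalone (generic names here, not the SDK code): re-raising by name adds a traceback entry for the `raise e` line itself, while a bare `raise` leaves the traceback as captured.

```python
import traceback


def boom() -> None:
    raise ValueError("original failure")


def reraise_named() -> None:
    try:
        boom()
    except ValueError as exec_error:
        raise exec_error  # adds an extra traceback entry for this line


def reraise_bare() -> None:
    try:
        boom()
    except ValueError:
        raise  # traceback unchanged: still points at the boom() call


def tb_function_names(fn) -> list:
    """Call fn and return the function names along its exception traceback."""
    try:
        fn()
    except ValueError as e:
        return [entry.name for entry in traceback.extract_tb(e.__traceback__)]
    return []


named = tb_function_names(reraise_named)  # reraise_named appears twice
bare = tb_function_names(reraise_bare)    # reraise_bare appears once
```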
```python
url = os.getenv("TEST_MYSQL_URL", "mysql://root:root@localhost:3306/testdb")

# Verify connection and create test schema
try:
    import pymysql
    conn = pymysql.connect(
        host='localhost',
        user='root',
        password='root',
        database='testdb'
    )
    cursor = conn.cursor()

    # Create test tables
    cursor.execute("DROP TABLE IF EXISTS products")
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id INT AUTO_INCREMENT PRIMARY KEY,
            name VARCHAR(100) NOT NULL,
            category VARCHAR(50),
            price DECIMAL(10,2)
        )
    """)

    cursor.execute("""
        INSERT INTO products (name, category, price) VALUES
        ('Laptop', 'Electronics', 999.99),
        ('Mouse', 'Electronics', 29.99),
        ('Desk', 'Furniture', 199.99)
    """)
    conn.commit()
    conn.close()
except Exception as e:
    pytest.skip(f"MySQL not available: {e}")

return url
```
MySQL fixture ignores url and hardcodes credentials.
The fixture reads TEST_MYSQL_URL into url but then ignores it entirely, hardcoding localhost, root, root, testdb. This creates inconsistency and a potential security concern with hardcoded credentials.
🔧 Proposed fix: parse the URL or use env vars consistently

```diff
 @pytest.fixture(scope="session")
 def mysql_url():
     """Provide MySQL connection URL with test database."""
     url = os.getenv("TEST_MYSQL_URL", "mysql://root:root@localhost:3306/testdb")
     # Verify connection and create test schema
     try:
         import pymysql
+        from urllib.parse import urlparse
+        parsed = urlparse(url)
+
         conn = pymysql.connect(
-            host='localhost',
-            user='root',
-            password='root',
-            database='testdb'
+            host=parsed.hostname or 'localhost',
+            port=parsed.port or 3306,
+            user=parsed.username or 'root',
+            password=parsed.password or 'root',
+            database=parsed.path.lstrip('/') or 'testdb'
         )
```

🧰 Tools
🪛 ast-grep (0.40.5)
[warning] 113-118: A secret is hard-coded in the application. Secrets stored in source code, such as credentials, identifiers, and other types of sensitive data, can be leaked and used by internal or external malicious actors. Use environment variables to securely provide credentials and other secrets or retrieve them from a secure vault or Hardware Security Module (HSM).
Context: pymysql.connect(
host='localhost',
user='root',
password='root',
database='testdb'
)
Note: [CWE-798]: Use of Hard-coded Credentials [OWASP A07:2021]: Identification and Authentication Failures [REFERENCES]
https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
(python-pymysql-hardcoded-secret-python)
🪛 Ruff (0.14.14)
[error] 117-117: Possible hardcoded password assigned to argument: "password"
(S106)
[warning] 141-141: Do not catch blind exception: Exception
(BLE001)
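The parsing step this fix relies on can be verified standalone with `urllib.parse.urlparse`, which splits out every component the fixture needs. The URL below is the fixture's existing default, not a real credential.

```python
from urllib.parse import urlparse

url = "mysql://root:root@localhost:3306/testdb"
parsed = urlparse(url)

# urlparse handles any scheme followed by "://", so the mysql:// form works as-is.
host = parsed.hostname or "localhost"
port = parsed.port or 3306          # parsed.port is already an int
user = parsed.username or "root"
password = parsed.password or "root"
database = parsed.path.lstrip("/") or "testdb"
```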
```python
@pytest.fixture
def queryweaver(falkordb_url):
    """Provide initialized QueryWeaver instance."""
    from queryweaver_sdk import QueryWeaver

    qw = QueryWeaver(falkordb_url=falkordb_url, user_id="test_user")
    yield qw
```
QueryWeaver instance is not closed after use.
The queryweaver fixture yields the instance but doesn't close it, potentially leaking the FalkorDB connection.
🔧 Proposed fix
`@pytest.fixture`
-def queryweaver(falkordb_url):
+async def queryweaver(falkordb_url):
"""Provide initialized QueryWeaver instance."""
from queryweaver_sdk import QueryWeaver
qw = QueryWeaver(falkordb_url=falkordb_url, user_id="test_user")
yield qw
+ await qw.close()🤖 Prompt for AI Agents
In `@tests/test_sdk/conftest.py` around lines 147 - 153, The queryweaver fixture
yields a QueryWeaver instance but never closes it, leaking connections; wrap the
yield in a try/finally and call the instance cleanup method (e.g., qw.close() or
await qw.aclose() if async) in the finally block so the QueryWeaver created in
the fixture (symbol: QueryWeaver, variable: qw, fixture name: queryweaver) is
properly closed after tests complete.
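The try/finally shape that prompt asks for is the standard generator-fixture teardown pattern; here is a minimal self-contained illustration (DummyClient is a placeholder for QueryWeaver, and `close()` being the cleanup method is an assumption — the real SDK may expose `aclose()` instead):

```python
# Minimal illustration of the try/finally fixture pattern the review asks for.
# DummyClient stands in for QueryWeaver; close() as the cleanup method is an
# assumption (the real async client may expose aclose() instead).
class DummyClient:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def queryweaver_fixture():
    """Generator body a pytest fixture would use: yield, then always clean up."""
    qw = DummyClient()
    try:
        yield qw
    finally:
        qw.close()  # runs even if the test using the fixture raises
```

pytest advances the generator once to obtain the fixture value and resumes it at teardown, which is when the finally block closes the client.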
| assert "Alice Smith" in customer_names, f"Expected 'Alice Smith' in results, got {customer_names}" | ||
| assert "Carol White" in customer_names, f"Expected 'Carol White' in results, got {customer_names}" | ||
| # Bob Jones should NOT be in results (he's from Los Angeles) | ||
| assert "Bob Jones" not in customer_names, f"'Bob Jones' should not be in NYC results" |
Remove extraneous f-prefix from string literal.
This f-string has no placeholders and should be a regular string.
🔧 Proposed fix
- assert "Bob Jones" not in customer_names, f"'Bob Jones' should not be in NYC results"
+ assert "Bob Jones" not in customer_names, "'Bob Jones' should not be in NYC results"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| assert "Bob Jones" not in customer_names, f"'Bob Jones' should not be in NYC results" | |
| assert "Bob Jones" not in customer_names, "'Bob Jones' should not be in NYC results" |
🧰 Tools
🪛 Ruff (0.14.14)
[error] 224-224: f-string without any placeholders
Remove extraneous f prefix
(F541)
🤖 Prompt for AI Agents
In `@tests/test_sdk/test_queryweaver.py` at line 224, The assert in the test (the
assertion comparing "Bob Jones" against customer_names) uses an unnecessary
f-string; change the assertion message in the line containing assert "Bob Jones"
not in customer_names to use a plain string (remove the leading f from the
message) so it reads: assert "Bob Jones" not in customer_names, "'Bob Jones'
should not be in NYC results".
```python
def test_query_result_to_dict(self):
    """Test QueryResult serialization."""
    from queryweaver_sdk.models import QueryResult

    result = QueryResult(
        sql_query="SELECT * FROM customers",
        results=[{"id": 1, "name": "Alice"}],
        ai_response="Found 1 customer",
        confidence=0.95,
        is_destructive=False,
        requires_confirmation=False,
        execution_time=0.5,
    )

    d = result.to_dict()
    assert d["sql_query"] == "SELECT * FROM customers"
    assert d["confidence"] == 0.95
    assert d["results"] == [{"id": 1, "name": "Alice"}]
    assert d["ai_response"] == "Found 1 customer"
    assert d["is_destructive"] is False
    assert d["requires_confirmation"] is False
    assert d["execution_time"] == 0.5
```
Test uses incorrect constructor signature for QueryResult.
The pipeline failure indicates QueryResult.__init__() got an unexpected keyword argument 'confidence'. Per queryweaver_sdk/models.py, QueryResult accepts metadata: QueryMetadata and analysis: QueryAnalysis as nested objects, not top-level confidence, is_destructive, etc.
🐛 Proposed fix
```diff
+from queryweaver_sdk.models import QueryMetadata
+
 result = QueryResult(
     sql_query="SELECT * FROM customers",
     results=[{"id": 1, "name": "Alice"}],
     ai_response="Found 1 customer",
-    confidence=0.95,
-    is_destructive=False,
-    requires_confirmation=False,
-    execution_time=0.5,
+    metadata=QueryMetadata(
+        confidence=0.95,
+        is_destructive=False,
+        requires_confirmation=False,
+        execution_time=0.5,
+    ),
 )
```

🤖 Prompt for AI Agents
In `@tests/test_sdk/test_queryweaver.py` around lines 391 - 412, The test is
calling QueryResult with flattened fields that don't exist; instead build the
nested QueryMetadata and QueryAnalysis objects and pass them into QueryResult
(e.g., create a QueryMetadata instance for sql_query and results, and a
QueryAnalysis instance for ai_response, confidence, is_destructive,
requires_confirmation, execution_time), then call QueryResult(...,
metadata=that_metadata, analysis=that_analysis) and update assertions to read
from d["metadata"] and d["analysis"] (or the dict keys produced by
QueryResult.to_dict()) to match the model's structure.
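A self-contained sketch of the nested shape the review describes — these dataclasses only mirror the fields named in the diff and are not the real `queryweaver_sdk.models` definitions:

```python
# Simplified stand-ins for queryweaver_sdk.models (field sets are assumptions
# taken from the review diff; the real classes likely carry more fields).
from dataclasses import asdict, dataclass, field

@dataclass
class QueryMetadata:
    confidence: float = 0.0
    is_destructive: bool = False
    requires_confirmation: bool = False
    execution_time: float = 0.0

@dataclass
class QueryResult:
    sql_query: str
    results: list
    ai_response: str
    metadata: QueryMetadata = field(default_factory=QueryMetadata)

    def to_dict(self) -> dict:
        return asdict(self)

result = QueryResult(
    sql_query="SELECT * FROM customers",
    results=[{"id": 1, "name": "Alice"}],
    ai_response="Found 1 customer",
    metadata=QueryMetadata(confidence=0.95, execution_time=0.5),
)
```

With this shape, the test assertions read from the nested dict, e.g. `result.to_dict()["metadata"]["confidence"]`.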
```python
def test_query_result_default_values(self):
    """Test QueryResult with minimal required values."""
    from queryweaver_sdk.models import QueryResult

    result = QueryResult(
        sql_query="SELECT 1",
        results=[],
        ai_response="Test",
        confidence=0.8,
    )

    # Check defaults for optional fields
    assert result.is_destructive is False
    assert result.requires_confirmation is False
    assert result.execution_time == 0.0
    assert result.is_valid is True
    assert result.missing_information == ""
    assert result.ambiguities == ""
    assert result.explanation == ""
```
Test uses incorrect constructor signature for QueryResult.
Same issue as above - confidence should be passed via QueryMetadata.
🐛 Proposed fix
```diff
+from queryweaver_sdk.models import QueryMetadata
+
 result = QueryResult(
     sql_query="SELECT 1",
     results=[],
     ai_response="Test",
-    confidence=0.8,
+    metadata=QueryMetadata(confidence=0.8),
 )
```

🤖 Prompt for AI Agents
In `@tests/test_sdk/test_queryweaver.py` around lines 447 - 465, The test calls
QueryResult with a non-existent confidence parameter; instead instantiate a
QueryMetadata with the confidence value and pass it via the QueryResult.metadata
field. Update the test_query_result_default_values to import QueryMetadata (from
queryweaver_sdk.models) and create metadata = QueryMetadata(confidence=0.8) then
construct QueryResult(sql_query="SELECT 1", results=[], ai_response="Test",
metadata=metadata) and keep the same assertions for default optional fields on
the QueryResult instance.
Summary

This PR introduces a standalone Python SDK (`queryweaver_sdk`) that exposes QueryWeaver's Text2SQL functionality as an embeddable library. Users can now convert natural language to SQL directly in their Python applications without running a web server.

Features

New SDK Package (`queryweaver_sdk/`)

- `QueryWeaver` class - Main entry point with async methods:
  - `connect_database(db_url)` - Connect PostgreSQL/MySQL databases
  - `query(database, question)` - Convert natural language to SQL and execute
  - `get_schema(database)` - Retrieve database schema
  - `list_databases()` - List connected databases
  - `delete_database(database)` - Remove database from FalkorDB
  - `refresh_schema(database)` - Re-sync schema after changes
  - `execute_confirmed(database, sql)` - Execute confirmed destructive operations

Result models (`models.py`):

- `QueryResult` - SQL query, results, AI response, confirmation flags
- `SchemaResult` - Tables (nodes) and relationships (links)
- `DatabaseConnection` - Connection status and metadata
- `RefreshResult` - Schema refresh status

Connection management (`connection.py`):

- `FalkorDBConnection` - Explicit FalkorDB connection handling

Usage Example
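The PR's original example block did not survive extraction; the sketch below reconstructs the documented flow against a local stub. Only the method names (`connect_database`, `query`) come from the feature list — the return shapes, URLs, and credentials here are illustrative assumptions:

```python
# Stub reconstruction of the documented SDK flow. StubQueryWeaver only mimics
# the advertised method names; its return shapes and every URL/credential in
# this example are assumptions, not the real queryweaver_sdk API.
import asyncio

class StubQueryWeaver:
    def __init__(self, falkordb_url: str, user_id: str):
        self.falkordb_url = falkordb_url
        self.user_id = user_id
        self.databases: dict[str, str] = {}

    async def connect_database(self, db_url: str) -> str:
        name = db_url.rsplit("/", 1)[-1]  # use the database name as the handle
        self.databases[name] = db_url
        return name

    async def query(self, database: str, question: str) -> dict:
        # The real SDK returns a QueryResult; a plain dict keeps the stub small.
        return {"database": database, "question": question}

async def main() -> dict:
    qw = StubQueryWeaver(falkordb_url="falkor://localhost:6379", user_id="demo")
    db = await qw.connect_database("postgresql://user:pw@localhost/shop")
    return await qw.query(db, "How many customers are in New York?")

answer = asyncio.run(main())
```

With the real SDK installed, the same two awaits against `queryweaver_sdk.QueryWeaver` would load the schema into FalkorDB and return a structured `QueryResult`.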
Modern Python Packaging

- `pyproject.toml` with PEP 517/518 compliance (hatchling backend)
- `pip install queryweaver` - SDK only (minimal deps)
- `pip install queryweaver[server]` - Full FastAPI server
- `pip install queryweaver[dev]` - Development tools
- `pip install queryweaver[all]` - Everything

Testing Infrastructure

- `docker-compose.test.yml` - FalkorDB + PostgreSQL + MySQL test services
- Integration test suite (`tests/test_sdk/`) - 15 passing tests

Updated Makefile
Architecture

The SDK uses lazy imports for `api.*` modules to allow:

- `from queryweaver_sdk import QueryWeaver` without FalkorDB running
- `QueryWeaver()` instantiation

Core functions in `api/core/text2sql.py` and `api/core/schema_loader.py` now have `_sync` variants that return structured dataclasses instead of streaming generators.
Breaking Changes
None - existing server functionality unchanged.
Testing
Files Changed
New Files
queryweaver_sdk/__init__.py- Package exportsqueryweaver_sdk/client.py- Main QueryWeaver classqueryweaver_sdk/models.py- Result dataclassesqueryweaver_sdk/connection.py- FalkorDB connection wrapperpyproject.toml- Modern Python packagingdocker-compose.test.yml- Test infrastructuretests/test_sdk/- Integration test suiteModified Files
api/core/text2sql.py- Added_syncfunctionsapi/core/schema_loader.py- Addedload_database_sync()Makefile- Added uv support and SDK targets.github/workflows/tests.yml- Added SDK test jobSummary by CodeRabbit