-
Notifications
You must be signed in to change notification settings - Fork 1.7k
SDK: use SQLite in-memory by default #4207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughRehomed local organization setup to forge initializer, added optional DB injection and module-loading to Forge app creation, introduced in-memory SQLite path for embedded server with conditional auth, adjusted API app factory, changed workflow serialization to JSON mode, and refactored a DB query to use a distinct subquery count. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
a5d3488 to
0de918d
Compare
0de918d to
67afa66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed everything up to 0de918d in 2 minutes and 41 seconds. Click for details.
- Reviewed
385lines of code in9files - Skipped
0files when reviewing. - Skipped posting
5draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. skyvern/library/embedded_server_factory.py:50
- Draft comment:
Consider avoiding a direct override of the global settings.DATABASE_STRING when enabling in-memory DB. It might be safer to pass this configuration explicitly to AgentDB to avoid side effects in other modules. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment is suggesting a refactor for better code quality - avoiding direct modification of global settings. However, this is somewhat speculative ("might be safer") and the code already follows a pattern of directly modifying settings throughout this function (BROWSER_LOGS_ENABLED, LLM_KEY, and the settings_overrides loop). The comment doesn't point to a definite bug or issue - it's more of a "consider this alternative approach" suggestion. While it could be valid architectural advice, it's not pointing to something that's clearly wrong. The code is consistent with the existing patterns in the same function. Without seeing evidence that this actually causes problems in other modules, this seems like speculative advice rather than a clear issue. The comment might be highlighting a legitimate architectural concern - modifying global state can indeed cause side effects. The fact that other parts of the function also modify settings doesn't necessarily make it good practice. Perhaps this specific DATABASE_STRING modification is more problematic than the others because database connections might be shared across modules. While the architectural concern could be valid, the comment uses tentative language ("might be safer", "Consider") which indicates it's not pointing to a definite issue. The code is internally consistent with the function's existing pattern. Without concrete evidence of actual side effects or problems, this is speculative advice about code quality rather than a clear bug that needs fixing. This comment should be deleted. It's speculative advice ("might be safer", "Consider") about refactoring rather than pointing to a clear bug. The code follows the same pattern already established in the function for modifying settings. Without strong evidence of actual problems, this is not actionable feedback that clearly requires a code change.
2. skyvern/library/embedded_server_factory.py:57
- Draft comment:
Ensure schema creation via Base.metadata.create_all (lines 57-58) meets all test needs; consider adding error handling for potential failures during in-memory DB schema setup. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment has two parts: 1) "Ensure schema creation meets all test needs" - this is asking the author to verify/confirm something, which violates the rules. It's not pointing to a specific deficiency. 2) "consider adding error handling" - the code already usesasync with engine.begin()which provides transaction management and will propagate exceptions. The comment doesn't identify a specific error case that isn't already handled. This seems speculative rather than identifying a definite issue. The comment doesn't provide strong evidence of an actual problem. Could there be specific error cases during schema creation that the context manager doesn't handle properly? Maybe the tool has knowledge of common SQLAlchemy issues that I'm not considering. The in-memory database setup might have edge cases I'm not aware of. While there could theoretically be edge cases, the comment doesn't identify any specific ones. Theasync with engine.begin()pattern is standard SQLAlchemy usage and will propagate exceptions appropriately. The comment is asking the author to "ensure" and "consider" rather than pointing to a concrete issue, which violates the rules about not asking for verification. This comment should be deleted. It asks the author to verify/ensure something works and suggests adding error handling without identifying a specific unhandled error case. The existing code uses standard SQLAlchemy patterns with appropriate context managers.
3. skyvern/library/skyvern.py:204
- Draft comment:
The check for the existence of a .env file (when not using in-memory DB) enforces running 'skyvern quickstart'. Consider providing a clearer fallback or documentation note for users in local mode. - Reason this comment was not posted:
Comment looked like it was already resolved.
4. skyvern/library/skyvern.py:213
- Draft comment:
When not using in-memory DB, the SDK requires an API key from the .env file. It may be helpful to document explicitly that in-memory mode bypasses this requirement. - Reason this comment was not posted:
Comment did not seem useful. Confidence is useful =0%<= threshold50%This comment is purely informative, suggesting documentation improvements. It doesn't provide a specific code suggestion or ask for a test to be written. It violates the rule against making purely informative comments.
5. skyvern/library/skyvern.py:129
- Draft comment:
Consider using Optional[bool] instead of 'bool | None' in type hints for clarity and consistency with common Python typing practices. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a style preference comment about type hints. The modern Python typing convention (Python 3.10+) actually prefersX | NoneoverOptional[X]. Looking at the entire file, it consistently uses the|union syntax everywhere - in function parameters, return types, and class attributes. The comment is suggesting a change that would make the code LESS consistent with the rest of the file, not more consistent. The claim about "common Python typing practices" is also questionable since PEP 604 (Python 3.10+) introduced and recommends the|syntax as the preferred way. This comment should be deleted. Perhaps the codebase has a style guide that prefersOptionalover union syntax, or maybe there are older parts of the codebase that useOptional. However, I can only see this one file, and within this file, the union syntax is used consistently everywhere. Even if other parts of the codebase useOptional, this file is internally consistent with using|syntax throughout. The comment is about a single parameter in isolation and doesn't provide evidence that this is inconsistent with the broader codebase. More importantly, this is a minor style preference that doesn't affect functionality, and the rules explicitly state not to make comments about code quality unless they are clearly actionable and important. This comment should be deleted. It suggests a style change that would make the code inconsistent with the rest of the file, which uniformly uses the modernX | Nonesyntax. This is a minor style preference that doesn't warrant a review comment, especially since the suggested change would reduce internal consistency.
Workflow ID: wflow_wRYIVZy0HSGdr5jk
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
67afa66 to
87cceb1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
skyvern/forge/forge_app.py (1)
87-96: InjectedAgentDBsupport is good; prefer explicitis not NonecheckAllowing
create_forge_app(db: AgentDB | None = None)to reuse a preconfiguredAgentDBis exactly what the in‑memory/embedded flow needs, and the rest of the initialization still relies onapp.DATABASEso everything stays coherent.To avoid any surprises if
AgentDBever implements__bool__/__len__, consider tightening the assignment to:- app.DATABASE = db if db else AgentDB(settings.DATABASE_STRING, debug_enabled=settings.DEBUG_MODE) + app.DATABASE = db if db is not None else AgentDB( + settings.DATABASE_STRING, + debug_enabled=settings.DEBUG_MODE, + )This is purely defensive/idiomatic; behavior today is otherwise fine.
skyvern/library/skyvern.py (1)
125-133: Skyvern.local in‑memory flag works but default + error messaging could be tightenedThe new
use_in_memory_dbplumbing anddo_use_in_memory_dbresolution are logically sound:
- If
use_in_memory_dbis True (orSKYVERN_USE_IN_MEMORY_DB=true), the embedded server path can run entirely in‑process with auto schema creation and a locally generated org API key, without requiringSKYVERN_API_KEY.- Regardless of DB mode, you still require either:
- an explicit
llm_config, or- a
.envfile with LLM settings, which is reasonable.A couple of follow‑ups worth considering:
Default behavior vs. PR intent
Right now,
do_use_in_memory_dbdefaults toFalseunlessSKYVERN_USE_IN_MEMORY_DBis set. If the desired developer‑facing behavior is “SDK local mode defaults to in‑memory SQLite unless explicitly overridden”, you may want to invert this so that:
use_in_memory_dbdefault isTrue, orSKYVERN_USE_IN_MEMORY_DBdefaults to"true"in settings, with an explicit opt‑out.Otherwise, callers of
Skyvern.local()who don’t set the env var will still be on the non‑in‑memory path.Ruff hints / style
- Line 195:
# noqa: PLC0415is flagged as unused (RUF100). Since importingcreate_embedded_serverinside the method is already justified, you can likely drop thenoqaand let Ruff be happy, or enable PLC0415 explicitly if needed.- The long
ValueErrormessages around missing.env/LLM config/SKYVERN_API_KEY (lines ~205, 208–211, 214) triggerTRY003. If you want to satisfy that check, you could either:
- shorten the inline messages, or
- introduce a small custom exception class (e.g.,
LocalSetupError) encapsulating the full guidance in its docstring or__str__.These are non‑blocking, but addressing them will keep linters green and align behavior more closely with the “default in‑memory” story.
Also applies to: 190-215, 221-225
skyvern/library/embedded_server_factory.py (2)
31-31: Consider usingfind_dotenv()or a more robust path resolution.The hardcoded
".env"path may not resolve correctly if the working directory differs from the project root. Consider usingfind_dotenv()fromdotenvor making this more robust.- load_dotenv(".env") + load_dotenv(find_dotenv())And update the import:
from dotenv import find_dotenv, load_dotenv
50-64: Remove unusednoqadirectives.The static analysis tool indicates the
# noqa: PLC0415directives on lines 51 and 53 are unused. If these are not needed for other linters in your CI pipeline, they can be removed.- from sqlalchemy.ext.asyncio import create_async_engine # noqa: PLC0415 + from sqlalchemy.ext.asyncio import create_async_engine - from skyvern.forge.sdk.db.models import Base # noqa: PLC0415 + from skyvern.forge.sdk.db.models import Base
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
skyvern/cli/init_command.py(1 hunks)skyvern/cli/mcp.py(0 hunks)skyvern/forge/api_app.py(2 hunks)skyvern/forge/forge_app.py(1 hunks)skyvern/forge/forge_app_initializer.py(2 hunks)skyvern/forge/sdk/db/client.py(2 hunks)skyvern/forge/sdk/workflow/service.py(3 hunks)skyvern/library/embedded_server_factory.py(3 hunks)skyvern/library/skyvern.py(3 hunks)
💤 Files with no reviewable changes (1)
- skyvern/cli/mcp.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Useruff checkandruff formatfor linting and formatting Python code
Usemypyfor type checking Python code
Use type hints in Python code
Maintain line length of 120 characters for Python code
**/*.py: Use Python 3.11+ features and type hints in Python code
Follow PEP 8 with a line length of 100 characters in Python code
Use absolute imports for all modules in Python code
Document all public functions and classes with Google-style docstrings in Python code
Usesnake_casefor variables and functions,PascalCasefor classes in Python code
Prefer async/await over callbacks for asynchronous programming in Python
Useasynciofor concurrency in Python asynchronous code
Always handle exceptions in async code in Python
Use context managers for resource cleanup in Python
Use specific exception classes rather than generic exceptions in error handling
Include meaningful error messages in exceptions
Log errors with appropriate severity levels in Python
Never expose sensitive information in error messages
Optimize database queries in Python for performance
Use appropriate data structures in Python for performance optimization
Implement caching where beneficial in Python code
Validate all inputs in Python code
Use environment variables for configuration instead of hardcoding values
Files:
skyvern/forge/sdk/db/client.pyskyvern/forge/api_app.pyskyvern/forge/sdk/workflow/service.pyskyvern/forge/forge_app.pyskyvern/library/skyvern.pyskyvern/library/embedded_server_factory.pyskyvern/forge/forge_app_initializer.pyskyvern/cli/init_command.py
skyvern/forge/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
FastAPI-based REST API and WebSocket support should be located in
skyvern/forge/directory
Files:
skyvern/forge/sdk/db/client.pyskyvern/forge/api_app.pyskyvern/forge/sdk/workflow/service.pyskyvern/forge/forge_app.pyskyvern/forge/forge_app_initializer.py
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async/await patterns for asynchronous operations
Files:
skyvern/forge/sdk/db/client.pyskyvern/forge/api_app.pyskyvern/forge/sdk/workflow/service.pyskyvern/forge/forge_app.pyskyvern/library/skyvern.pyskyvern/library/embedded_server_factory.pyskyvern/forge/forge_app_initializer.pyskyvern/cli/init_command.py
🧠 Learnings (4)
📚 Learning: 2025-11-24T17:46:29.802Z
Learnt from: CR
Repo: Skyvern-AI/skyvern PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T17:46:29.802Z
Learning: Applies to skyvern/forge/**/*.py : FastAPI-based REST API and WebSocket support should be located in `skyvern/forge/` directory
Applied to files:
skyvern/forge/api_app.py
📚 Learning: 2025-11-24T17:46:29.802Z
Learnt from: CR
Repo: Skyvern-AI/skyvern PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T17:46:29.802Z
Learning: Configure LLM models via environment variables or `skyvern init llm`, supporting OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Gemini, and Ollama
Applied to files:
skyvern/library/skyvern.py
📚 Learning: 2025-11-24T17:46:29.802Z
Learnt from: CR
Repo: Skyvern-AI/skyvern PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T17:46:29.802Z
Learning: Run backend service using `skyvern run server` or run both backend and UI using `skyvern run all`
Applied to files:
skyvern/library/skyvern.py
📚 Learning: 2025-11-14T04:27:55.404Z
Learnt from: stanislaw89
Repo: Skyvern-AI/skyvern PR: 3970
File: skyvern/library/embedded_server_factory.py:7-17
Timestamp: 2025-11-14T04:27:55.404Z
Learning: The `create_embedded_server` function in `skyvern/library/embedded_server_factory.py` is an internal factory method not exposed to customers, so it does not require public documentation.
Applied to files:
skyvern/library/embedded_server_factory.py
🧬 Code graph analysis (6)
skyvern/forge/sdk/db/client.py (1)
skyvern/forge/sdk/db/models.py (1)
StepModel(117-145)
skyvern/forge/api_app.py (1)
skyvern/forge/forge_app.py (1)
ForgeApp(42-84)
skyvern/library/skyvern.py (1)
skyvern/library/embedded_server_factory.py (1)
create_embedded_server(19-81)
skyvern/library/embedded_server_factory.py (1)
skyvern/forge/sdk/db/client.py (1)
AgentDB(163-5556)
skyvern/forge/forge_app_initializer.py (6)
skyvern/forge/__init__.py (1)
set_force_app_instance(38-39)skyvern/forge/forge_app.py (2)
ForgeApp(42-84)create_forge_app(87-225)skyvern/forge/sdk/db/client.py (3)
AgentDB(163-5556)get_organization_by_domain(869-873)create_organization(875-895)skyvern/forge/sdk/db/enums.py (1)
OrganizationAuthTokenType(4-8)skyvern/forge/sdk/schemas/organizations.py (1)
Organization(8-21)skyvern/forge/sdk/core/security.py (1)
create_access_token(27-43)
skyvern/cli/init_command.py (2)
skyvern/forge/forge_app_initializer.py (2)
setup_local_organization(59-65)start_forge_app(16-34)skyvern/cli/mcp.py (1)
setup_mcp(263-277)
🪛 Ruff (0.14.8)
skyvern/library/skyvern.py
195-195: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
205-205: Avoid specifying long messages outside the exception class
(TRY003)
208-211: Avoid specifying long messages outside the exception class
(TRY003)
214-214: Avoid specifying long messages outside the exception class
(TRY003)
skyvern/library/embedded_server_factory.py
51-51: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
53-53: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
68-70: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Optimize new Python code in this PR
🔇 Additional comments (7)
skyvern/forge/sdk/db/client.py (2)
6-6: LGTM: Import cleanup.Correctly removed
tuple_import since it's no longer used after the refactoring of the counting logic.
427-460: LGTM: Improved SQL compatibility with subquery approach.The refactored counting logic correctly computes the total unique (task_id, order) pairs by materializing distinct pairs in a subquery before counting. This approach is functionally equivalent to the previous
count(distinct(tuple_(...)))pattern but provides broader database compatibility, particularly for SQLite in-memory databases as noted in the PR objectives.The implementation is clean and maintains all existing filters and error handling.
skyvern/forge/sdk/workflow/service.py (1)
174-227: Consistent use ofmodel_dump(mode="json")for workflow definitions looks correctSwitching
_get_workflow_definition_core_data,create_workflow, andupdate_workflow_definitionto useworkflow_definition.model_dump(mode="json")keeps the representation consistent across:
- what you persist to the DB
- what you later load and compare in
maybe_delete_cached_code.In Pydantic v2 this still returns a plain Python mapping, just with JSON‑serializable values, which should improve compatibility with SQLite and other backends.
Please double‑check that:
- the
workflow_definitioncolumn in your SQL models expects a JSON‑like structure (not a pre‑encoded JSON string), and- there’s no code that depended on Python-native types (e.g.,
datetime) inside the raw stored blob rather than going through theWorkflowDefinitionmodel.If both are true, this change is safe and a nice portability improvement.
Also applies to: 1416-1419, 1718-1731
skyvern/forge/api_app.py (1)
19-21: FastAPI wiring refactor viacreate_fast_api_appis clean and preserves behaviorSplitting app construction into:
start_forge_app()→ ForgeApp initialization (and globalforge_appwiring for lifespan hooks), andcreate_fast_api_app(forge_app_instance)→ FastAPI routes/middleware/handlers,keeps the existing
create_api_app()behavior intact while enabling the embedded server path to inject a preconfiguredForgeApp(e.g., with an in‑memory DB). The conditionalforge_app_instance.setup_api_apphook also remains in the right place.No issues from a correctness or lifecycle perspective.
Also applies to: 77-84, 86-161
skyvern/cli/init_command.py (1)
11-12: Local init correctly initializes Forge app before creating the local organization API keyThe
run_localbranch now:
- Runs migrations,
- Calls
start_forge_app()to initializeForgeAppandapp.DATABASE,- Then uses
setup_local_organization()to obtain an API key.That ordering is correct for the new helper location in
forge_app_initializerand matches what the embedded server path expects. The rest of the init flow (LLM setup, browser config, MCP setup) is unaffected.Looks good.
Also applies to: 19-20, 47-53
skyvern/library/embedded_server_factory.py (2)
65-71: LGTM!The fallback path correctly validates the API key presence and provides a clear, actionable error message. The conditional header injection on lines 75-76 is good defensive coding.
19-23: LGTM on the new parameter addition.The
use_in_memory_dbparameter with aFalsedefault maintains backward compatibility while enabling the new local/embedded mode. Type hints are complete.
| from skyvern.forge import app, set_force_app_instance | ||
| from skyvern.forge.forge_app import ForgeApp, create_forge_app | ||
| from skyvern.forge.sdk.core import security | ||
| from skyvern.forge.sdk.db.client import AgentDB | ||
| from skyvern.forge.sdk.db.enums import OrganizationAuthTokenType | ||
| from skyvern.forge.sdk.schemas.organizations import Organization | ||
| from skyvern.forge.sdk.services.local_org_auth_token_service import SKYVERN_LOCAL_DOMAIN, SKYVERN_LOCAL_ORG | ||
| from skyvern.forge.sdk.services.org_auth_token_service import API_KEY_LIFETIME | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regenerate a local org API token when none is valid; ForgeApp/db injection looks good otherwise
The changes here nicely:
- Allow
start_forge_app(db: AgentDB | None = None)to pass an injectedAgentDBintocreate_forge_app, which is exactly what the in‑memory SQLite path needs. - Load
settings.ADDITIONAL_MODULESand call each module’sconfigure_app(ForgeApp)hook, which is a reasonable extension point. - Introduce
get_or_create_local_organization()andsetup_local_organization()so both the CLI and embedded server can bootstrap a local org + API key.
One edge case is problematic for long‑lived or previously used local installs:
- If a local organization already exists but no valid API token is present (e.g., tokens expired per
API_KEY_LIFETIMEor were manually cleaned up),setup_local_organization()will:- call
get_or_create_local_organization()(which does not create a new token in this case), then - call
get_valid_org_auth_token(...)and return""when nothing is valid.
- call
In that scenario, both:
skyvern library’s embedded server (which expects a usable API key), andskyvern init(which wants to writeSKYVERN_API_KEYto.env)
will silently fail to obtain a working key and cannot self‑heal.
Consider regenerating a token when none is valid, e.g.:
async def setup_local_organization() -> str:
organization = await get_or_create_local_organization()
- org_auth_token = await app.DATABASE.get_valid_org_auth_token(
- organization_id=organization.organization_id,
- token_type=OrganizationAuthTokenType.api.value,
- )
- return org_auth_token.token if org_auth_token else ""
+ org_auth_token = await app.DATABASE.get_valid_org_auth_token(
+ organization_id=organization.organization_id,
+ token_type=OrganizationAuthTokenType.api.value,
+ )
+ if org_auth_token:
+ return org_auth_token.token
+
+ # No valid token found — create a fresh one for this local org
+ api_key = security.create_access_token(
+ organization.organization_id,
+ expires_delta=API_KEY_LIFETIME,
+ )
+ await app.DATABASE.create_org_auth_token(
+ organization_id=organization.organization_id,
+ token=api_key,
+ token_type=OrganizationAuthTokenType.api,
+ )
+ return api_keyThat keeps the initial‑creation logic in get_or_create_local_organization() but makes setup_local_organization() robust for re‑runs and expired tokens.
Aside from this edge case, the ForgeApp/db injection and module loading logic look solid.
Also applies to: 16-35, 37-57, 59-65
🤖 Prompt for AI Agents
In skyvern/forge/forge_app_initializer.py around lines 4-12,
setup_local_organization may return an empty string when the local organization
exists but no valid API token is found; update setup_local_organization so that
when get_valid_org_auth_token(...) returns no token it explicitly generates a
new local API token (using the org auth token service / create_org_auth_token
with OrganizationAuthTokenType.LOCAL and appropriate lifetime), persists it for
the org, and returns that token; ensure any errors are raised/logged and that
this regeneration path is taken only for the local org to avoid creating
duplicate tokens for other orgs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important
Looks good to me! 👍
Reviewed 87cceb1 in 2 minutes and 9 seconds. Click for details.
- Reviewed
358lines of code in9files - Skipped
0files when reviewing. - Skipped posting
4draft comments. View those below. - Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. skyvern/library/embedded_server_factory.py:50
- Draft comment:
Modifying the global settings.DATABASE_STRING for an in‐memory DB may lead to unintended side effects. Consider using a local copy of the configuration instead. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The comment is speculative ("may lead to unintended side effects") rather than pointing to a definite bug. It suggests an alternative approach but doesn't demonstrate actual harm. The code already modifies global settings in multiple places (BROWSER_LOGS_ENABLED, LLM_KEY, and via settings_overrides), so modifying DATABASE_STRING follows the same pattern. The comment violates the rule about not making speculative comments ("If X, then Y is an issue" - only comment if it's definitely an issue). Without evidence of actual side effects or a clear bug, this is more of a code quality suggestion that may not be actionable without broader refactoring of the entire settings approach. However, the comment could be valid if modifying DATABASE_STRING has specific side effects that I can't see without more context. Perhaps other parts of the codebase read this setting and would be affected. The suggestion to use a local copy might be genuinely important for preventing issues. While there could be side effects I can't see, the comment is still speculative ("may lead to") and doesn't provide evidence of actual problems. The entire function already modifies global settings extensively, and if that's a problem, it would require a broader refactoring beyond just DATABASE_STRING. The comment doesn't meet the bar of "STRONG EVIDENCE that the comment is correct." This comment should be deleted. It's speculative rather than pointing to a definite issue, and it's inconsistent with the existing pattern in the code where multiple global settings are modified. Without concrete evidence of side effects, this is just a stylistic suggestion that isn't actionable.
2. skyvern/library/embedded_server_factory.py:31
- Draft comment:
load_dotenv('.env') is called during the first request initialization. Consider moving this to an initialization step to avoid re-loading the .env file on subsequent requests. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment appears to be incorrect. Theload_dotenv(".env")call is inside anif self._transport is None:block at line 30. This is a lazy initialization pattern - the entire block only executes once on the first request. After the first request,self._transportis set (line 73), so subsequent requests will skip the entire initialization block including theload_dotenvcall. The comment's concern about "re-loading the.envfile on subsequent requests" is unfounded because the code already prevents this. The comment is based on a misunderstanding of the code structure. Could there be a scenario where_transportgets reset to None, causing re-initialization? Or could there be multiple instances ofEmbeddedServerTransportcreated, each callingload_dotenvonce? While multiple instances could theoretically be created, that would be by design (callingcreate_embedded_servermultiple times), and each would need its own initialization. The comment specifically mentions "subsequent requests" which implies requests on the same instance, and the code clearly prevents re-initialization on subsequent requests to the same instance via theif self._transport is None:guard. The comment is incorrect. Theload_dotenvcall is already protected by theif self._transport is None:guard, which ensures it only runs once on the first request. Subsequent requests skip this entire initialization block, so the.envfile is not re-loaded on subsequent requests.
3. skyvern/library/skyvern.py:199
- Draft comment:
Multiple checks for the existence of the '.env' file are performed. Consider caching the result of os.path.exists('.env') in a variable for improved clarity and efficiency. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 15% vs. threshold = 50% This is a code quality suggestion about cachingos.path.exists('.env'). While technically correct that the check is performed twice, I need to consider: 1) Is this a significant performance issue? File existence checks are very fast operations. 2) Would caching actually improve clarity? The current code is straightforward and readable. 3) Is this an important enough issue to warrant a comment? The calls are only 3 lines apart, so the performance impact is negligible. 4) The comment says "multiple checks" but there are only 2 checks in close proximity. This seems like a minor optimization that doesn't significantly impact code quality or performance. While the suggestion is technically valid,os.path.exists()is an extremely fast operation, and calling it twice in succession has negligible performance impact. The clarity argument is also weak - introducing a variable likeenv_existsmight not actually improve readability since each check has different logical context (one is about in-memory DB, one is about llm_config). This could be considered a premature optimization. The comment falls into the category of minor code quality suggestions that are not clearly actionable or important. The performance gain would be immeasurable, and the clarity improvement is debatable. This type of micro-optimization comment is not valuable enough to keep, especially since it's not addressing a clear problem with the changed code. This comment should be deleted. It suggests a micro-optimization that provides negligible performance benefit and questionable clarity improvement. The suggestion is too minor to be actionable and doesn't meet the threshold for a valuable code review comment.
4. skyvern/library/skyvern.py:287
- Draft comment:
The use of 'asyncio.timeout' requires Python 3.11 or later. Ensure this requirement is documented or provide a fallback for earlier Python versions if needed. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_Wl58bods2Ufm1zgg
You can customize by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
skyvern/forge/forge_app_initializer.py (1)
59-65: Regenerate a local org API token when none is valid.When a local organization exists but no valid API token is present (e.g., tokens expired or manually cleaned),
setup_local_organization()returns"", causing both the embedded server and CLI initialization to fail silently. The function should regenerate a token in this scenario to self-heal for long-lived or previously used local installations.Consider applying the fix suggested in the previous review to regenerate tokens when none are valid.
🧹 Nitpick comments (2)
skyvern/library/skyvern.py (1)
195-195: Remove unusednoqadirective.The import statement has an unused
noqa: PLC0415directive that should be removed.Apply this diff:
- from skyvern.library.embedded_server_factory import create_embedded_server # noqa: PLC0415 + from skyvern.library.embedded_server_factory import create_embedded_serverskyvern/library/embedded_server_factory.py (1)
51-53: Remove unusednoqadirectives.The import statements have unused
noqa: PLC0415directives that should be removed.Apply this diff:
- from sqlalchemy.ext.asyncio import create_async_engine # noqa: PLC0415 + from sqlalchemy.ext.asyncio import create_async_engine - from skyvern.forge.sdk.db.models import Base # noqa: PLC0415 + from skyvern.forge.sdk.db.models import Base
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
skyvern/cli/init_command.py(1 hunks)skyvern/cli/mcp.py(0 hunks)skyvern/forge/api_app.py(2 hunks)skyvern/forge/forge_app.py(1 hunks)skyvern/forge/forge_app_initializer.py(2 hunks)skyvern/forge/sdk/db/client.py(2 hunks)skyvern/forge/sdk/workflow/service.py(3 hunks)skyvern/library/embedded_server_factory.py(3 hunks)skyvern/library/skyvern.py(3 hunks)
💤 Files with no reviewable changes (1)
- skyvern/cli/mcp.py
🚧 Files skipped from review as they are similar to previous changes (2)
- skyvern/forge/sdk/workflow/service.py
- skyvern/forge/forge_app.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Useruff checkandruff formatfor linting and formatting Python code
Usemypyfor type checking Python code
Use type hints in Python code
Maintain line length of 120 characters for Python code
**/*.py: Use Python 3.11+ features and type hints in Python code
Follow PEP 8 with a line length of 100 characters in Python code
Use absolute imports for all modules in Python code
Document all public functions and classes with Google-style docstrings in Python code
Usesnake_casefor variables and functions,PascalCasefor classes in Python code
Prefer async/await over callbacks for asynchronous programming in Python
Useasynciofor concurrency in Python asynchronous code
Always handle exceptions in async code in Python
Use context managers for resource cleanup in Python
Use specific exception classes rather than generic exceptions in error handling
Include meaningful error messages in exceptions
Log errors with appropriate severity levels in Python
Never expose sensitive information in error messages
Optimize database queries in Python for performance
Use appropriate data structures in Python for performance optimization
Implement caching where beneficial in Python code
Validate all inputs in Python code
Use environment variables for configuration instead of hardcoding values
Files:
skyvern/library/embedded_server_factory.pyskyvern/forge/api_app.pyskyvern/library/skyvern.pyskyvern/forge/sdk/db/client.pyskyvern/cli/init_command.pyskyvern/forge/forge_app_initializer.py
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use async/await patterns for asynchronous operations
Files:
skyvern/library/embedded_server_factory.pyskyvern/forge/api_app.pyskyvern/library/skyvern.pyskyvern/forge/sdk/db/client.pyskyvern/cli/init_command.pyskyvern/forge/forge_app_initializer.py
skyvern/forge/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
FastAPI-based REST API and WebSocket support should be located in
skyvern/forge/directory
Files:
skyvern/forge/api_app.pyskyvern/forge/sdk/db/client.pyskyvern/forge/forge_app_initializer.py
🧠 Learnings (4)
📚 Learning: 2025-11-14T04:27:55.404Z
Learnt from: stanislaw89
Repo: Skyvern-AI/skyvern PR: 3970
File: skyvern/library/embedded_server_factory.py:7-17
Timestamp: 2025-11-14T04:27:55.404Z
Learning: The `create_embedded_server` function in `skyvern/library/embedded_server_factory.py` is an internal factory method not exposed to customers, so it does not require public documentation.
Applied to files:
skyvern/library/embedded_server_factory.py
📚 Learning: 2025-11-24T17:46:29.802Z
Learnt from: CR
Repo: Skyvern-AI/skyvern PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T17:46:29.802Z
Learning: Applies to skyvern/forge/**/*.py : FastAPI-based REST API and WebSocket support should be located in `skyvern/forge/` directory
Applied to files:
skyvern/forge/api_app.py
📚 Learning: 2025-11-24T17:46:29.802Z
Learnt from: CR
Repo: Skyvern-AI/skyvern PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T17:46:29.802Z
Learning: Configure LLM models via environment variables or `skyvern init llm`, supporting OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Gemini, and Ollama
Applied to files:
skyvern/library/skyvern.py
📚 Learning: 2025-11-24T17:46:29.802Z
Learnt from: CR
Repo: Skyvern-AI/skyvern PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T17:46:29.802Z
Learning: Run backend service using `skyvern run server` or run both backend and UI using `skyvern run all`
Applied to files:
skyvern/library/skyvern.py
🧬 Code graph analysis (5)
skyvern/library/embedded_server_factory.py (3)
skyvern/forge/api_app.py (2)
create_api_app(77-83)create_fast_api_app(86-161)skyvern/forge/forge_app_initializer.py (2)
setup_local_organization(59-65)start_forge_app(16-34)skyvern/forge/sdk/db/client.py (1)
AgentDB(163-5556)
skyvern/forge/api_app.py (1)
skyvern/forge/forge_app.py (1)
ForgeApp(42-84)
skyvern/library/skyvern.py (1)
skyvern/library/embedded_server_factory.py (1)
create_embedded_server(19-81)
skyvern/forge/sdk/db/client.py (1)
skyvern/forge/sdk/db/models.py (1)
StepModel(117-145)
skyvern/forge/forge_app_initializer.py (5)
skyvern/forge/forge_app.py (2)
ForgeApp(42-84)create_forge_app(87-225)skyvern/forge/sdk/db/client.py (7)
AgentDB(163-5556)get_organization_by_domain(869-873)create_organization(875-895)create_org_auth_token(1025-1059)get_valid_org_auth_token(924-928)get_valid_org_auth_token(931-935)get_valid_org_auth_token(937-963)skyvern/forge/sdk/db/enums.py (1)
OrganizationAuthTokenType(4-8)skyvern/forge/sdk/schemas/organizations.py (1)
Organization(8-21)skyvern/forge/sdk/core/security.py (1)
create_access_token(27-43)
🪛 Ruff (0.14.8)
skyvern/library/embedded_server_factory.py
51-51: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
53-53: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
68-70: Avoid specifying long messages outside the exception class
(TRY003)
skyvern/library/skyvern.py
195-195: Unused noqa directive (non-enabled: PLC0415)
Remove unused noqa directive
(RUF100)
205-205: Avoid specifying long messages outside the exception class
(TRY003)
208-211: Avoid specifying long messages outside the exception class
(TRY003)
214-214: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Optimize new Python code in this PR
🔇 Additional comments (6)
skyvern/forge/sdk/db/client.py (2)
6-6: LGTM - Import cleanup.Correctly removes the unused
tuple_import after the query refactor.
440-454: LGTM - Good refactor for cross-database compatibility.The subquery approach correctly replaces the PostgreSQL-specific
count(distinct(tuple_(...)))pattern. The new implementation:
- Creates a subquery with distinct
(task_id, order)pairs- Counts rows from that subquery
This is semantically equivalent and works with both PostgreSQL and SQLite, enabling the SDK's in-memory database mode.
skyvern/forge/api_app.py (1)
19-19: LGTM! Clean refactoring enables ForgeApp dependency injection.The extraction of
create_fast_api_app(forge_app_instance: ForgeApp)allows the embedded server to inject a custom ForgeApp instance (e.g., with an in-memory database), while preserving the existingcreate_api_app()behavior. This is a solid pattern for supporting multiple initialization paths.Also applies to: 82-83, 86-161
skyvern/cli/init_command.py (1)
11-11: LGTM! Import relocation aligns with refactoring.The import updates correctly reflect the relocation of
setup_local_organizationtoforge_app_initializer.py, consolidating local organization provisioning logic in a central module.Also applies to: 19-19
skyvern/library/embedded_server_factory.py (1)
50-71: LGTM! In-memory DB initialization path is well-structured.The conditional logic cleanly separates the in-memory DB workflow (lines 50-64) from the standard mode (lines 65-71). The in-memory path properly:
- Creates the SQLite in-memory engine
- Runs schema creation synchronously via
Base.metadata.create_all- Initializes AgentDB with the engine
- Sets up local organization and obtains API key
The standard mode correctly validates
SKYVERN_API_KEYpresence with a clear error message.skyvern/forge/forge_app_initializer.py (1)
16-34: LGTM! ForgeApp initialization supports DB injection and module loading.The updated
start_forge_app(db: AgentDB | None = None)signature enables:
- Passing a pre-configured
AgentDBinstance (e.g., in-memory SQLite) tocreate_forge_app- Loading and configuring additional modules via
settings.ADDITIONAL_MODULESThis provides the flexibility needed for embedded server scenarios while maintaining backward compatibility.
| if not llm_config and not os.path.exists(".env"): | ||
| raise ValueError( | ||
| "LLM configuration is required. Either provide llm_config parameter or " | ||
| "run `skyvern quickstart` to configure LLM settings in .env file." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gate LLM config validation by in-memory flag for consistency.
The LLM configuration validation should be conditional on do_use_in_memory_db, similar to the .env file check on line 204 and the API key check on line 213. Currently, when using in-memory mode without providing llm_config, this check will incorrectly require a .env file even though in-memory mode should work without one.
Apply this diff to align the validation logic:
- if not llm_config and not os.path.exists(".env"):
+ if not do_use_in_memory_db and not llm_config and not os.path.exists(".env"):
raise ValueError(
"LLM configuration is required. Either provide llm_config parameter or "
"run `skyvern quickstart` to configure LLM settings in .env file."
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if not llm_config and not os.path.exists(".env"): | |
| raise ValueError( | |
| "LLM configuration is required. Either provide llm_config parameter or " | |
| "run `skyvern quickstart` to configure LLM settings in .env file." | |
| ) | |
| if not do_use_in_memory_db and not llm_config and not os.path.exists(".env"): | |
| raise ValueError( | |
| "LLM configuration is required. Either provide llm_config parameter or " | |
| "run `skyvern quickstart` to configure LLM settings in .env file." | |
| ) |
🧰 Tools
🪛 Ruff (0.14.8)
208-211: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In skyvern/library/skyvern.py around lines 207 to 211, the LLM config missing
check is too strict for in-memory mode; change the condition so the ValueError
is raised only when not using the in-memory DB. Specifically, require llm_config
or a .env file only if do_use_in_memory_db is False (i.e., make the if
conditional include "and not do_use_in_memory_db"), mirroring the surrounding
.env and API key checks, so in-memory operation does not demand .env or
llm_config.
🚀 This PR refactors the Skyvern SDK to use SQLite in-memory databases by default and introduces a cleaner API with the
Skyvern.local()factory method, making local development and testing significantly easier by eliminating the need for external database setup.🔍 Detailed Analysis
Key Changes
Skyvern.local()factory method for local/embedded mode and madeapi_keyrequired for remote modedistinct(tuple_())with subquery approach for better database supportapi_keyparameters orSkyvern.local()Skyvern.local()for embedded scenariosTechnical Implementation
flowchart TD A[Skyvern Constructor] --> B{Mode Selection} B -->|api_key provided| C[Remote Mode] B -->|Skyvern.local()| D[Local Mode] C --> E[Connect to Cloud/Self-hosted] D --> F{use_in_memory_db?} F -->|true| G[Create SQLite In-Memory DB] F -->|false| H[Use .env Database Config] G --> I[Auto-create Schema] I --> J[Setup Local Organization] J --> K[Generate API Key] H --> L[Load from .env] K --> M[Embedded Server Ready] L --> M E --> N[Remote Client Ready]Impact
Skyvern.local()) and remote (Skyvern(api_key=...)) usage patternsCreated with Palmier
Important
Refactors Skyvern SDK to use SQLite in-memory by default, introduces
Skyvern.local(), and updates integrations and documentation.Skyvern.local()for local mode, requiringapi_keyfor remote mode.create_embedded_server()inembedded_server_factory.pyto support in-memory databases.distinct(tuple_())with subquery approach inclient.pyfor better SQL compatibility.Skyvern.local()and new API patterns.Skyvern.local().use_in_memory_dbparameter toSkyvern.local()inskyvern.py.setup_local_organization()toforge_app_initializer.py.This description was created by
for 87cceb1. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Refactor
Performance
✏️ Tip: You can customize this high-level summary in your review settings.