Skip to content

Conversation

gswangg
Copy link

@gswangg gswangg commented Oct 5, 2025

Adds a dev-only SelfHealApiKeyBanner that monitors the new /internal/auth/status endpoint and, when the key is missing/invalid/expired/misconfigured, surfaces the error + a single-click repair button that mints a new local key, rewrites both .env files, and hides itself on success.


🔧 This PR introduces a self-healing API key system for Skyvern's local development environment that automatically detects and repairs authentication issues. The implementation adds a diagnostic banner in the frontend that can detect various API key problems (missing, expired, invalid format, etc.) and provides a one-click repair mechanism that regenerates the key and updates environment files automatically.

🔍 Detailed Analysis

Key Changes

  • Frontend Banner Component: Added SelfHealApiKeyBanner that displays authentication status and provides repair functionality in development mode only
  • Authentication Diagnostics: New useAuthDiagnostics hook and /internal/auth/status endpoint to check API key validity with detailed error categorization
  • Self-Healing Mechanism: /internal/auth/repair endpoint that regenerates API keys and automatically updates both root and frontend .env files
  • Security Enhancements: Local-only access restrictions using IP validation and environment checks
  • Service Layer Refactoring: Extracted token validation logic into reusable resolve_org_from_api_key function

Technical Implementation

sequenceDiagram
    participant UI as Frontend UI
    participant Hook as useAuthDiagnostics
    participant API as Internal Auth API
    participant Service as Local Org Service
    participant DB as Database
    participant FS as File System

    UI->>Hook: Check auth status (dev mode only)
    Hook->>API: GET /internal/auth/status
    API->>API: Validate localhost access
    API->>Service: resolve_org_from_api_key()
    Service->>DB: Validate token & organization
    DB-->>Service: Validation result
    Service-->>API: AuthStatus (ok/missing_env/invalid/expired/etc.)
    API-->>Hook: Diagnostics response
    Hook-->>UI: Display banner if issues found
    
    Note over UI: User clicks "Regenerate API key"
    
    UI->>API: POST /internal/auth/repair
    API->>Service: regenerate_local_api_key()
    Service->>DB: Invalidate old tokens
    Service->>DB: Create new API key
    Service->>FS: Update .env files
    Service-->>API: New key + fingerprint
    API-->>UI: Success response
    UI->>Hook: Refetch diagnostics
Loading

Impact

  • Developer Experience: Eliminates manual API key management during local development by providing automatic detection and repair of authentication issues
  • Error Prevention: Comprehensive diagnostics help developers quickly identify and resolve various API key problems (missing, expired, invalid format, organization not found)
  • Security: Restricts internal endpoints to localhost access only and limits functionality to local development environment
  • Maintainability: Centralizes API key validation logic and provides consistent error handling across the application

Created with Palmier


Important

Add self-healing mechanism for Skyvern API key with frontend diagnostics and backend regeneration support for local development.

  • Frontend:
    • Add SelfHealApiKeyBanner component in App.tsx to display API key status and regeneration option.
    • Implement SelfHealApiKeyBanner in SelfHealApiKeyBanner.tsx to handle various API key issues and provide a regeneration button.
    • Use useAuthDiagnostics hook in useAuthDiagnostics.ts to fetch API key status.
  • Backend:
    • Add internal_auth.py to handle API key diagnostics and regeneration endpoints.
    • Implement regenerate_local_api_key() in local_org_auth_token_service.py to create and store a new API key.
    • Modify api_app.py to include internal_auth router for local development.
  • Services:
    • Add resolve_org_from_api_key() in org_auth_service.py to validate API keys.
  • Tests:
    • Add tests in test_internal_auth.py to verify local request handling.

This description was created by Ellipsis for 8a8240e. You can customize this summary. It will automatically update as commits are pushed.

Copy link
Contributor

coderabbitai bot commented Oct 5, 2025

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.52% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly describes the primary feature added by this PR, namely self-healing of the Skyvern API key, and follows a concise conventional commit style without unnecessary detail.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Changes requested ❌

Reviewed everything up to 8a8240e in 2 minutes and 25 seconds. Click for details.
  • Reviewed 619 lines of code in 9 files
  • Skipped 0 files when reviewing.
  • Skipped posting 9 draft comments. View those below.
  • Modify your settings and rules to customize what types of comments Ellipsis leaves. And don't forget to react with 👍 or 👎 to teach Ellipsis.
1. skyvern-frontend/src/App.tsx:29
  • Draft comment:
    Avoid hardcoding sensitive keys such as the PostHog API key; use environment variables for configuration.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. skyvern-frontend/src/components/SelfHealApiKeyBanner.tsx:75
  • Draft comment:
    The conditional block returning null when no status, statusMessage, or errorMessage is present may be overly redundant. Consider refining the loading state handling (e.g., showing a spinner) or consolidating the conditions for clarity.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% The code could be simplified by combining the conditions, but the current structure is clear and follows a logical flow: first check if there's nothing to show, then check if we're still loading. While a loading spinner could be added, that's more of a feature request than a clear code issue. The current behavior of showing nothing while loading seems intentional. The comment does identify a real opportunity for code simplification. The suggestion about loading spinners could be valuable for UX. However, the current code is not problematic - it's readable and works as intended. The loading spinner suggestion is more of a feature request than a code quality issue. The comment should be deleted as it doesn't identify a clear issue that needs fixing - the code works fine as is and the suggestions are more about optional enhancements.
3. skyvern/forge/api_app.py:74
  • Draft comment:
    Internal dev endpoints are conditionally included for local environments. Verify that these routes remain inaccessible outside local mode to avoid exposure of internal repair and status diagnostics.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% The comment is asking the PR author to verify that certain routes remain inaccessible outside of local mode. This falls under the category of asking the author to ensure behavior is intended, which is against the rules. Therefore, this comment should be removed.
4. skyvern/forge/sdk/routes/internal_auth.py:44
  • Draft comment:
    Internal authentication routes correctly enforce local access via _require_local_access. Ensure the IP address validation logic (_is_local_request) continues to cover all expected scenarios (loopback and private ranges).
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is asking the PR author to ensure that the IP address validation logic continues to cover all expected scenarios. This falls under the category of asking the author to double-check or ensure behavior, which is against the rules.
5. skyvern/forge/sdk/services/local_org_auth_token_service.py:21
  • Draft comment:
    Consider adding error handling around file operations in _write_env to gracefully handle potential I/O errors (e.g., permission issues or read-only file systems).
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50% The suggestion is valid from a defensive programming perspective - file operations can fail. However, this is a local development utility (note the "local" in filename and SKYVERN_LOCAL_ORG constant). For local dev tools, extensive error handling may be overkill since errors will be immediately obvious to the developer. The current logging also helps with debugging. I might be underestimating the impact of file operation failures - even in local dev, graceful error handling could save developer time debugging issues. While true, the simplicity of the current implementation makes issues very transparent, and adding error handling would add complexity for limited benefit in a dev-only context. The comment should be removed as it suggests adding complexity that isn't clearly justified for a local development utility.
6. skyvern/library/skyvern.py:61
  • Draft comment:
    Calling migrate_db() during init may lead to unintended side effects if the constructor is invoked multiple times. Consider separating migration logic from initialization.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. skyvern/library/skyvern.py:191
  • Draft comment:
    Using asyncio.create_task for background tasks (e.g., in create_task_v1) may fail silently if errors occur. Consider attaching error handling or logging to these background tasks.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. tests/unit_tests/test_internal_auth.py:1
  • Draft comment:
    Test cases for _is_local_request cover various scenarios (public IP, loopback, private IP, and missing client). The tests look clear and complete.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50% This comment is purely informative, as it only states that the test cases cover various scenarios and that they look clear and complete. It does not provide any actionable feedback or suggestions for improvement.
9. skyvern-frontend/src/App.tsx:34
  • Draft comment:
    Typo suggestion: Consider using 'SelfHealingApiKeyBanner' instead of 'SelfHealApiKeyBanner' for consistency with the PR title 'self healing skyvern api key'.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% This is a purely stylistic naming suggestion. The current name is clear and understandable. The suggestion is based only on matching PR title wording, which is not a strong technical reason. Component naming is a minor issue that doesn't affect functionality. This feels like unnecessary nitpicking. Maybe consistent naming across PR title and code is important for maintainability? The current name could be seen as slightly less descriptive. PR titles are temporary and often use different wording than code. The current component name is clear and follows standard React naming conventions. This comment should be deleted as it's a minor stylistic suggestion based only on PR title matching, not on any technical merit or code quality concerns.

Workflow ID: wflow_zfIvtrAm8m26zYzq

You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.

)


@cached(cache=TTLCache(maxsize=CACHE_SIZE, ttl=AUTHENTICATION_TTL))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching API key validation for one hour (TTL) may allow a revoked key to remain active longer than desired. Consider reducing the TTL or implementing a way to invalidate the cache on key changes.

Suggested change
@cached(cache=TTLCache(maxsize=CACHE_SIZE, ttl=AUTHENTICATION_TTL))
@cached(cache=TTLCache(maxsize=CACHE_SIZE, ttl=300))

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (6)
skyvern-frontend/src/components/SelfHealApiKeyBanner.tsx (3)

13-53: Consider extracting getCopy to a separate constants file.

While the implementation is correct, this helper function could be extracted to a constants or config file for better maintainability and testability.

Apply this refactor:

Create a new file skyvern-frontend/src/constants/authDiagnostics.ts:

export type BannerStatus = Exclude<AuthStatusValue, "ok"> | "error";

export function getCopy(status: BannerStatus): { title: string; description: string } {
  // ... existing implementation
}

Then import in the banner component:

+import { BannerStatus, getCopy } from "@/constants/authDiagnostics";
-type BannerStatus = Exclude<AuthStatusValue, "ok"> | "error";
-
-function getCopy(status: BannerStatus): { title: string; description: string } {
-  // ... remove implementation
-}

66-80: Simplify the early return logic.

Lines 75-79 can be simplified—both branches return null, so they can be collapsed.

Apply this diff:

-  if (!bannerStatus && !statusMessage && !errorMessage) {
-    if (isLoading) {
-      return null;
-    }
-    return null;
-  }
+  if (!bannerStatus && !statusMessage && !errorMessage) {
+    return null;
+  }

85-110: Differentiate error handling and add retry logic in handleRepair.

  • Global retries are disabled (retry: false in QueryClient.ts), so handleRepair never retries on network failures.
  • Introduce retry policies (via react-query’s retry option or axios interceptors) and display distinct messages for timeouts vs. server errors.
tests/unit_tests/test_internal_auth.py (2)

6-15: Consider adding test_query_string and server fields to the scope.

While the minimal scope works for _is_local_request, adding query_string (empty bytes) and server fields would make the mock more realistic and prevent potential issues if the tested function is extended later.

Apply this diff:

 def _make_request(host: str | None) -> Request:
     scope = {
         "type": "http",
         "client": (host, 12345) if host else None,
         "headers": [],
         "method": "GET",
         "path": "/",
         "scheme": "http",
+        "query_string": b"",
+        "server": ("localhost", 8000),
     }
     return Request(scope)

18-35: Consider adding IPv6 test cases.

The current tests cover IPv4 addresses. Consider adding test cases for IPv6 loopback (::1) and private addresses to ensure complete coverage.

Add these test cases:

def test_is_local_request_accepts_ipv6_loopback() -> None:
    request = _make_request("::1")
    assert _is_local_request(request) is True


def test_is_local_request_accepts_ipv6_private() -> None:
    request = _make_request("fd00::1")  # IPv6 private address
    assert _is_local_request(request) is True
skyvern-frontend/src/hooks/useAuthDiagnostics.ts (1)

21-27: Consider adding error handling to fetchDiagnostics.

The fetchDiagnostics function doesn't explicitly handle errors. While React Query will catch thrown errors, consider adding explicit error handling or logging for better debugging.

Apply this diff:

 async function fetchDiagnostics(): Promise<AuthDiagnosticsResponse> {
-  const client = await getClient(null);
-  const response = await client.get<AuthDiagnosticsResponse>(
-    "/internal/auth/status",
-  );
-  return response.data;
+  try {
+    const client = await getClient(null);
+    const response = await client.get<AuthDiagnosticsResponse>(
+      "/internal/auth/status",
+    );
+    return response.data;
+  } catch (error) {
+    console.error("Failed to fetch auth diagnostics:", error);
+    throw error;
+  }
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e729681 and 8a8240e.

📒 Files selected for processing (9)
  • skyvern-frontend/src/App.tsx (2 hunks)
  • skyvern-frontend/src/components/SelfHealApiKeyBanner.tsx (1 hunks)
  • skyvern-frontend/src/hooks/useAuthDiagnostics.ts (1 hunks)
  • skyvern/forge/api_app.py (2 hunks)
  • skyvern/forge/sdk/routes/internal_auth.py (1 hunks)
  • skyvern/forge/sdk/services/local_org_auth_token_service.py (1 hunks)
  • skyvern/forge/sdk/services/org_auth_service.py (5 hunks)
  • skyvern/library/skyvern.py (2 hunks)
  • tests/unit_tests/test_internal_auth.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
skyvern-frontend/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

skyvern-frontend/**/*.{ts,tsx}: TypeScript in skyvern-frontend must be linted with ESLint and formatted with Prettier
Enforce a maximum line length of 120 characters in frontend TypeScript files

Files:

  • skyvern-frontend/src/components/SelfHealApiKeyBanner.tsx
  • skyvern-frontend/src/App.tsx
  • skyvern-frontend/src/hooks/useAuthDiagnostics.ts
{skyvern,integrations,alembic,scripts}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

{skyvern,integrations,alembic,scripts}/**/*.py: Use Python 3.11+ features and add type hints throughout the codebase
Follow PEP 8 with a maximum line length of 100 characters
Use absolute imports for all Python modules
Document all public functions and classes with Google-style docstrings
Use snake_case for variables and functions, and PascalCase for classes
Prefer async/await over callbacks in asynchronous code
Use asyncio for concurrency
Always handle exceptions in async code
Use context managers for resource cleanup
Use specific exception classes
Include meaningful error messages when raising or logging exceptions
Log errors with appropriate severity levels
Never expose sensitive information in error messages

Files:

  • skyvern/forge/api_app.py
  • skyvern/forge/sdk/services/local_org_auth_token_service.py
  • skyvern/forge/sdk/routes/internal_auth.py
  • skyvern/library/skyvern.py
  • skyvern/forge/sdk/services/org_auth_service.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python code must be linted and formatted with Ruff
Use type hints throughout Python code
Prefer async/await for asynchronous Python code
Enforce a maximum line length of 120 characters in Python files

Files:

  • skyvern/forge/api_app.py
  • skyvern/forge/sdk/services/local_org_auth_token_service.py
  • skyvern/forge/sdk/routes/internal_auth.py
  • tests/unit_tests/test_internal_auth.py
  • skyvern/library/skyvern.py
  • skyvern/forge/sdk/services/org_auth_service.py
skyvern/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Type-check Python code in the skyvern/ package with mypy

Files:

  • skyvern/forge/api_app.py
  • skyvern/forge/sdk/services/local_org_auth_token_service.py
  • skyvern/forge/sdk/routes/internal_auth.py
  • skyvern/library/skyvern.py
  • skyvern/forge/sdk/services/org_auth_service.py
tests/unit_tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Place unit tests under tests/unit_tests/

Files:

  • tests/unit_tests/test_internal_auth.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest with async support for Python tests

Files:

  • tests/unit_tests/test_internal_auth.py
🧬 Code graph analysis (6)
skyvern-frontend/src/components/SelfHealApiKeyBanner.tsx (2)
skyvern-frontend/src/hooks/useAuthDiagnostics.ts (2)
  • AuthStatusValue (6-12)
  • useAuthDiagnostics (39-39)
skyvern-frontend/src/api/AxiosClient.ts (1)
  • getClient (111-111)
skyvern/forge/sdk/services/local_org_auth_token_service.py (4)
skyvern/forge/sdk/schemas/organizations.py (1)
  • Organization (8-21)
skyvern/forge/sdk/db/enums.py (1)
  • OrganizationAuthTokenType (4-7)
skyvern/forge/sdk/db/client.py (4)
  • get_organization_by_domain (819-823)
  • create_organization (825-845)
  • invalidate_org_auth_tokens (1010-1031)
  • create_org_auth_token (974-1008)
skyvern/forge/sdk/core/security.py (1)
  • create_access_token (11-27)
skyvern/forge/sdk/routes/internal_auth.py (2)
skyvern/forge/sdk/services/local_org_auth_token_service.py (2)
  • fingerprint_token (29-30)
  • regenerate_local_api_key (47-74)
skyvern/forge/sdk/services/org_auth_service.py (1)
  • resolve_org_from_api_key (147-198)
tests/unit_tests/test_internal_auth.py (1)
skyvern/forge/sdk/routes/internal_auth.py (1)
  • _is_local_request (33-41)
skyvern/library/skyvern.py (1)
skyvern/forge/sdk/db/client.py (2)
  • get_organization_by_domain (819-823)
  • create_organization (825-845)
skyvern/forge/sdk/services/org_auth_service.py (4)
skyvern/forge/sdk/schemas/organizations.py (2)
  • Organization (8-21)
  • OrganizationAuthToken (33-34)
skyvern/forge/sdk/db/enums.py (1)
  • OrganizationAuthTokenType (4-7)
skyvern/forge/sdk/models.py (1)
  • TokenPayload (138-140)
skyvern/forge/sdk/core/skyvern_context.py (1)
  • current (70-77)
🪛 Ruff (0.13.3)
skyvern/forge/sdk/routes/internal_auth.py

53-53: Possible hardcoded password assigned to: "token_candidate"

(S105)


71-71: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


75-75: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (12)
skyvern-frontend/src/components/SelfHealApiKeyBanner.tsx (2)

11-11: Type definition is clear and correct.

The BannerStatus type correctly excludes "ok" from AuthStatusValue and adds "error" for query failures.


112-152: LGTM! UI rendering and conditional display logic are well-structured.

The component correctly:

  • Hides in production (checked at line 61-64)
  • Conditionally renders repair button vs error message
  • Displays status/error messages appropriately
  • Uses accessible Alert components
skyvern-frontend/src/App.tsx (1)

34-34: LGTM! Banner placement is appropriate.

Rendering SelfHealApiKeyBanner before the RouterProvider ensures it displays regardless of the current route, which is correct for a global diagnostic banner.

skyvern/forge/sdk/services/local_org_auth_token_service.py (3)

29-30: LGTM! Token fingerprinting is appropriate.

The fingerprint_token function provides a safe way to log tokens without exposing sensitive data. The fallback for short tokens prevents accidental leakage.


33-44: LGTM! Local org creation follows best practices.

The ensure_local_org function correctly:

  • Checks if the org exists before creating
  • Uses predefined constants for domain and name
  • Sets reasonable defaults for limits

47-74: Validate local API key expiration duration
API_KEY_LIFETIME is set to 5200 weeks (~100 years). Confirm this is intentional for local development; consider reducing it to a shorter, more secure timeframe.

skyvern/forge/api_app.py (1)

73-77: LGTM! Local-only route registration is correct.

The conditional registration of internal_auth.router only in local environments is appropriate for development-only diagnostics. Registering on all API version prefixes ensures consistent access.

skyvern/library/skyvern.py (1)

99-106: LGTM! Constant usage improves maintainability.

Replacing hard-coded strings with SKYVERN_LOCAL_DOMAIN and SKYVERN_LOCAL_ORG centralizes configuration and reduces the risk of typos or inconsistencies.

skyvern-frontend/src/hooks/useAuthDiagnostics.ts (1)

29-37: LGTM! Hook configuration is appropriate for dev diagnostics.

The hook configuration is correct:

  • Enabled only in development
  • No retry (diagnostics should reflect current state)
  • No refetch on window focus (avoids unnecessary requests)
skyvern/forge/sdk/services/org_auth_service.py (3)

31-36: LGTM! Dataclass encapsulates validation results well.

The ApiKeyValidationResult dataclass appropriately groups the organization, payload, and token, making the validation flow clearer and more maintainable.


201-211: LGTM! Refactored function maintains existing behavior.

The _get_current_org_cached function correctly delegates to resolve_org_from_api_key and maintains the context-setting logic. The caching decorator remains in place, preserving performance characteristics.


147-198: Confirm caching remains effective
All calls to resolve_org_from_api_key are funneled through the decorated _get_current_org_cached, and no direct invocations bypass the cache.

Comment on lines +17 to +123
class AuthStatus(str, Enum):
missing_env = "missing_env"
invalid_format = "invalid_format"
invalid = "invalid"
expired = "expired"
not_found = "not_found"
ok = "ok"


class DiagnosticsResult(NamedTuple):
status: AuthStatus
detail: str | None
validation: Any | None
token: str | None


def _is_local_request(request: Request) -> bool:
host = request.client.host if request.client else None
if not host:
return False
try:
addr = ipaddress.ip_address(host)
except ValueError:
return False
return addr.is_loopback or addr.is_private


def _require_local_access(request: Request) -> None:
if settings.ENV != "local":
raise HTTPException(status.HTTP_403_FORBIDDEN, "Endpoint only available in local env")
if not _is_local_request(request):
raise HTTPException(status.HTTP_403_FORBIDDEN, "Endpoint requires localhost access")


async def _evaluate_local_api_key(token: str) -> DiagnosticsResult:
token_candidate = token.strip()
if not token_candidate or token_candidate == "YOUR_API_KEY":
return DiagnosticsResult(status=AuthStatus.missing_env, detail=None, validation=None, token=None)

try:
validation = await resolve_org_from_api_key(token_candidate, app.DATABASE)
except HTTPException as exc:
if exc.status_code == status.HTTP_404_NOT_FOUND:
return DiagnosticsResult(status=AuthStatus.not_found, detail=None, token=None, validation=None)

detail_text = exc.detail if isinstance(exc.detail, str) else None
if exc.status_code == status.HTTP_403_FORBIDDEN:
status_value = AuthStatus.invalid
if detail_text and "expired" in detail_text.lower():
status_value = AuthStatus.expired
elif detail_text and "validate" in detail_text.lower():
status_value = AuthStatus.invalid_format
return DiagnosticsResult(status=status_value, detail=detail_text, token=None, validation=None)

LOG.error("Unexpected error while diagnosing API key", status_code=exc.status_code, detail=detail_text)
raise
except Exception:
LOG.error("Unexpected exception while diagnosing API key", exc_info=True)
raise HTTPException(status.HTTP_500_INTERNAL_SERVER_ERROR, "Unable to diagnose API key")

return DiagnosticsResult(status=AuthStatus.ok, detail=None, validation=validation, token=token_candidate)


def _emit_diagnostics(result: DiagnosticsResult) -> dict[str, object]:
status_value = result.status.value

if result.status is AuthStatus.ok and result.validation and result.token:
fingerprint = fingerprint_token(result.token)
LOG.info(
"Local auth diagnostics",
status=status_value,
organization_id=result.validation.organization.organization_id,
fingerprint=fingerprint,
expires_at=result.validation.payload.exp,
)
return {
"status": status_value,
"organization_id": result.validation.organization.organization_id,
"fingerprint": fingerprint,
"expires_at": result.validation.payload.exp,
}

log_kwargs: dict[str, object] = {"status": status_value}
if result.detail:
log_kwargs["detail"] = result.detail

LOG.warning("Local auth diagnostics", **log_kwargs)

return {"status": status_value}


@router.post("/repair")
async def repair_api_key(request: Request) -> dict[str, object]:
_require_local_access(request)

token, organization_id = await regenerate_local_api_key()

return {
"status": AuthStatus.ok.value,
"organization_id": organization_id,
"fingerprint": fingerprint_token(token),
}


@router.get("/status")
async def auth_status(request: Request) -> dict[str, object]:
_require_local_access(request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add Google-style docstrings for public API.

Project guidelines require Google-style docstrings on public classes and endpoints. Please add them to AuthStatus, DiagnosticsResult, repair_api_key, and auth_status so future contributors see intent and response semantics.

🧰 Tools
🪛 Ruff (0.13.3)

53-53: Possible hardcoded password assigned to: "token_candidate"

(S105)


71-71: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


75-75: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🤖 Prompt for AI Agents
In skyvern/forge/sdk/routes/internal_auth.py around lines 17 to 123, add
Google-style docstrings for the public API items requested: AuthStatus,
DiagnosticsResult, repair_api_key, and auth_status. For AuthStatus and
DiagnosticsResult add a brief one-line description and for DiagnosticsResult
document attributes (status, detail, validation, token). For repair_api_key and
auth_status add a one-line summary, Args (request: Request), Returns describing
the dict keys and types (status, organization_id, fingerprint and any other
keys), and any Raises (HTTPException for access control), placing each docstring
immediately below the class or function declaration and following Google
docstring conventions.

Comment on lines +33 to +48
def _is_local_request(request: Request) -> bool:
host = request.client.host if request.client else None
if not host:
return False
try:
addr = ipaddress.ip_address(host)
except ValueError:
return False
return addr.is_loopback or addr.is_private


def _require_local_access(request: Request) -> None:
if settings.ENV != "local":
raise HTTPException(status.HTTP_403_FORBIDDEN, "Endpoint only available in local env")
if not _is_local_request(request):
raise HTTPException(status.HTTP_403_FORBIDDEN, "Endpoint requires localhost access")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Tighten local-access guard to loopback-only.

_is_local_request currently accepts any private address (addr.is_private). That includes the entire LAN (e.g., 192.168.0.15, 10.0.0.2), so anyone on the same network can hit /internal/auth/* whenever the dev server binds to 0.0.0.0. Because these endpoints regenerate credentials, that exposure violates the “localhost only” guarantee and opens a lateral-movement vector. Limit access strictly to loopback addresses (IPv4 127.0.0.0/8 and IPv6 ::1) and keep the error message accurate. For example:

-    return addr.is_loopback or addr.is_private
+    return addr.is_loopback

If you need to permit additional cases, please gate them intentionally (e.g., explicit allowlist), not via the broad is_private.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _is_local_request(request: Request) -> bool:
host = request.client.host if request.client else None
if not host:
return False
try:
addr = ipaddress.ip_address(host)
except ValueError:
return False
return addr.is_loopback or addr.is_private
def _require_local_access(request: Request) -> None:
if settings.ENV != "local":
raise HTTPException(status.HTTP_403_FORBIDDEN, "Endpoint only available in local env")
if not _is_local_request(request):
raise HTTPException(status.HTTP_403_FORBIDDEN, "Endpoint requires localhost access")
def _is_local_request(request: Request) -> bool:
host = request.client.host if request.client else None
if not host:
return False
try:
addr = ipaddress.ip_address(host)
except ValueError:
return False
return addr.is_loopback
def _require_local_access(request: Request) -> None:
if settings.ENV != "local":
raise HTTPException(status.HTTP_403_FORBIDDEN, "Endpoint only available in local env")
if not _is_local_request(request):
raise HTTPException(status.HTTP_403_FORBIDDEN, "Endpoint requires localhost access")
🤖 Prompt for AI Agents
In skyvern/forge/sdk/routes/internal_auth.py around lines 33 to 48,
_is_local_request currently treats any private IP as local (addr.is_private)
which allows LAN hosts access; change the logic to accept only loopback
addresses (addr.is_loopback) and remove the is_private check so only 127.0.0.0/8
and ::1 are allowed, and update the _require_local_access error message to
reflect "loopback/localhost only" (or add an explicit allowlist gated by a
config value if you need additional non-loopback exceptions).

Comment on lines +21 to +26
def _write_env(path: Path, key: str, value: str) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
if not path.exists():
path.touch()
set_key(str(path), key, value)
LOG.info(".env written", path=(str(path)), key=key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify _write_env handles concurrent writes safely.

The _write_env function uses set_key from python-dotenv, which may not be atomic. If multiple processes or the frontend file watcher trigger simultaneous writes, corruption or race conditions could occur.

Run the following script to check if set_key operations elsewhere in the codebase use locking or other concurrency protection:


🏁 Script executed:

#!/bin/bash
# Search for other uses of set_key and check for locking patterns
rg -nP --type=py 'set_key|dotenv|\.env' --no-heading | head -50

Length of output: 4495


Add file locking around .env writes in _write_env
Wrap the set_key call in a file-based lock (e.g., via fcntl or the filelock library) in skyvern/forge/sdk/services/local_org_auth_token_service.py:21–26 to ensure atomic updates and prevent race conditions.

🤖 Prompt for AI Agents
In skyvern/forge/sdk/services/local_org_auth_token_service.py around lines 21 to
26, the call to set_key writes to the .env file without any file locking which
can cause race conditions; wrap the set_key call in a file-based lock (for
example using the filelock.FileLock or an fcntl-based lock) created in the same
directory (e.g., path.with_suffix(path.suffix + ".lock") or path.parent /
".env.lock"), acquire the lock with a reasonable timeout via a context manager,
perform set_key while the lock is held, and ensure the lock is always released
(use try/finally or the context manager) and propagate or log errors on
timeout/failure so concurrent processes don’t corrupt the .env file.

Comment on lines +64 to +67
_write_env(ROOT_ENV_PATH, "SKYVERN_API_KEY", api_key)
_write_env(FRONTEND_ENV_PATH, "VITE_SKYVERN_API_KEY", api_key)
settings.SKYVERN_API_KEY = api_key
os.environ["SKYVERN_API_KEY"] = api_key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Mutating settings and os.environ may not propagate to all contexts.

Lines 64-67 write to .env files and update settings.SKYVERN_API_KEY and os.environ. However, these mutations may not propagate to already-running processes or threads. Consider documenting that a restart may be required or implementing a more robust configuration reload mechanism.

Document this behavior in the function docstring:

 async def regenerate_local_api_key() -> tuple[str, str]:
-    """Create a fresh API key for the local organization and persist it to env files."""
+    """
+    Create a fresh API key for the local organization and persist it to env files.
+    
+    Note: The API key is written to .env files and updated in settings and os.environ.
+    Some contexts may require a process restart to pick up the new key.
+    """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_write_env(ROOT_ENV_PATH, "SKYVERN_API_KEY", api_key)
_write_env(FRONTEND_ENV_PATH, "VITE_SKYVERN_API_KEY", api_key)
settings.SKYVERN_API_KEY = api_key
os.environ["SKYVERN_API_KEY"] = api_key
async def regenerate_local_api_key() -> tuple[str, str]:
"""
Create a fresh API key for the local organization and persist it to env files.
Note: The API key is written to .env files and updated in settings and os.environ.
Some contexts may require a process restart to pick up the new key.
"""
_write_env(ROOT_ENV_PATH, "SKYVERN_API_KEY", api_key)
_write_env(FRONTEND_ENV_PATH, "VITE_SKYVERN_API_KEY", api_key)
settings.SKYVERN_API_KEY = api_key
os.environ["SKYVERN_API_KEY"] = api_key
🤖 Prompt for AI Agents
In skyvern/forge/sdk/services/local_org_auth_token_service.py around lines 64 to
67, the function mutates .env files and updates settings.SKYVERN_API_KEY and
os.environ which may not propagate to already-running processes or threads;
update the function docstring to state this behavior clearly and instruct
callers that a restart of the process (or an explicit configuration reload) may
be required for changes to take effect, and optionally mention implementing a
more robust reload mechanism if live update is needed.

SKYVERN_LOCAL_DOMAIN = "skyvern.local"


def _write_env(path: Path, key: str, value: str) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful merging this so we don't allow this in our cloud instance by accident!

LOG = structlog.get_logger()
PROJECT_ROOT = Path(__file__).resolve().parents[4]
ROOT_ENV_PATH = PROJECT_ROOT / ".env"
FRONTEND_ENV_PATH = PROJECT_ROOT / "skyvern-frontend" / ".env"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this work for pip installed skyvern?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the file writing would, but this change relies on the frontend dev server behavior hot reloading once the env file is modified, but you'd probably be doing skyvern run ui or something like that when using pip installed skyvern, so that wouldn't work. i'd have to think about how to support that usecase...



def _require_local_access(request: Request) -> None:
if settings.ENV != "local":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

from skyvern.forge.sdk.services.local_org_auth_token_service import fingerprint_token, regenerate_local_api_key
from skyvern.forge.sdk.services.org_auth_service import resolve_org_from_api_key

router = APIRouter(prefix="/internal/auth", tags=["internal"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you hide this from localhost:8000/docs by default?

cursor[bot]

This comment was marked as outdated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants