fix: enable cross-endpoint remote calls when required by deanq · Pull Request #196 · runpod/flash

deanq · 2026-02-12T03:45:19Z

Summary

Fixes "No module named 'torch'" error when @remote functions are called in deployments. The decorator now correctly makes HTTP calls to worker containers instead of executing locally.

This PR includes the complete fix plus supporting infrastructure for API key management and endpoint discovery optimization.

Changes

Main Fix (AE-2079)

Root Cause: ServiceRegistry couldn't discover endpoint URLs, causing @remote decorator to fall back to local execution
Solution: Pass resources_endpoints via FLASH_RESOURCES_ENDPOINTS env var and load endpoint URLs from environment
Result: Remote calls in deployed mode now use HTTP to worker containers, eliminating torch import errors in mothership container

Supporting Features

API Key Management
- ServiceRegistry checks makes_remote_calls to skip unnecessary State Manager queries for local-only endpoints
- ServerlessResource injects RUNPOD_API_KEY for QB endpoints at deploy time
- Comprehensive API key management documentation added
Resource Name Normalization
- Strip -fb suffix and live- prefix from resource names for manifest lookups
- Ensures resources with these naming patterns are properly matched
Integration Tests
- 6 tests for API key injection behavior (QB vs LB, with/without remote calls)
- 6 tests for State Manager optimization (local-only vs remote-capable endpoints)
- All 12 tests passing

Files Changed

Core preview mode fixes: preview.py, service_registry.py, models.py
API key injection: serverless.py, load_balancer_sls_resource.py
Client updates: client.py
Documentation: docs/API_Key_Management.md, CLAUDE.md
Tests: test_api_key_injection.py, test_local_only_endpoint.py

Test Plan

✅ All 12 new integration tests passing
✅ Existing test suite passing (474 tests)
✅ Quality checks passing (format, lint, type)
✅ Verified remote calls work in preview mode without torch errors

Optimizes cross-endpoint communication by skipping State Manager queries for local-only endpoints and injecting API keys only when needed. - ServiceRegistry checks makes_remote_calls to skip unnecessary queries - ServerlessResource injects RUNPOD_API_KEY for QB endpoints at deploy time - Added comprehensive API key management documentation

Strip -fb suffix and live- prefix from resource names when looking up configuration in manifest to ensure resources with these naming patterns are properly matched.

Fixes 'No module named 'torch'' error when @Remote functions are called in preview deployments. The decorator now correctly makes HTTP calls to worker containers instead of executing locally. Root cause: ServiceRegistry couldn't discover endpoint URLs in preview mode, causing @Remote decorator to fall back to local execution. Solution: - Pass resources_endpoints via FLASH_RESOURCES_ENDPOINTS env var - Load endpoint URLs from environment in ServiceRegistry - Add custom_endpoint_url field to LoadBalancerSlsResource - Map queue-based to load-balancer images in preview mode - Skip template validation when connecting to existing endpoints - Integrate ServiceRegistry into @Remote decorator wrapper - Add resources_endpoints field to Manifest model Verification: - Remote calls in preview mode now use HTTP to worker containers - No more torch import errors in mothership container - Proper endpoint URL construction for Docker DNS Related: AE-2079

…ptimization Add comprehensive integration tests for: - API key injection during QB endpoint deployment (makes_remote_calls flag) - State Manager query optimization for local-only endpoints - Safe defaults and edge case handling for both features Test coverage: - 6 tests for API key injection behavior (QB vs LB, with/without remote calls) - 6 tests for State Manager optimization (local-only vs remote-capable endpoints) All tests passing (12/12). Related: feat(runtime) commit 32dd688, fix(serverless) commit 556106d

Adds Flash environment ID as environment variable when creating resources. This enables runtime queries to State Manager using the correct ID. Part of ServiceRegistry empty endpoints fix.

Injects FLASH_ENVIRONMENT_ID for both QB and LB endpoints to enable State Manager queries at runtime. API key injection remains QB-only (env var). LB endpoints use Authorization headers from requests, not environment variables. Part of ServiceRegistry empty endpoints fix.

Changes runtime State Manager queries to use FLASH_ENVIRONMENT_ID (with fallback to RUNPOD_ENDPOINT_ID) instead of just RUNPOD_ENDPOINT_ID. Root cause: Manifest stored under Flash environment ID, not individual endpoint ID. Using wrong ID resulted in empty resources_endpoints. Adds diagnostic logging to help troubleshoot manifest retrieval issues. Fixes: Load balancer endpoints unable to discover child endpoints.

Logs endpoint URLs being uploaded to State Manager during deployment. Helps troubleshoot manifest synchronization issues between CLI and runtime. Part of ServiceRegistry empty endpoints fix.

Logs State Manager responses including presence of resources_endpoints. Helps troubleshoot runtime manifest loading issues. Part of ServiceRegistry empty endpoints fix.

Adds makes_remote_calls: True to load-balancer and mothership manifest entries. This ensures ServiceRegistry correctly identifies these endpoint types as needing State Manager queries for cross-endpoint routing. Part of ServiceRegistry empty endpoints fix.

Adds safe-default handling when makes_remote_calls is null/None in manifest. Returns True (allow remote calls) to ensure functionality works even with incomplete manifest data. Also adds unit tests for _check_makes_remote_calls(): - Returns True when makes_remote_calls is True - Returns False when makes_remote_calls is False - Returns True (safe default) when makes_remote_calls is null - Returns True (safe default) when resource not found in manifest Part of ServiceRegistry empty endpoints fix.

Removes runtime reconciliation pattern where mothership endpoints would auto-provision child endpoints on startup. This is replaced by CLI-time provisioning during `flash deploy`. Changes: - Remove reconcile_children(), is_mothership(), get_mothership_url() - Remove runtime startup reconciliation hook from LB handler - Remove is_mothership flag and FLASH_IS_MOTHERSHIP env var - Remove is_explicit flag from manifest (no longer needed) - Delete integration tests for reconciliation behavior - Update unit tests to not expect removed flags - Update deployment.py, preview.py, deploy.py to check resource name All resources are now provisioned upfront by the CLI, enabling a simpler peer-to-peer architecture where endpoints discover each other via State Manager using FLASH_ENVIRONMENT_ID. Part of ServiceRegistry empty endpoints fix.

…ning Updates comments and documentation in test-mothership command to clarify that it uses CLI-time provisioning (not runtime reconciliation). Changes: - Replace 'auto-provisioning' with 'CLI-time provisioning' - Update objective text to remove reconciliation references - Clarify that all provisioning happens before server starts - Remove outdated --auto-provision flag references The test-mothership command behavior is unchanged - it still provisions resources upfront before starting the server. This commit just fixes misleading terminology that suggested runtime reconciliation.

The test-mothership command was designed to test runtime reconciliation, which was removed in commits 880c905 and 881a72f. The command is now obsolete as it only tests CLI provisioning in Docker, which is already covered by 'flash deploy --preview'. Changes: - Delete test_mothership.py command file (457 lines) - Remove FLASH_IS_TEST_MOTHERSHIP flag handling from mothership_provisioner.py - Simplify resource naming logic (removed tmp- prefix logic) - Update comments in manifest.py and scanner.py - Remove documentation references from Flash_Deploy_Guide.md and VERIFICATION.md Users should use 'flash deploy --preview' for local testing instead. Part of ServiceRegistry empty endpoints fix.

…eploy Removing is_mothership from the manifest (880c905) caused config mismatch between local and state manifests. This triggered unnecessary re-provisioning of the mothership on every deploy. When re-provisioning an unchanged endpoint, RunPod returns endpoint_url=None, so the URL is never written to resources_endpoints, breaking the post-deploy guidance display. - Restore is_mothership: True in _create_mothership_resource() - Restore is_mothership: True in _create_mothership_from_explicit() - Restore is_mothership flag check in _display_post_deployment_guidance() alongside the existing name-based check for forwards compatibility

Post-deploy guidance wasn't showing routes because it only iterated over resources_endpoints dict, which is null/empty until after deployment provisions resources. Routes exist in the manifest's routes section but were never displayed. Add fallback logic to search resources dict for mothership and extract routes from manifest, even when resources_endpoints is empty. This allows routes to be displayed pre-deployment and handles custom mothership names via the is_mothership flag. - Add fallback loop after primary resources_endpoints check - Handles both standard ("mothership") and custom names via is_mothership - Preserves existing behavior when resources_endpoints is populated

Mothership routes were not being added to manifest's routes dict when using custom naming patterns (e.g., '03_05_load_balancer-mothership'). The route extraction logic only checked for exact names 'mothership' or 'mothership-entrypoint', missing projects with {project_name}-mothership naming. This caused post-deploy guidance to show 'No routes found in manifest' even though routes were defined in the mothership's functions array. Change: Use is_mothership flag instead of exact name matching to extract routes from all mothership resources regardless of naming pattern. - Replace hardcoded name check with is_mothership flag - Ensures routes extracted for custom-named mothership resources - Aligns with how deploy.py already identifies mothership - Single line change with broad compatibility impact

FastAPI route handlers were never analyzed for calls to @Remote functions, causing makes_remote_calls flag to always be False even when mothership called remote endpoints. Changes: - scanner.py: Add remote_function_names parameter to detect_main_app() - scanner.py: Add _analyze_fastapi_routes_for_remote_calls() function - scanner.py: Add _analyze_function_calls_for_route() function - manifest.py: Pass remote_function_names to detect_main_app() - manifest.py: Calculate makes_remote_calls from analyzed routes Impact: - Mothership with remote calls: makes_remote_calls=True (gets API key) - Mothership without remote calls: makes_remote_calls=False (skips State Manager) - Enables proper API key injection for QB workers - Enables service registry optimization Fixes: AE-2079

…rn type Cleanup unused parameter and fix return value bug in deployment flow. Changes: - Remove mothership_url parameter from create_resource_from_manifest() - Fix deploy_to_environment() to return resources_endpoints dict instead of deploy API response - Update return type annotation from Dict[str, Any] to Dict[str, str] - Add comprehensive docstring with Args/Returns/Raises - Update tests to match new return type expectations Impact: - Cleaner API (no dead parameters) - Correct return values (endpoint URLs instead of API response) - Better type safety and documentation

Add API key injection in preview mode for resources configured with makes_remote_calls=True. This ensures workers can authenticate remote calls to other endpoints. Changes: - Parse makes_remote_calls flag from manifest - Inject RUNPOD_API_KEY env var during container startup - Warn if API key needed but not set - Add integration tests for all three PRD scenarios: 1. Mothership → GPU worker 2. Mothership → CPU worker → GPU worker 3. Mothership local-only Tests validate resource parsing, endpoint routing, API key context, and environment variable fallback behavior.

…ment Fix API key propagation for load-balanced endpoints by checking request context before falling back to environment variables in three locations: 1. RunpodGraphQLClient.__init__() - State Manager and GraphQL clients 2. RunpodRestClient.__init__() - REST API clients 3. ServerlessResource.endpoint property - runpod-python SDK endpoint client Changes establish unified priority: explicit param > request context > env var This fixes remote calls and State Manager queries for load-balanced endpoints that receive API keys via Authorization header rather than environment variables. All three components now automatically pick up per-request API keys from context, enabling multi-endpoint scenarios to work correctly. Fixes scenarios: - Mothership (LB) + GPU Worker (QB) - Mothership (LB) + CPU Worker (QB) + GPU Worker (QB) - Mothership (LB) only (local execution)

- Add FLASH_IS_MOTHERSHIP flag to distinguish mothership from worker endpoints - Extract main_file and app_variable from manifest for runtime provisioning - Store only filename (not absolute path) for container portability - Set FLASH_MAIN_FILE and FLASH_APP_VARIABLE in provisioner environment Fixes deployed mothership endpoints booting in child endpoint mode. Mothership now correctly loads user code from main.py at runtime.

Enhances API key propagation across three layers: 1. Middleware (lb_handler.py): Store keys in both contextvars and request.state for better propagation to async functions and explicit access. 2. Service Registry (service_registry.py): Retrieve API key from request context and pass explicitly to State Manager client. 3. State Manager Client (state_manager_client.py): Accept explicit api_key parameter at instance-level and per-request, with fallback to context/env. 4. Serverless Resource (serverless.py): Use request context API key for LB endpoint deployments during request processing. This ensures endpoints making remote calls have authenticated access to: - Other endpoints in the Flash application - State Manager GraphQL API for service discovery Tests updated/added to validate API key propagation through all layers. Fixes: AE-2079