-
Notifications
You must be signed in to change notification settings - Fork 0
feat(iam): add IAM FastAPI endpoints with integration tests #103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…lembic - Add SQLAlchemy 2.0 with asyncpg for async database operations - Add Alembic for schema migrations - Add python-ulid for ULID support instead of UUID - Create read/write engine separation with connection pooling - Create FastAPI dependency injection for database sessions - Create SQLAlchemy declarative base with timestamp mixin - Initialize Alembic with async migration support - Create initial migration for teams table (ULID primary key) - Add comprehensive unit tests for engines and dependencies - Configure Alembic to use settings module for database URL - Enable ruff post-write hook for migration formatting Refs: AIHCM-121
- Add authzed library for SpiceDB integration - Add python-ulid for ULID support - Create ResourceType, RelationType, Permission enums (using Group not Team) - Create AuthorizationProvider protocol for swappable implementations - Implement SpiceDBClient with async methods for relationships and permissions - Create SpiceDB schema (.zed) with Tenant→Workspace→Group hierarchy - Create AuthorizationProbe for domain-oriented observability - Move ObservationContext to shared_kernel (fix architectural boundary) - Add 35 unit tests for types and probes - All 410 tests passing Refs: AIHCM-122
Resolved conflicts in authorization files by accepting remote changes: - shared_kernel/authorization/types.py (docstring fix) - shared_kernel/authorization/spicedb/client.py (_parse_reference helper)
Implement FastAPI endpoints for group management completing the vertical slice tracer bullet. Authentication: - CurrentUser dataclass (not tuple) from headers - get_current_user() dependency extracts X-User-Id and X-Username - Stub for Red Hat SSO integration (tracer bullet) API Models: - CreateGroupRequest (name, tenant_id) - GroupResponse with members - GroupMemberResponse (user_id, role) - from_domain() converts domain aggregates to API responses Endpoints: - POST /iam/groups - create group (201 Created) - GET /iam/groups/:id - get group by ID (200 OK, 404 Not Found) - DELETE /iam/groups/:id?tenant_id=... - delete group (204 No Content, 404 Not Found) Error handling: - 400 Bad Request for invalid ULIDs - 404 Not Found for missing groups - 409 Conflict for duplicate group names - 500 Internal Server Error for unexpected failures GroupService additions: - get_group(group_id) - delegates to repository - delete_group(group_id, tenant_id) - delegates to repository Routes registered in main.py under /iam prefix. Related: AIHCM-128 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Move transaction management from routes to GroupService for proper layering and separation of concerns. Changes: GroupService: - Takes AsyncSession as constructor parameter - Manages transactions with async with session.begin() - create_group() wraps entire use case in transaction - delete_group() wraps deletion in transaction - Routes no longer manage transactions Dependencies: - get_group_service() now takes session and creates repositories - Passes correct probe to UserService (user_service_probe) - Passes group_service_probe to GroupService Routes: - Remove session dependency - Remove async with session.begin() - Clean presentation layer (just HTTP concerns) Tests: - Add mock_session fixture for GroupService tests - Mock session.begin() context manager properly - Update all GroupService init tests Benefits: - Proper layering (routes → service → repository) - Service owns use case scope and transaction boundaries - Routes stay simple and focused on HTTP - Easier to test service logic independently 6 out of 10 API integration tests passing. Related: AIHCM-128 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Enable TLS for SpiceDB to work with Docker networking (non-localhost). Self-signed certificates are auto-generated via Makefile. Makefile: - Add 'certs' target to generate self-signed certificates - Generates cert with SANs for spicedb, localhost, 127.0.0.1 - Sets permissions to 644 for Docker volume mounts - 'dev' target now depends on 'certs' - Certificates only generated once (idempotent) SpiceDB service: - Configure TLS cert and key paths - Mount certificates from ./certs/ - Update healthcheck to use -tls -tls-no-verify spicedb-schema-init: - Use --skip-verify instead of --insecure - Mount certificate for verification API service: - Mount SpiceDB certificate - Set GRPC_DEFAULT_SSL_ROOTS_FILE_PATH for Python gRPC - Update env/api.env: SPICEDB_USE_TLS=true .gitignore: - Add certs/ directory This solves the "endpoint is neither UDS or TCP loopback" error when using insecure_bearer_token_credentials with Docker networking. With TLS, the API can now connect to spicedb:50051 over Docker network. Related: AIHCM-128 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Fix permission issues by using Docker volume instead of bind mounts for TLS certificates. Changes: Makefile: - Generate certs in local certs/ directory - Create Docker volume kartograph_spicedb_certs - Copy certs to volume with correct permissions via alpine container - Ensures SpiceDB container can read certificates compose.yaml: - Add spicedb_certs named volume - Mount volume at /certs (not /etc/spicedb/tls) - Update TLS paths to /certs/spicedb-*.pem - spicedb-schema-init uses same volume - API uses same volume with GRPC_DEFAULT_SSL_ROOTS_FILE_PATH Benefits: - Works around Chainguard base image permission restrictions - Volume has proper permissions set via alpine helper - All services share same certificate volume - Clean separation (local certs/ for generation, volume for runtime) Run 'make dev' to generate certs and start with TLS enabled. Related: AIHCM-128 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Remove local certificate persistence - generate directly in Docker volume for cleaner, ephemeral development setup. Changes: Makefile: - Generate certificates directly in Docker volume (no local files) - Use alpine container with openssl to create certs in volume - Check if certs exist in volume, only generate if missing - Fully idempotent and self-contained .gitignore: - Add certs/ back (in case generated locally by mistake) Benefits: - No local files to manage - No git conflicts on certs - Truly ephemeral development setup - Certs persist in Docker volume across restarts - Clean 'docker compose down -v' removes everything Developer workflow: 1. make dev (generates certs in volume on first run) 2. Certs persist in volume until 'docker compose down -v' 3. No local files created Related: AIHCM-128 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Refactor API to extract tenant_id from authenticated user context,
matching production JWT-based authentication pattern.
Changes:
CurrentUser:
- Add tenant_id: TenantId field
- Represents future JWT claims (user_id, username, tenant_id)
get_current_user():
- Extract X-Tenant-Id from headers (stub for JWT)
- Validate tenant_id as ULID
- In production: decode JWT and extract claims
CreateGroupRequest:
- Remove tenant_id field (comes from auth now)
- Cleaner API - just {"name": "Engineering"}
Routes:
- POST /groups - use current_user.tenant_id
- DELETE /groups/:id - use current_user.tenant_id (no query param)
- Consistent tenant isolation across all endpoints
Tests:
- Add X-Tenant-Id to all test headers
- Remove tenant_id from request bodies
- Remove params from DELETE tests
Benefits:
- API shape matches production (tenant from trusted auth)
- No way to fake tenant_id in request
- Easy to swap stub headers for JWT later
- Consistent security model
All endpoints now require tenant in auth context.
Related: AIHCM-128
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
- Add user_id parameter to delete_group service method - Check manage permission via SpiceDB before deletion - Return 403 for unauthorized users (doesn't leak group existence) - Use Permission.MANAGE enum instead of magic string - Improve TLS cert error handling with clear messages 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add wait step to ensure SpiceDB is ready before loading schema - Use CLI flags instead of env vars for zed command (more reliable) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @src/api/iam/presentation/routes.py:
- Around line 58-62: The except block that raises HTTPException with
detail=f"Failed to create group: {e}" leaks internal error details; instead, log
the full exception server-side (using the module logger or
logger.exception/process_logger.exception) and change the HTTPException detail
to a generic message such as "Failed to create group" or "Internal server error
while creating group" (keep HTTPException and
status.HTTP_500_INTERNAL_SERVER_ERROR), and remove/avoid including the caught
exception variable in the response (also avoid raising from e in the
HTTPException to prevent propagation of internals).
- Around line 165-171: The except block that catches Exception and raises an
HTTPException currently includes the raw exception in the 500 response
(detail=f"Failed to delete group: {e}"); change it to log the full exception
server-side (e.g., using logger.exception or logger.error with the exception)
and raise a generic HTTPException with a non-sensitive message like "Failed to
delete group." Update the except Exception handler that raises HTTPException
(the block referencing status.HTTP_500_INTERNAL_SERVER_ERROR) to call the logger
before raising and remove the interpolated exception from the response detail.
- Around line 110-116: The except block currently returns a 500 with the
internal exception text (f"Failed to get group: {e}"), which leaks
implementation details; change the HTTPException detail to a generic message
like "Failed to get group." and log the full exception server-side (e.g., use
logger.exception(...) or logger.error(..., exc_info=True)) before raising the
HTTPException; keep the HTTPException status
(status.HTTP_500_INTERNAL_SERVER_ERROR) and the exception chaining (from e) if
desired but remove the interpolated e from the client-facing detail.
🧹 Nitpick comments (1)
src/api/shared_kernel/authorization/spicedb/client.py (1)
47-85: Excellent error handling; consider optional empty certificate check.The implementation successfully addresses the critical validation and error handling issues from the previous review. The function now properly validates file existence, checks that the path is a file, and wraps I/O operations with clear error messages.
One minor defensive check suggested in the previous review was omitted: validating that the certificate file is not empty after reading. While not critical (gRPC or the connection attempt would likely catch this), it could provide earlier, clearer feedback for configuration errors.
🛡️ Optional defensive check for empty certificate
Add after line 75:
try: root_certs = cert_file.read_bytes() except OSError as e: raise ValueError( f"Failed to read TLS certificate file '{cert_path}': {e}" ) from e + if not root_certs: + raise ValueError(f"TLS certificate file is empty: {cert_path}")
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/api/iam/application/services/group_service.pysrc/api/iam/dependencies.pysrc/api/iam/presentation/auth.pysrc/api/iam/presentation/routes.pysrc/api/pyproject.tomlsrc/api/shared_kernel/authorization/spicedb/client.pysrc/api/tests/integration/iam/test_api.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/api/tests/integration/iam/test_api.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use clear, descriptive names for agent functions and methods
Document agent capabilities and behavior in docstrings
Include type hints in agent function signatures
Implement error handling and validation for agent inputsImplement logging following Domain Oriented Observability pattern, preferring Domain Probes over logger.* and print() calls
Files:
src/api/shared_kernel/authorization/spicedb/client.pysrc/api/iam/application/services/group_service.pysrc/api/iam/presentation/auth.pysrc/api/iam/dependencies.pysrc/api/iam/presentation/routes.py
src/api/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI for API development in the /src/api directory
Files:
src/api/shared_kernel/authorization/spicedb/client.pysrc/api/iam/application/services/group_service.pysrc/api/iam/presentation/auth.pysrc/api/iam/dependencies.pysrc/api/iam/presentation/routes.py
🧠 Learnings (2)
📚 Learning: 2025-12-15T21:36:53.213Z
Learnt from: jsell-rh
Repo: openshift-hyperfleet/kartograph PR: 47
File: src/api/pyproject.toml:9-9
Timestamp: 2025-12-15T21:36:53.213Z
Learning: In src/api/pyproject.toml, pin cyclopts to 5.0.0a1 to avoid licensing issues from cyclopts v4's transitive dependency on docutils via rich-rst. This specific, file-level change should be maintained as documented in FastMCP's installation guide, and ensure downstream dependencies do not pull in the problematic transitive lib. If upgrading or auditing, verify that docutils is not required by default and that the chosen alpha version remains compatible with the project.
Applied to files:
src/api/pyproject.toml
📚 Learning: 2025-12-11T22:34:03.720Z
Learnt from: CR
Repo: openshift-hyperfleet/kartograph PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T22:34:03.720Z
Learning: Applies to **/*test*.py : Use `pytest-archon` to write tests that explicitly enforce architectural boundaries between bounded contexts
Applied to files:
src/api/pyproject.toml
🧬 Code graph analysis (1)
src/api/iam/presentation/auth.py (4)
src/api/iam/domain/value_objects.py (2)
UserId(74-80)from_string(43-60)src/api/iam/dependencies.py (1)
get_user_service(74-88)src/api/iam/application/services/user_service.py (2)
UserService(15-104)ensure_user(38-104)src/api/infrastructure/database/dependencies.py (1)
get_write_session(106-130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test Python 3.12
- GitHub Check: Konflux kflux-prd-rh02 / kartograph-api-on-pull-request
🔇 Additional comments (6)
src/api/shared_kernel/authorization/spicedb/client.py (1)
157-157: LGTM! Clean integration of TLS certificate path.The
cert_pathparameter is properly integrated throughout the class:
- Constructor signature is backward-compatible with a sensible default
- Documentation clearly describes the parameter's purpose
- Instance variable is correctly stored and passed to
_create_tls_credentials- Maintains existing behavior when
cert_pathisNoneAlso applies to: 166-166, 172-172, 188-190
src/api/iam/presentation/auth.py (1)
67-83: Past issue resolved: tenant validation now prevents orphaned users.The tenant_id validation (lines 68-74) now correctly happens before user provisioning (lines 80-81), preventing orphaned user records when the tenant_id is invalid. The explicit comment on line 67 documents this requirement clearly.
src/api/iam/dependencies.py (1)
91-113: Past issue resolved: docstring now correctly documents parameters.The docstring (line 101) now correctly references
group_service_proberather than the non-existentuser_service_probe. The parameter documentation accurately matches the function signature.src/api/iam/presentation/routes.py (1)
150-164: Past issue resolved: user-level authorization now enforced.The delete operation now passes
user_idto the service layer (line 152) and properly handles authorization failures with HTTP 403 (lines 160-164). The service layer validates MANAGE permission before deletion.src/api/iam/application/services/group_service.py (2)
30-48: Past issue resolved: docstring accurately documents constructor parameters.The docstring (lines 40-43) now correctly documents the actual parameters (
session,group_repository,authz,probe) without referencing the removeduser_serviceparameter.
131-164: Authorization check properly implements user-level permission validation.The
delete_groupmethod correctly enforces user-level authorization by checking MANAGE permission (lines 150-157) before performing deletion. This prevents unauthorized users from deleting groups even if they belong to the same tenant, addressing the requirement noted in the routes.py past review.
| except Exception as e: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, | ||
| detail=f"Failed to create group: {e}", | ||
| ) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid exposing internal exception details in 500 responses.
The error message on line 61 (f"Failed to create group: {e}") may leak internal implementation details (stack traces, database errors, connection strings, etc.) to API consumers. This is a security concern.
Log the full exception server-side for debugging and return a generic message to the client.
🔒 Proposed fix to prevent information leakage
except Exception as e:
+ # TODO: Add structured logging here
+ # logger.exception("Failed to create group", extra={"name": request.name})
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
- detail=f"Failed to create group: {e}",
+ detail="Failed to create group",
) from e📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as e: | |
| raise HTTPException( | |
| status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, | |
| detail=f"Failed to create group: {e}", | |
| ) from e | |
| except Exception as e: | |
| # TODO: Add structured logging here | |
| # logger.exception("Failed to create group", extra={"name": request.name}) | |
| raise HTTPException( | |
| status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, | |
| detail="Failed to create group", | |
| ) from e |
🤖 Prompt for AI Agents
In @src/api/iam/presentation/routes.py around lines 58 - 62, The except block
that raises HTTPException with detail=f"Failed to create group: {e}" leaks
internal error details; instead, log the full exception server-side (using the
module logger or logger.exception/process_logger.exception) and change the
HTTPException detail to a generic message such as "Failed to create group" or
"Internal server error while creating group" (keep HTTPException and
status.HTTP_500_INTERNAL_SERVER_ERROR), and remove/avoid including the caught
exception variable in the response (also avoid raising from e in the
HTTPException to prevent propagation of internals).
| except HTTPException: | ||
| raise | ||
| except Exception as e: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, | ||
| detail=f"Failed to get group: {e}", | ||
| ) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid exposing internal exception details in 500 responses.
The error message on line 115 (f"Failed to get group: {e}") exposes internal implementation details. Use a generic error message and log the full exception server-side.
🔒 Proposed fix to prevent information leakage
except HTTPException:
raise
except Exception as e:
+ # TODO: Add structured logging here
+ # logger.exception("Failed to get group", extra={"group_id": group_id})
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
- detail=f"Failed to get group: {e}",
+ detail="Failed to get group",
) from e🤖 Prompt for AI Agents
In @src/api/iam/presentation/routes.py around lines 110 - 116, The except block
currently returns a 500 with the internal exception text (f"Failed to get group:
{e}"), which leaks implementation details; change the HTTPException detail to a
generic message like "Failed to get group." and log the full exception
server-side (e.g., use logger.exception(...) or logger.error(...,
exc_info=True)) before raising the HTTPException; keep the HTTPException status
(status.HTTP_500_INTERNAL_SERVER_ERROR) and the exception chaining (from e) if
desired but remove the interpolated e from the client-facing detail.
| except HTTPException: | ||
| raise | ||
| except Exception as e: | ||
| raise HTTPException( | ||
| status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, | ||
| detail=f"Failed to delete group: {e}", | ||
| ) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid exposing internal exception details in 500 responses.
The error message on line 170 (f"Failed to delete group: {e}") exposes internal implementation details. Use a generic error message and log the full exception server-side.
🔒 Proposed fix to prevent information leakage
except HTTPException:
raise
except Exception as e:
+ # TODO: Add structured logging here
+ # logger.exception("Failed to delete group", extra={"group_id": group_id})
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
- detail=f"Failed to delete group: {e}",
+ detail="Failed to delete group",
) from e🤖 Prompt for AI Agents
In @src/api/iam/presentation/routes.py around lines 165 - 171, The except block
that catches Exception and raises an HTTPException currently includes the raw
exception in the 500 response (detail=f"Failed to delete group: {e}"); change it
to log the full exception server-side (e.g., using logger.exception or
logger.error with the exception) and raise a generic HTTPException with a
non-sensitive message like "Failed to delete group." Update the except Exception
handler that raises HTTPException (the block referencing
status.HTTP_500_INTERNAL_SERVER_ERROR) to call the logger before raising and
remove the interpolated exception from the response detail.
- Show schema file contents before loading - Use absolute path for volume mount - Verify schema with 'schema read' after loading 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/tests.yml (1)
66-77: Add explicit failure if SpiceDB doesn't become ready.The readiness check exits after 30 seconds regardless of whether SpiceDB is ready. If SpiceDB fails to start, the subsequent schema load step will fail with a potentially unclear error message.
♻️ Proposed fix to add explicit timeout handling
- name: Wait for SpiceDB to be ready run: | echo "Waiting for SpiceDB to be ready..." + READY=false for i in {1..30}; do if nc -z localhost 50051 2>/dev/null; then echo "SpiceDB is ready!" + READY=true break fi echo "Waiting... ($i/30)" sleep 1 done + if [ "$READY" != "true" ]; then + echo "ERROR: SpiceDB did not become ready within 30 seconds" + exit 1 + fi
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/tests.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / kartograph-api-on-pull-request
🔇 Additional comments (3)
.github/workflows/tests.yml (3)
83-87: LGTM!The switch from environment variables to explicit CLI flags improves clarity and reliability. The
--insecureflag is consistent with the non-TLS configuration used throughout the CI pipeline.
112-112: LGTM!Setting
SPICEDB_USE_TLS: "false"is consistent with the--insecureflag used in the schema loading step, appropriately configuring the CI environment for non-TLS SpiceDB communication.
41-41: SpiceDB v1.48.0 is a valid official release. All known security vulnerabilities (CVE-2024-27101, CVE-2024-38361, CVE-2024-46989, CVE-2025-49011, CVE-2025-64529, CVE-2025-65111) have been patched in this version, and CVE-2023-29193 has documented mitigations. The version is secure for use.
- Add infrastructure/logging.py with structlog configuration - Enable colored console output when FORCE_COLOR=1 or in TTY - Use JSON output in production (non-TTY, no FORCE_COLOR) - Add FORCE_COLOR=1 and tty:true to docker-compose api service 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Refactor get_mcp_query_service to use context manager pattern instead of nested Depends() chains. FastMCP's docket DI system doesn't support nested dependency resolution the same way FastAPI does. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…-hyperfleet/kartograph into AIHCM-128-iam-fastapi-endpoint
Add type: ignore for FastMCP's Depends which has overly strict type stubs that use Never, making no valid context manager type satisfy the signature. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.github/workflows/tests.yml:
- Around line 66-76: The "Wait for SpiceDB to be ready" step's loop doesn't exit
with failure when SpiceDB never becomes ready; modify the step so after the for
loop you detect whether the service became available (e.g., via a readiness flag
or checking nc again) and if not, emit a clear error message and exit non-zero
(exit 1) to fail the job; update the shell snippet surrounding the loop in the
workflow step to set/inspect a readiness variable (or check nc result) and call
exit 1 when SpiceDB is still unreachable after the 30-iteration timeout.
In @src/api/main.py:
- Around line 71-82: Remove the debug print by deleting the standalone
print(cors_settings.origins) statement in the CORS configuration block; locate
the code around the cors_settings = get_cors_settings() and the if
cors_settings.is_enabled: middleware setup (references: cors_settings,
cors_settings.origins) and ensure no other stray print calls remain—use proper
logging or domain probes instead if observation is required.
🧹 Nitpick comments (3)
src/api/infrastructure/logging.py (1)
33-45: Consider adding exception info processor to colored mode.The production (JSON) path includes
format_exc_infobut the colored console path does not. WhileConsoleRenderercan handle exceptions, explicitly addingformat_exc_infoensures consistent exception formatting across both modes.♻️ Optional improvement
if use_colors: # Development: colored console output processors: list[structlog.types.Processor] = [ *shared_processors, + structlog.processors.format_exc_info, structlog.dev.ConsoleRenderer(colors=True), ]src/api/tests/unit/query/test_dependencies.py (2)
41-43: Avoid testing private implementation details.Line 43 asserts on
service._repository, which is a private attribute (prefixed with underscore). This couples the test to internal implementation details.Consider testing public behavior instead, such as verifying the service can execute a method without error, or checking the service's type only.
♻️ Suggested alternative
with get_mcp_query_service() as service: assert isinstance(service, MCPQueryService) - assert service._repository is not None + # Service was successfully created with dependencies + # Public behavior can be tested via method calls if needed
9-26: Consider adding edge case tests.Per coding guidelines, tests should cover edge cases. Consider adding tests for:
- Exception handling when connection fails
- Proper cleanup when an exception occurs inside the context
Example edge case test
@patch("query.dependencies.get_age_connection_pool") @patch("query.dependencies.get_database_settings") @patch("query.dependencies.AgeGraphClient") def test_disconnects_on_exception(self, mock_client_cls, mock_get_settings, mock_get_pool): """Should disconnect client even when exception occurs inside context.""" mock_client = MagicMock() mock_client_cls.return_value = mock_client with pytest.raises(RuntimeError): with mcp_graph_client_context() as client: raise RuntimeError("Test error") mock_client.disconnect.assert_called_once()
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
.github/workflows/tests.ymlcompose.yamlenv/api.envsrc/api/infrastructure/logging.pysrc/api/infrastructure/settings.pysrc/api/main.pysrc/api/pyproject.tomlsrc/api/query/dependencies.pysrc/api/tests/unit/query/test_dependencies.py
🚧 Files skipped from review as they are similar to previous changes (2)
- env/api.env
- src/api/pyproject.toml
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use clear, descriptive names for agent functions and methods
Document agent capabilities and behavior in docstrings
Include type hints in agent function signatures
Implement error handling and validation for agent inputsImplement logging following Domain Oriented Observability pattern, preferring Domain Probes over logger.* and print() calls
Files:
src/api/main.pysrc/api/infrastructure/logging.pysrc/api/query/dependencies.pysrc/api/infrastructure/settings.pysrc/api/tests/unit/query/test_dependencies.py
src/api/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI for API development in the /src/api directory
Files:
src/api/main.pysrc/api/infrastructure/logging.pysrc/api/query/dependencies.pysrc/api/infrastructure/settings.pysrc/api/tests/unit/query/test_dependencies.py
**/test_*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Write unit tests for agent behavior and edge cases
Files:
src/api/tests/unit/query/test_dependencies.py
**/*test*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
pytest-archonto write tests that explicitly enforce architectural boundaries between bounded contexts
Files:
src/api/tests/unit/query/test_dependencies.py
🧬 Code graph analysis (3)
src/api/main.py (4)
src/api/infrastructure/settings.py (4)
cors(176-178)database(166-168)get_cors_settings(209-214)is_enabled(113-115)src/api/graph/dependencies.py (1)
get_age_graph_client(55-76)src/api/infrastructure/database/dependencies.py (2)
close_database_engines(65-79)init_database_engines(31-62)src/api/infrastructure/logging.py (1)
configure_logging(13-53)
src/api/query/dependencies.py (3)
src/api/graph/infrastructure/age_client.py (1)
AgeGraphClient(36-299)src/api/infrastructure/dependencies.py (1)
get_age_connection_pool(14-23)src/api/query/infrastructure/query_repository.py (1)
QueryGraphRepository(24-171)
src/api/tests/unit/query/test_dependencies.py (2)
src/api/query/dependencies.py (2)
get_mcp_query_service(60-74)mcp_graph_client_context(36-56)src/api/graph/infrastructure/age_client.py (1)
AgeGraphClient(36-299)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / kartograph-api-on-pull-request
🔇 Additional comments (16)
compose.yaml (3)
14-23: SpiceDB TLS configuration looks correct for development.The TLS setup with self-signed certificates is properly configured. The healthcheck correctly uses
-tls-no-verifysince the certificate is self-signed. Note that in production, you'd want proper CA-signed certificates and remove the-tls-no-verifyflag.
98-103: Consistent TLS configuration for schema init service.The
--no-verify-caflag andGRPC_DEFAULT_SSL_ROOTS_FILE_PATHenvironment variable are properly set for the self-signed certificate setup.
119-124: The code configuration is correct for development use.The
tty: trueandFORCE_COLOR=1settings enable development logging visibility, and the TLS certificate path configuration is properly set. The Makefile includes acertstarget that generates self-signed certificates, and thedevtarget depends on it, ensuring certificates exist before runningdocker-compose..github/workflows/tests.yml (2)
81-97: Schema loading with verification is a good practice.The explicit schema read after write provides CI visibility and confirms the schema was loaded correctly. The
--insecureflag is consistent withSPICEDB_USE_TLS: "false"in the test environment.
120-123: TLS configuration is consistent with CI setup.The
SPICEDB_USE_TLS: "false"setting correctly matches the--insecureflags used in the schema loading steps.src/api/infrastructure/logging.py (1)
47-53: LGTM! Well-structured logging configuration.The structlog configuration follows best practices with appropriate processors, context management, and caching. The
make_filtering_bound_logger(0)allows all log levels through, which is fine since filtering can be done at the handler level or via environment configuration if needed.src/api/main.py (2)
23-24: LGTM! Early logging configuration.Configuring structlog at module level before any loggers are created ensures consistent logging behavior throughout the application lifecycle.
89-90: LGTM! IAM routes integration.The IAM router is properly included following the same pattern as the graph routes.
src/api/infrastructure/settings.py (3)
75-116: LGTM! Well-structured CORS settings.The CORSSettings class is well-documented with clear environment variable mappings. The
is_enabledproperty correctly gates CORS middleware activation, preventing accidental exposure when no origins are configured. The defaults are sensible for development.
142-149: TLS-first default is a good security posture.Changing
use_tlsdefault toTrueenforces secure connections by default, requiring explicit opt-out for local development. The optionalcert_pathfield appropriately supports self-signed certificates for development/testing scenarios.
175-178: Consistent pattern for cached settings accessor.The
corsproperty andget_cors_settings()function follow the established pattern used for database and SpiceDB settings.Also applies to: 208-214
src/api/tests/unit/query/test_dependencies.py (2)
1-6: LGTM!Imports are correctly organized and updated to reflect the new context manager pattern for
mcp_graph_client_context.
12-26: Test isolation and architectural boundary concerns.This unit test patches
get_age_connection_poolandget_database_settingsbut creates a realAgeGraphClientinstance and callsconnect(). Since the connection pool is mocked,ConnectionFactory.get_connection()returns a MagicMock connection object. This MagicMock is then passed toage.setUpAge(), which expects a real psycopg2 connection and may not behave correctly with a mock.Additionally, this test crosses the boundary from the Query bounded context to the Graph infrastructure layer without explicit architectural testing. Per the coding guidelines, tests matching
**/*test*.pyshould usepytest-archonto enforce architectural boundaries between bounded contexts.Recommended fixes:
- Mock
AgeGraphClient.connect()anddisconnect()directly for proper unit test isolation, or move this test to integration tests where real connections are available.- Add pytest-archon rules to enforce Query context should not directly import Graph infrastructure internals.
src/api/query/dependencies.py (3)
7-8: LGTM!Import additions for
contextmanagerand updatedTYPE_CHECKINGusage are appropriate for the new context manager pattern.
35-56: Well-structured context manager with proper lifecycle management.The implementation correctly:
- Uses
@contextmanagerwith properGeneratortype hint- Employs runtime import to avoid circular dependencies
- Ensures cleanup via
try/finallyblock- Documents the shared pool efficiency
The
disconnect()call infinallyis safe even ifconnect()fails, asAgeGraphClient.disconnect()checks for_current_connection is not Nonebefore cleanup.
59-74: Clean refactor to context manager pattern.The implementation effectively:
- Resolves the FastMCP nested
Depends()limitation as documented- Properly nests
mcp_graph_client_contextfor automatic lifecycle management- Creates stateless probe and repository per invocation
This aligns with the PR objective to refactor
get_mcp_query_serviceto a context-manager pattern.
| - name: Wait for SpiceDB to be ready | ||
| run: | | ||
| echo "Waiting for SpiceDB to be ready..." | ||
| for i in {1..30}; do | ||
| if nc -z localhost 50051 2>/dev/null; then | ||
| echo "SpiceDB is ready!" | ||
| break | ||
| fi | ||
| echo "Waiting... ($i/30)" | ||
| sleep 1 | ||
| done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add explicit failure if SpiceDB does not become ready.
The wait loop lacks an exit failure when SpiceDB doesn't become ready within 30 seconds, which could lead to silent failures and confusing downstream errors.
🔧 Proposed fix
- name: Wait for SpiceDB to be ready
run: |
echo "Waiting for SpiceDB to be ready..."
for i in {1..30}; do
if nc -z localhost 50051 2>/dev/null; then
echo "SpiceDB is ready!"
- break
+ exit 0
fi
echo "Waiting... ($i/30)"
sleep 1
done
+ echo "SpiceDB failed to become ready within 30 seconds"
+ exit 1🤖 Prompt for AI Agents
In @.github/workflows/tests.yml around lines 66 - 76, The "Wait for SpiceDB to
be ready" step's loop doesn't exit with failure when SpiceDB never becomes
ready; modify the step so after the for loop you detect whether the service
became available (e.g., via a readiness flag or checking nc again) and if not,
emit a clear error message and exit non-zero (exit 1) to fail the job; update
the shell snippet surrounding the loop in the workflow step to set/inspect a
readiness variable (or check nc result) and call exit 1 when SpiceDB is still
unreachable after the 30-iteration timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/api/query/dependencies.py (1)
59-74: Consider minor improvements for consistency and clarity.The implementation correctly handles dependency lifecycle, but there are two optional refinements:
Docstring typo: "docket DI system" should likely be "bespoke DI system" or another appropriate term.
Type annotation consistency: This function returns
Iterator[MCPQueryService]whilemcp_graph_client_context()returnsGenerator["AgeGraphClient", None, None]. Both are valid for context managers, but usingGeneratorconsistently would improve code uniformity.♻️ Proposed improvements
@contextmanager -def get_mcp_query_service() -> Iterator[MCPQueryService]: +def get_mcp_query_service() -> Generator[MCPQueryService, None, None]: """Get MCPQueryService for MCP operations. - Context manager that manually resolves all dependencies to work with - FastMCP's docket DI system, which doesn't support nested Depends() chains. + Context manager that manually resolves all dependencies to work with + FastMCP's bespoke DI system, which doesn't support nested Depends() chains. Handles graph client lifecycle (connect/disconnect) automatically. Yields: MCPQueryService instance with active database connection """
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/api/query/dependencies.pysrc/api/query/presentation/mcp.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use clear, descriptive names for agent functions and methods
Document agent capabilities and behavior in docstrings
Include type hints in agent function signatures
Implement error handling and validation for agent inputsImplement logging following Domain Oriented Observability pattern, preferring Domain Probes over logger.* and print() calls
Files:
src/api/query/dependencies.pysrc/api/query/presentation/mcp.py
src/api/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI for API development in the /src/api directory
Files:
src/api/query/dependencies.pysrc/api/query/presentation/mcp.py
**/*mcp*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastMCP for Model Context Protocol (MCP) server implementation
Files:
src/api/query/presentation/mcp.py
🧬 Code graph analysis (1)
src/api/query/presentation/mcp.py (4)
src/api/tests/unit/query/test_application_services.py (1)
service(32-39)src/api/tests/integration/test_query_mcp.py (1)
service(33-35)src/api/query/application/services.py (1)
MCPQueryService(21-127)src/api/query/dependencies.py (1)
get_mcp_query_service(60-74)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / kartograph-api-on-pull-request
🔇 Additional comments (3)
src/api/query/dependencies.py (2)
7-8: LGTM!The import additions properly support the new context manager pattern.
TYPE_CHECKINGis used correctly to avoid circular dependencies with the runtime import ofAgeGraphClient.
35-56: LGTM!The context manager properly handles graph client lifecycle with appropriate cleanup semantics. The try/finally ensures
disconnect()is called even if exceptions occur during operation. The runtime import ofAgeGraphClientis well-documented and avoids circular dependencies.src/api/query/presentation/mcp.py (1)
54-54: FastMCP officially supports context manager dependencies; the type: ignore comment is appropriate.FastMCP's
Depends()function accepts context manager-style dependencies, including generator-based ones likeIterator[MCPQueryService]. Thetype: ignore[arg-type]comment is correct and necessary because the type checker cannot understand thatDepends()accepts context managers; however, this is not a sign of fragility or incorrect usage—it's an officially supported pattern in FastMCP.Likely an incorrect or invalid review comment.
Replace magic string "admin" with Role enum in: - GroupMemberResponse.role field type (now typed as Role) - Integration test assertions (use Role.ADMIN.value) - Unit test mocks (use Role.ADMIN.value) This provides better type safety, IDE autocomplete, and prevents typos. Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Move authentication dependencies to proper architectural layers: - Create iam/application/value_objects.py for CurrentUser - Move get_current_user from presentation/auth.py to dependencies.py - Delete presentation/auth.py (mixing concerns) Fix UserService transaction management: - Service now manages its own transaction (consistent with GroupService) - Wrap entire ensure_user method in single transaction - Prevents SQLAlchemy autobegin conflicts - Remove transaction management from get_current_user caller Update tests: - Fix mock_session to properly mock begin() context manager - All 578 tests passing This follows proper clean architecture: - Presentation layer: only HTTP-specific code (routes, models) - Application layer: value objects for cross-cutting concerns - Dependencies: DI functions for all contexts Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Use Depends(get_user_repository) and Depends(get_group_repository) instead of constructing repositories directly in service factories. Benefits: - DRY: Single source of truth for repository construction - Maintainability: Changes to repository construction in one place - Proper DI composition: Leverages FastAPI's dependency injection Note: FastAPI's dependency caching ensures the session is shared between the repository and service (same instance per request). Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Add comprehensive test coverage for GroupService methods: TestGetGroup (3 tests): - Returns group when found and authorized - Returns None when group not found - Returns None when tenant access denied (security) TestDeleteGroup (3 tests): - Deletes group when user has manage permission - Raises PermissionError when unauthorized - Returns False when group not found Total: 13 tests for GroupService (was 7, now complete coverage) All 584 tests passing. Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
…tracking Replace RuntimeError exception message matching with explicit module-level state tracking for MCP initialization. Before (fragile): - Caught RuntimeError and checked for "can only be called once" string - Could mask other RuntimeErrors that should propagate - Breaks if MCP library changes error messages After (robust): - Track _mcp_initialized flag at module level - Explicitly skip MCP lifespan if already initialized - No exception handling needed - clean conditional logic This is particularly important for tests that run multiple lifespan cycles with LifespanManager. Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @src/api/main.py:
- Around line 26-27: The module-level flag _mcp_initialized persists across test
lifespans and breaks test isolation; add an autouse pytest fixture that resets
it to False before each test to ensure MCP is re-initialized per TestClient.
Implement a fixture in your tests (e.g., tests/conftest.py) that imports
src.api.main and sets src.api.main._mcp_initialized = False in a function-scoped
autouse fixture, or alternatively refactor the flag into a per-app state (e.g.,
app.state._mcp_initialized) and clear/initialize that in the app lifespan
handler so the MCP init/cleanup (the lifespan code that currently checks
_mcp_initialized) runs reliably between tests.
🧹 Nitpick comments (8)
src/api/tests/unit/iam/application/test_group_service.py (3)
135-158: Dead code:mock_user_serviceis set up but not used.The test accepts
mock_user_serviceas a parameter and configuresmock_user_service.ensure_user, but based on the updatedgroup_servicefixture (lines 57-67),GroupServiceno longer takes auser_servicedependency. Thecreatorvariable andmock_user_servicesetup appear to be leftover from a previous implementation.♻️ Suggested cleanup
@pytest.mark.asyncio async def test_saves_group_to_repository( - self, group_service, mock_user_service, mock_group_repository + self, group_service, mock_group_repository ): """Should save group via repository.""" creator_id = UserId.generate() tenant_id = TenantId.generate() - creator = User(id=creator_id, username="alice") - - mock_user_service.ensure_user = AsyncMock(return_value=creator) mock_group_repository.save = AsyncMock()
159-182: Same issue:mock_user_servicesetup is unused in probe test.Similar to the previous test,
mock_user_serviceis accepted and configured but not actually used by theGroupServiceunder test.♻️ Suggested cleanup
@pytest.mark.asyncio async def test_records_success_probe_event( - self, group_service, mock_user_service, mock_probe + self, group_service, mock_probe ): """Should record group_created probe event on success.""" creator_id = UserId.generate() tenant_id = TenantId.generate() - creator = User(id=creator_id, username="alice") - - mock_user_service.ensure_user = AsyncMock(return_value=creator)
183-208: Same cleanup needed for failure probe test.The
mock_user_servicefixture and its configuration are not used by the currentGroupServiceimplementation.♻️ Suggested cleanup
@pytest.mark.asyncio async def test_records_failure_probe_event( - self, group_service, mock_user_service, mock_group_repository, mock_probe + self, group_service, mock_group_repository, mock_probe ): """Should record group_creation_failed probe event on error.""" creator_id = UserId.generate() tenant_id = TenantId.generate() - creator = User(id=creator_id, username="alice") - - mock_user_service.ensure_user = AsyncMock(return_value=creator) mock_group_repository.save = AsyncMock(src/api/infrastructure/database/dependencies.py (1)
65-79: Add type hint for theappparameter.The parameter should be typed as
FastAPIfor consistency withinit_database_enginesand better IDE support.📝 Suggested fix
-async def close_database_engines(app) -> None: +async def close_database_engines(app: FastAPI) -> None: """Close database engines on app shutdown.src/api/iam/presentation/routes.py (4)
53-62: Sanitize error messages to avoid potential information leakage.The error handlers expose internal exception details via
str(e)and f-strings, which could leak sensitive information or internal implementation details. Consider logging the full error internally while returning generic messages to clients.🔒 Proposed fix with logging
except DuplicateGroupNameError as e: + # Log full error internally for debugging + logger.error(f"Group creation conflict: {e}", exc_info=True) raise HTTPException( status_code=status.HTTP_409_CONFLICT, - detail=str(e), + detail="A group with this name already exists", ) from e except Exception as e: + # Log full error internally for debugging + logger.exception(f"Group creation failed: {e}") raise HTTPException( status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, - detail=f"Failed to create group: {e}", + detail="Failed to create group", ) from eNote: Implement structured logging following Domain Oriented Observability pattern as per coding guidelines.
76-78: Consider adding the presentation-layer tenant access dependency.The TODO indicates a missing tenant-level access check at the presentation layer. While the service layer handles operation-specific authorization, adding a presentation-layer dependency would provide defense-in-depth and clearer separation of concerns.
Do you want me to generate a sample implementation for
require_tenant_accessdependency, or would you prefer to open a new issue to track this enhancement?
94-116: Sanitize error messages consistently across endpoints.Similar to
create_group, the error handlers expose internal exception details that could leak sensitive information. Apply the same sanitization pattern here.🔒 Proposed fix
try: group_id_obj = GroupId.from_string(group_id) except ValueError as e: + logger.warning(f"Invalid group ID format: {group_id}", exc_info=True) raise HTTPException( status_code=status.HTTP_400_BAD_REQUEST, - detail=f"Invalid group ID format: {e}", + detail="Invalid group ID format", ) from e try: group = await service.get_group(group_id_obj, current_user.tenant_id) if group is None: raise HTTPException( status_code=status.HTTP_404_NOT_FOUND, detail=f"Group {group_id} not found", ) return GroupResponse.from_domain(group) except HTTPException: raise except Exception as e: + logger.exception(f"Failed to get group {group_id}") raise HTTPException( status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, - detail=f"Failed to get group: {e}", + detail="Failed to retrieve group", ) from e
142-171: Sanitize error messages and consider permission error details.Apply consistent error message sanitization. Additionally, the
PermissionErrorhandler might expose too much detail about the authorization model (e.g., specific permission names).🔒 Proposed fix
try: group_id_obj = GroupId.from_string(group_id) except ValueError as e: + logger.warning(f"Invalid group ID format: {group_id}", exc_info=True) raise HTTPException( status_code=status.HTTP_400_BAD_REQUEST, - detail=f"Invalid ID format: {e}", + detail="Invalid group ID format", ) from e try: deleted = await service.delete_group( group_id_obj, current_user.tenant_id, current_user.user_id ) if not deleted: raise HTTPException( status_code=status.HTTP_404_NOT_FOUND, detail=f"Group {group_id} not found", ) except PermissionError as e: + logger.warning(f"Permission denied for group deletion: {e}") raise HTTPException( status_code=status.HTTP_403_FORBIDDEN, - detail=str(e), + detail="Permission denied", ) from e except HTTPException: raise except Exception as e: + logger.exception(f"Failed to delete group {group_id}") raise HTTPException( status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, - detail=f"Failed to delete group: {e}", + detail="Failed to delete group", ) from e
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
src/api/iam/application/services/user_service.pysrc/api/iam/application/value_objects.pysrc/api/iam/dependencies.pysrc/api/iam/presentation/models.pysrc/api/iam/presentation/routes.pysrc/api/infrastructure/database/dependencies.pysrc/api/main.pysrc/api/tests/integration/iam/test_api.pysrc/api/tests/unit/iam/application/test_group_service.pysrc/api/tests/unit/iam/application/test_user_service.pysrc/api/tests/unit/iam/infrastructure/test_group_repository.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/api/tests/integration/iam/test_api.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use clear, descriptive names for agent functions and methods
Document agent capabilities and behavior in docstrings
Include type hints in agent function signatures
Implement error handling and validation for agent inputsImplement logging following Domain Oriented Observability pattern, preferring Domain Probes over logger.* and print() calls
Files:
src/api/iam/application/value_objects.pysrc/api/tests/unit/iam/application/test_user_service.pysrc/api/tests/unit/iam/application/test_group_service.pysrc/api/tests/unit/iam/infrastructure/test_group_repository.pysrc/api/iam/dependencies.pysrc/api/main.pysrc/api/iam/presentation/routes.pysrc/api/infrastructure/database/dependencies.pysrc/api/iam/application/services/user_service.pysrc/api/iam/presentation/models.py
src/api/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI for API development in the /src/api directory
Files:
src/api/iam/application/value_objects.pysrc/api/tests/unit/iam/application/test_user_service.pysrc/api/tests/unit/iam/application/test_group_service.pysrc/api/tests/unit/iam/infrastructure/test_group_repository.pysrc/api/iam/dependencies.pysrc/api/main.pysrc/api/iam/presentation/routes.pysrc/api/infrastructure/database/dependencies.pysrc/api/iam/application/services/user_service.pysrc/api/iam/presentation/models.py
**/test_*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Write unit tests for agent behavior and edge cases
Files:
src/api/tests/unit/iam/application/test_user_service.pysrc/api/tests/unit/iam/application/test_group_service.pysrc/api/tests/unit/iam/infrastructure/test_group_repository.py
**/*test*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
pytest-archonto write tests that explicitly enforce architectural boundaries between bounded contexts
Files:
src/api/tests/unit/iam/application/test_user_service.pysrc/api/tests/unit/iam/application/test_group_service.pysrc/api/tests/unit/iam/infrastructure/test_group_repository.py
🧬 Code graph analysis (10)
src/api/iam/application/value_objects.py (1)
src/api/iam/domain/value_objects.py (2)
TenantId(94-100)UserId(74-80)
src/api/tests/unit/iam/application/test_user_service.py (2)
src/api/iam/ports/repositories.py (1)
IUserRepository(93-130)src/api/iam/application/services/user_service.py (1)
UserService(15-99)
src/api/tests/unit/iam/application/test_group_service.py (5)
src/api/iam/domain/aggregates.py (2)
Group(15-127)User(131-156)src/api/iam/domain/value_objects.py (4)
GroupId(64-70)Role(103-110)TenantId(94-100)UserId(74-80)src/api/iam/ports/repositories.py (1)
IGroupRepository(17-89)src/api/shared_kernel/authorization/protocols.py (1)
AuthorizationProvider(31-185)src/api/iam/application/services/group_service.py (3)
GroupService(23-164)get_group(97-129)delete_group(131-164)
src/api/tests/unit/iam/infrastructure/test_group_repository.py (2)
src/api/iam/domain/value_objects.py (1)
Role(103-110)src/api/shared_kernel/authorization/types.py (1)
SubjectRelation(37-52)
src/api/iam/dependencies.py (6)
src/api/iam/application/value_objects.py (1)
CurrentUser(15-28)src/api/iam/domain/value_objects.py (3)
TenantId(94-100)UserId(74-80)from_string(43-60)src/api/iam/application/services/user_service.py (2)
UserService(15-99)ensure_user(38-99)src/api/iam/infrastructure/group_repository.py (1)
GroupRepository(32-369)src/api/infrastructure/authorization_dependencies.py (1)
get_spicedb_client(18-34)src/api/iam/application/services/group_service.py (1)
GroupService(23-164)
src/api/main.py (4)
src/api/infrastructure/settings.py (4)
cors(176-178)database(166-168)get_cors_settings(209-214)is_enabled(113-115)src/api/infrastructure/database/dependencies.py (2)
close_database_engines(65-79)init_database_engines(31-62)src/api/infrastructure/dependencies.py (1)
get_age_connection_pool(14-23)src/api/infrastructure/logging.py (1)
configure_logging(13-53)
src/api/iam/presentation/routes.py (6)
src/api/iam/application/services/group_service.py (4)
GroupService(23-164)create_group(50-95)get_group(97-129)delete_group(131-164)src/api/iam/application/value_objects.py (1)
CurrentUser(15-28)src/api/iam/dependencies.py (2)
get_current_user(119-165)get_group_service(94-116)src/api/iam/domain/value_objects.py (2)
GroupId(64-70)from_string(43-60)src/api/iam/ports/exceptions.py (1)
DuplicateGroupNameError(9-17)src/api/iam/presentation/models.py (3)
CreateGroupRequest(11-17)GroupResponse(27-56)from_domain(37-56)
src/api/infrastructure/database/dependencies.py (3)
src/api/infrastructure/settings.py (1)
get_database_settings(191-196)src/api/infrastructure/database/engines.py (2)
create_write_engine(29-58)create_read_engine(61-93)src/api/infrastructure/observability/probes.py (2)
pool_closed(60-62)pool_closed(166-171)
src/api/iam/application/services/user_service.py (5)
src/api/iam/application/observability/user_service_probe.py (4)
UserServiceProbe(17-41)DefaultUserServiceProbe(44-95)user_ensured(20-28)user_ensured(65-80)src/api/tests/integration/iam/conftest.py (1)
user_repository(107-109)src/api/iam/domain/value_objects.py (1)
UserId(74-80)src/api/iam/domain/aggregates.py (1)
User(131-156)src/api/iam/infrastructure/user_repository.py (2)
get_by_id(67-88)save(41-65)
src/api/iam/presentation/models.py (2)
src/api/iam/domain/aggregates.py (1)
Group(15-127)src/api/iam/domain/value_objects.py (1)
Role(103-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / kartograph-api-on-pull-request
🔇 Additional comments (18)
src/api/tests/unit/iam/infrastructure/test_group_repository.py (2)
194-210: LGTM! Enum-based role checks align with domain model.The replacement of hardcoded
"admin"strings withRole.ADMIN.valueensures test mocks accurately reflect the production code's enum-based role handling. This improves maintainability by centralizing role definitions.
298-306: Consistent enum usage in delete test mock.Same pattern applied correctly in the delete test's
mock_lookupfunction, maintaining consistency across all SpiceDB-related test mocks.src/api/iam/application/services/user_service.py (2)
62-71: Good: Username sync with SSO properly tracked.The logic correctly updates the user when the SSO username differs from the stored username, and properly emits
was_created=False, was_updated=Truefor observability. This is important for tracking username changes from the identity provider.
56-91: Remove the nested transaction concern—the current design prevents it.The architecture explicitly prevents nested transactions. The
get_write_sessiondependency yields fresh sessions that are not in a transaction state, andget_current_usercallsensure_user()without wrapping it in an outer transaction. Each request gets its own session from the sessionmaker via FastAPI's dependency caching. The comment at line 139 ofdependencies.pycorrectly states the service "manages its own transaction," makingasync with self._session.begin()safe and intended.src/api/tests/unit/iam/application/test_group_service.py (2)
211-265: Good coverage for get_group authorization scenarios.The tests properly cover the three key scenarios: authorized access, group not found (skips permission check), and tenant access denied (returns None for security). The security behavior of not leaking group existence when unauthorized is correctly tested.
268-322: Good coverage for delete_group with authorization.Tests correctly verify that unauthorized users receive
PermissionErrorand that the repository delete is not called when unauthorized. The permission check assertions are appropriate.src/api/iam/application/value_objects.py (1)
14-28: Well-designed application-layer value object.The
CurrentUserclass correctly represents the authentication context as an application-layer concept, distinct from the domainUseraggregate. Using a frozen dataclass ensures immutability, which is appropriate for request-scoped auth context.src/api/iam/presentation/models.py (2)
11-17: Good input validation for group creation.The
CreateGroupRequestmodel properly validates thenamefield with sensible constraints (non-empty, max 255 chars). Thedescriptionparameter improves OpenAPI documentation.
36-56: Clean domain-to-DTO mapping.The
from_domainclassmethod provides a clean conversion from the domainGroupaggregate to the API response model, properly extracting member data including roles.src/api/main.py (3)
44-57: Lifespan shutdown runs regardless of MCP state — verify cleanup ordering.Database engine cleanup (line 57) and AGE pool cleanup (lines 60-65) execute after
yieldin both branches. This is correct. However, when_mcp_initializedis True, the MCP context manager's cleanup won't run (since we're in theelsebranch). Ensure this is the intended behavior for test scenarios.
74-84: CORS middleware configuration looks correct.The conditional enablement based on
is_enabledand proper configuration of origins, credentials, methods, and headers follows FastAPI best practices.
91-92: IAM routes properly mounted.The IAM bounded context routes are correctly included in the FastAPI application, following the same pattern as graph routes.
src/api/tests/unit/iam/application/test_user_service.py (2)
14-24: Well-structured mock for async session transactions.The
mock_sessionfixture correctly mocks thebegin()method returning an async context manager with proper__aenter__and__aexit__implementations. This pattern allows tests to verify transaction-scoped behavior.
128-150: Good test coverage for username sync scenario.This test properly verifies that when SSO provides an updated username, the service:
- Updates the user via repository.save()
- Emits the probe with
was_created=False, was_updated=TrueThis ensures observability distinguishes between new users and updated users.
src/api/infrastructure/database/dependencies.py (2)
31-63: LGTM! Excellent lifecycle management pattern.The app.state-based engine initialization ensures engines are created within the running event loop, avoiding async context issues. The pattern correctly separates engine creation from dependency access, and sessionmaker configuration (
expire_on_commit=False) is appropriate for explicit transaction management.
82-156: LGTM! Request-scoped dependency access is well-implemented.The refactored dependencies correctly retrieve engines and sessionmakers from
request.app.state, ensuring proper access to lifecycle-managed resources. The async context managers in session generators guarantee proper cleanup.src/api/iam/dependencies.py (2)
76-116: LGTM! Service factory composition is well-structured.The updated factories correctly wire transaction management (
session), authorization (authz), and observability (probe) dependencies. FastAPI's dependency caching ensures the same session instance is shared acrossrepositoryandservicedependencies, which is essential for transaction consistency.
119-165: Document thatensure_userexceptions will propagate as 500 errors.The docstring for
get_current_userexplicitly documents that invalid tenant IDs raiseHTTPExceptionwith status 400, but doesn't mention thatensure_userfailures (database errors, connection issues) will propagate uncaught and result in 500 responses. For clarity and consistency, add a Raises clause documenting this behavior, since callers may expect more specific error handling given the explicit documentation of the tenant ID validation.
src/api/main.py
Outdated
| # Track MCP initialization state to avoid re-initialization in tests | ||
| _mcp_initialized = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for all usages of _mcp_initialized in the codebase
rg -n "_mcp_initialized" --type pyRepository: openshift-hyperfleet/kartograph
Length of output: 268
🏁 Script executed:
#!/bin/bash
# Find test files to check for fixture setup/teardown
fd -e "test_.*\.py$" -o -e "conftest\.py$" | head -20Repository: openshift-hyperfleet/kartograph
Length of output: 191
🏁 Script executed:
#!/bin/bash
# Get the full context of the lifespan function in main.py
cat -n src/api/main.py | head -100Repository: openshift-hyperfleet/kartograph
Length of output: 3695
🏁 Script executed:
#!/bin/bash
# Find test files in the repository
find . -type f -name "test_*.py" -o -name "*_test.py" -o -name "conftest.py" | head -20Repository: openshift-hyperfleet/kartograph
Length of output: 1160
🏁 Script executed:
#!/bin/bash
# Check if there's a conftest.py that might reset _mcp_initialized
find . -name "conftest.py" -exec cat {} \;Repository: openshift-hyperfleet/kartograph
Length of output: 8124
🏁 Script executed:
#!/bin/bash
# Search for test files that import from src/api/main.py or create FastAPI apps
rg -l "src/api/main|FastAPI|kartograph_lifespan" --type py | grep -i testRepository: openshift-hyperfleet/kartograph
Length of output: 231
🏁 Script executed:
#!/bin/bash
# Search for any references to _mcp_initialized or pytest fixtures in all Python files
rg "pytest|fixture|_mcp_initialized" --type py | grep -E "(fixture|conftest|test_)" | head -30Repository: openshift-hyperfleet/kartograph
Length of output: 3484
🏁 Script executed:
#!/bin/bash
# Search for TestClient usage in tests
rg -n "TestClient" --type py src/api/tests/Repository: openshift-hyperfleet/kartograph
Length of output: 902
🏁 Script executed:
#!/bin/bash
# Search for FastAPI app instantiation in tests
rg -n "FastAPI\(|from src.api.main import|from.*main import app" --type py src/api/tests/Repository: openshift-hyperfleet/kartograph
Length of output: 356
🏁 Script executed:
#!/bin/bash
# Check test_health.py to see how it tests the app
cat -n src/api/tests/test_health.pyRepository: openshift-hyperfleet/kartograph
Length of output: 394
🏁 Script executed:
#!/bin/bash
# Search for pytest fixtures that return FastAPI app
rg -B5 -A10 "def.*app.*:" --type py src/api/tests/integration/conftest.py src/api/tests/unit/conftest.pyRepository: openshift-hyperfleet/kartograph
Length of output: 57
🏁 Script executed:
#!/bin/bash
# Check test_database_dependencies.py in detail
cat -n src/api/tests/unit/infrastructure/test_database_dependencies.pyRepository: openshift-hyperfleet/kartograph
Length of output: 2359
🏁 Script executed:
#!/bin/bash
# Check test_routes.py to see how it creates the app
cat -n src/api/tests/unit/graph/presentation/test_routes.pyRepository: openshift-hyperfleet/kartograph
Length of output: 11420
🏁 Script executed:
#!/bin/bash
# Search for integration test app fixture
cat -n src/api/tests/integration/conftest.py | head -150Repository: openshift-hyperfleet/kartograph
Length of output: 4379
🏁 Script executed:
#!/bin/bash
# Check if test_database_dependencies.py has multiple tests that each create TestClient
grep -n "def test_" src/api/tests/unit/infrastructure/test_database_dependencies.pyRepository: openshift-hyperfleet/kartograph
Length of output: 230
🏁 Script executed:
#!/bin/bash
# Look at how query_mcp_app.lifespan is used and what it manages
rg -n "query_mcp_app|lifespan" --type py src/api/ | grep -E "(import|def lifespan)" | head -20Repository: openshift-hyperfleet/kartograph
Length of output: 228
🏁 Script executed:
#!/bin/bash
# Check if there are integration tests that use the main app
rg -l "from main import app" src/api/tests/integration/ | xargs -I {} head -50 {}Repository: openshift-hyperfleet/kartograph
Length of output: 1588
Global _mcp_initialized flag persists across test lifespans, causing test isolation issues.
The module-level flag is set to True on the first lifespan invocation and never reset. In test_database_dependencies.py, all three test methods create separate TestClient instances of the same app, triggering the lifespan multiple times within a single test run:
- First test: MCP context entered, flag set to
True, resources initialized - Subsequent tests: MCP context skipped, no re-initialization or cleanup
This creates asymmetric behavior where MCP cleanup (lines 48–54) only executes on the first lifespan. If a test fails mid-lifespan or tests run in different orders, the flag state may not reflect actual MCP state, risking:
- MCP not being re-initialized when needed for test isolation
- Resources not being properly cleaned up between tests
- Inconsistent state if early tests fail
Add a pytest fixture to reset this flag before each test, or use a more robust state-tracking mechanism that doesn't rely on persistent module-level globals.
🤖 Prompt for AI Agents
In @src/api/main.py around lines 26 - 27, The module-level flag _mcp_initialized
persists across test lifespans and breaks test isolation; add an autouse pytest
fixture that resets it to False before each test to ensure MCP is re-initialized
per TestClient. Implement a fixture in your tests (e.g., tests/conftest.py) that
imports src.api.main and sets src.api.main._mcp_initialized = False in a
function-scoped autouse fixture, or alternatively refactor the flag into a
per-app state (e.g., app.state._mcp_initialized) and clear/initialize that in
the app lifespan handler so the MCP init/cleanup (the lifespan code that
currently checks _mcp_initialized) runs reliably between tests.
Replace module-level _mcp_initialized flag with per-app state (app.state._mcp_initialized) to maintain test isolation. Why this is better: - Module-level state persists across test lifespans, breaking isolation - Each FastAPI app instance gets its own state automatically - New test clients get fresh state without fixture boilerplate - More Pythonic - state lives with the object, not at module level Before (breaks test isolation): - _mcp_initialized = False # Module-level, shared across all tests - Once True, stays True for all subsequent tests After (proper isolation): - app.state._mcp_initialized # Per-app instance - Each new app (e.g., TestClient) gets fresh state Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
…-hyperfleet/kartograph into AIHCM-128-iam-fastapi-endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/api/main.py (3)
42-56: Consider simplifying the MCP initialization check.The current pattern separates the attribute existence check from the value check. This could be streamlined using
getattr()with a default value for improved readability.♻️ Proposed simplification
- # Initialize MCP state tracking on this app instance - if not hasattr(app.state, "_mcp_initialized"): - app.state._mcp_initialized = False - - # Startup: initialize database engines init_database_engines(app) - # MCP lifespan - skip if already initialized (e.g., in tests with multiple lifespans) - if not app.state._mcp_initialized: + # MCP lifespan - skip if already initialized (preserves test isolation) + if not getattr(app.state, "_mcp_initialized", False): async with query_mcp_app.lifespan(app): app.state._mcp_initialized = True yield else: - # MCP already initialized in previous lifespan cycle yield
58-59: Add error handling to database engine cleanup for consistency.The AGE connection pool cleanup (lines 62-67) includes error handling, but the database engine cleanup doesn't. For consistency and to ensure all cleanup steps are attempted even if one fails, consider adding a try/except block here.
♻️ Proposed error handling
# Shutdown: close database engines - await close_database_engines(app) + try: + await close_database_engines(app) + except Exception: + # Engines may not be initialized or cleanup may fail, continue with other cleanup + pass # Shutdown: close AGE connection pool
28-28: Consider adding explicit return type annotation.While the
@asynccontextmanagerdecorator infers the return type, adding an explicit type hint improves code clarity and IDE support.♻️ Proposed type hint
+from typing import AsyncGenerator + @asynccontextmanager -async def kartograph_lifespan(app: FastAPI): +async def kartograph_lifespan(app: FastAPI) -> AsyncGenerator[None, None]: """Application lifespan context.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/api/main.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use clear, descriptive names for agent functions and methods
Document agent capabilities and behavior in docstrings
Include type hints in agent function signatures
Implement error handling and validation for agent inputsImplement logging following Domain Oriented Observability pattern, preferring Domain Probes over logger.* and print() calls
Files:
src/api/main.py
src/api/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI for API development in the /src/api directory
Files:
src/api/main.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Konflux kflux-prd-rh02 / kartograph-api-on-pull-request
🔇 Additional comments (3)
src/api/main.py (3)
23-24: LGTM! Logging configuration is correctly placed.Module-level logging configuration ensures structlog is properly initialized before any application loggers are created, which is the correct pattern.
76-86: LGTM! CORS configuration is well-structured.The conditional CORS middleware setup based on settings is clean and follows best practices. Middleware is correctly added after app creation and will apply to all routes.
93-95: LGTM! IAM routes properly integrated.The IAM router inclusion follows the established pattern and correctly integrates the new IAM bounded context endpoints.
Summary
Implements IAM bounded context FastAPI endpoints with full integration testing against PostgreSQL and SpiceDB.
/iam/groupsendpoints with tenant isolationX-User-Id,X-Username,X-Tenant-Id)Closes: AIHCM-128
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.