Add MCP server with OAuth 2.1 for Claude.AI integration by manana2520 · Pull Request #22 · keboola/headless-knowledge-base

manana2520 · 2026-02-22T21:38:01Z

Summary

Add a remote MCP server that exposes the knowledge base to Claude.AI users via Streamable HTTP transport with OAuth 2.1 (Google) authentication
Extract core Q&A logic (search_knowledge, generate_answer) from Slack bot into reusable core/qa.py module
6 MCP tools: ask_question, search_knowledge, create_knowledge, ingest_document, submit_feedback, check_health
Scope-based access control: @keboola.com users get kb.read + kb.write, external users get kb.read only
Separate Docker image (Dockerfile.mcp) and Cloud Run services for staging (kb-mcp-staging) and production (kb-mcp)
CI/CD: build-mcp -> deploy-mcp-staging -> e2e tests -> deploy-mcp-production
64 unit tests covering tools, server protocol, and OAuth logic

User Interaction Scenarios

Scenario	MCP Tool	Example
Ask a question	`ask_question`	"What is Keboola's onboarding process?"
Search documents	`search_knowledge`	"Find docs about deployment pipelines"
Create knowledge	`create_knowledge`	"Remember that staging DB was rotated"
Ingest external doc	`ingest_document`	"Add this page: https://..."
Give feedback	`submit_feedback`	"Mark chunk X as outdated"
Check health	`check_health`	"Is the knowledge base healthy?"

Setup Required

After merging, create a Google OAuth Client ID in GCP Console:

APIs & Services > Credentials > Create Credentials > OAuth client ID
Name: Knowledge Base MCP Server, type: Web application
Store Client ID and Secret in Secret Manager (mcp-oauth-client-id, mcp-oauth-client-secret)
Run terraform apply to create Cloud Run services

Test plan

64 unit tests pass (tools, server, OAuth)
Create Google OAuth Client ID in GCP Console
Deploy to staging with terraform apply
Verify health check: curl https://kb-mcp-staging-<hash>.run.app/health
Test MCP protocol with dev mode locally
Add MCP server in Claude.AI settings and verify search works

Expose the knowledge base as a remote MCP server so Claude.AI users can search, ask questions, create knowledge, ingest documents, and submit feedback directly from Claude. Key changes: - New src/knowledge_base/mcp/ package with OAuth 2.1 (Google), Streamable HTTP transport, 6 MCP tools, and scope-based access control - Extract core Q&A logic from Slack bot into src/knowledge_base/core/qa.py for reuse across interfaces (Slack bot now delegates to core) - OAuth module adapted from odoo-mcp-server with kb.read/kb.write scopes (@keboola.com gets write, external users get read-only) - Dockerfile.mcp and Terraform for both staging and production Cloud Run - CI/CD pipeline: build-mcp, deploy-mcp-staging, deploy-mcp-production - 64 unit tests covering tools, server protocol, and OAuth logic

github-actions

LGTM! This is a well-engineered and comprehensive PR that successfully implements the MCP server integration with Claude.AI, including robust OAuth 2.1 authentication and a well-thought-out tool ecosystem. The core Q&A logic extraction is an excellent architectural improvement, promoting reusability and maintainability.

Key strengths observed:

Architectural Clarity: The refactoring of core Q&A logic into knowledge_base/core/qa.py is a significant win, correctly isolating reusable components from interface-specific code (Slack, now MCP).
Robust OAuth Implementation: The oauth/ module provides a detailed and secure implementation for OAuth 2.1, handling Google-specific nuances (ID tokens vs. opaque access tokens, JWKS fetching, client_id as audience, azp verification). The extract_user_context logic for deriving kb.read and kb.write scopes based on email domain and verification is a smart approach for internal authorization with Google OAuth.
Comprehensive MCP Tools: Six distinct tools are defined with clear descriptions and input schemas, covering the core functionalities of the knowledge base. The execute_tool dispatcher is well-structured.
Thorough Testing: The unit tests for MCP tools, server endpoints, and OAuth logic (especially extract_user_context and check_scope_access) demonstrate excellent coverage, including various scenarios and edge cases.
Sound Terraform Deployment: The cloudrun-mcp.tf file correctly sets up production and staging Cloud Run services, service accounts, and Secret Manager integration. Crucially, the lifecycle { ignore_changes = [secret_data] } for OAuth secrets is correctly applied, preventing accidental overwrites. IAM roles are appropriate, and VPC Access is correctly configured for Neo4j connectivity.
Security Considerations: MCP_DEV_MODE is explicitly false in Terraform, and the in-application OAuth validation is the primary security boundary, with allUsers invoker role on Cloud Run being a necessary pattern for this type of integration. CORS allow_origins=["*"] is acceptable given the robust token validation applied to all /mcp endpoints.

No significant issues or bugs were found. This work is high quality and aligns perfectly with the project's goals and technical standards.

github-actions

🔒 Security Review

Security Score: D | 5 CRITICAL, 3 HIGH, 1 INFO, 2 LOW, 6 MEDIUM

Security Review Summary: MCP Server OAuth 2.1 Integration

Critical Issues (5 findings)

The PR introduces 5 critical logging violations that expose sensitive information:

Token validation exception details logged in resource_server.py at lines 226 and 229
User email addresses (PII) logged in token_validator.py at lines 156 and 244
Exception details logged in server.py at line 137

All of these violate ADR-001 logging standards and expose user identity information or OAuth implementation details to log consumers.

High Severity Issues (4 findings)

Dev mode authentication bypass (server.py:143) - MCP_DEV_MODE=true disables all OAuth validation without production safeguards
Email domain-based access control (resource_server.py:211) - Write scope granted purely on @keboola.com email domain without additional verification
External Google API calls without caching (token_validator.py:168) - Synchronous tokeninfo validation has no caching or rate limiting
Missing production safety checks (server.py) - No explicit validation preventing dev mode in production

Medium Severity Issues (6 findings)

Unsanitized user email in metadata (tools.py:371) - Email used directly in Neo4j chunks without sanitization
Unvalidated search queries (tools.py:413) - Search input not validated before forwarding to backend
Plaintext credential patterns in environment variables (though correctly using Secret Manager in Terraform)
Overly permissive CORS (server.py:149) - Wildcard origin with credentials flag
Undocumented test user email env var (server.py:166)

Accepted Patterns (Correctly Implemented)

✓ OAuth configuration via pydantic-settings (no hardcoding)
✓ Secret Manager integration in Terraform for MCP_OAUTH_* secrets
✓ Scope-based tool access control structure (though implementation needs hardening)
✓ RFC 9728 Protected Resource Metadata endpoint

Recommendations Before Merge

Remove all email/exception details from logs - Use generic error messages or hashed identifiers
Add production safety checks - Raise error if dev_mode=true in production environment
Harden access control - Implement role-based access beyond email domain matching
Cache Google token validation - Implement Redis caching with TTL for tokeninfo results
Sanitize user input - Validate/sanitize email, search queries, and all user-supplied data
Restrict CORS - Remove wildcard origin or set specific Claude.AI domains

Score Justification: D

The presence of multiple HIGH severity findings (dev mode bypass, domain-based access control, external API calls without caching) combined with 5 CRITICAL logging issues that violate the established security patterns warrants a D grade. While no absolute authentication bypass exists, the combination of issues creates exploitable attack surface.

Findings

🔴 [CRITICAL] Exception details logged in logger.warning() may expose TokenValidationError details including token claims or validation failures

File: src/knowledge_base/mcp/oauth/resource_server.py:line 226
Category: A09_logging_monitoring
Impact: Sensitive token validation errors could be exposed in logs, potentially revealing information about token structure or validation logic to attackers with log access
Recommendation: Log only error type and generic message without exception details. Use logger.warning(f'Token validation failed: {type(e).name}') without the full exception object

🔴 [CRITICAL] Exception details logged in logger.error() during token validation without sanitization

File: src/knowledge_base/mcp/oauth/resource_server.py:line 229
Category: A09_logging_monitoring
Impact: Full exception stack traces and validation details could be exposed in error logs, potentially revealing sensitive information about OAuth implementation
Recommendation: Log only generic error message without exception details: logger.error('Unexpected error during token validation')

🔴 [CRITICAL] Email address logged in logger.warning() when email_verified is False

File: src/knowledge_base/mcp/oauth/token_validator.py:line 156
Category: A09_logging_monitoring
Impact: User email addresses are PII and should not be logged at warning level. This logs unverified emails which could be sensitive user data
Recommendation: Log only the fact that email verification failed without including the email address: logger.warning('Google token email not verified')

🔴 [CRITICAL] Email address logged in logger.info() during token validation

File: src/knowledge_base/mcp/oauth/token_validator.py:line 244
Category: A09_logging_monitoring
Impact: User email addresses are PII and logging them at info level violates privacy best practices. This could expose user identity information in application logs
Recommendation: Log only the fact that token was validated without including email: logger.info('Google access token validated')

🔴 [CRITICAL] Exception details including full exception object logged in logger.warning() during token validation

File: src/knowledge_base/mcp/server.py:line 137
Category: A09_logging_monitoring
Impact: Token validation errors could expose sensitive information about the OAuth implementation, token structure, or validation failures to anyone with log access
Recommendation: Log only the exception type and generic message: logger.warning(f'Token validation failed: {type(e).name}')

🟠 [HIGH] Dev mode disables OAuth token validation entirely when MCP_DEV_MODE=true

File: src/knowledge_base/mcp/server.py:line 143
Category: A07_auth_failures
Impact: If MCP_DEV_MODE is accidentally left enabled in production, all OAuth authentication is bypassed, allowing any user to access all tools with full scopes
Recommendation: Add explicit validation that MCP_DEV_MODE is False in production. Ensure this setting cannot be enabled via environment variable without explicit security review. Add startup logging warning if dev mode is enabled

🟠 [HIGH] Scope-based access control logic grants write access based on email domain (@keboola.com) without additional verification

File: src/knowledge_base/mcp/oauth/resource_server.py:line 211
Category: A01_access_control
Impact: Any user who can obtain a Google account with @keboola.com domain gets write access to knowledge base. If domain is compromised or external account gains domain email, they get write access without additional authorization
Recommendation: Implement additional verification for write scope beyond email domain: validate user against internal LDAP/directory, require explicit role assignment, or use additional claims from ID token

🟠 [HIGH] Google opaque access token validation uses external Google API call (tokeninfo endpoint) without caching or rate limiting

File: src/knowledge_base/mcp/oauth/token_validator.py:line 168
Category: A04_insecure_design
Impact: Each token validation makes a synchronous HTTP call to Google, which could be abused for DOS attacks or expose validation patterns to Google's infrastructure
Recommendation: Implement token result caching (Redis or in-memory with TTL), add rate limiting per user/client, or switch to validating only JWT ID tokens which can be verified offline

🟡 [MEDIUM] MCP_OAUTH_CLIENT_SECRET has empty string default while marked as optional

File: src/knowledge_base/mcp/config.py:line 18
Category: A05_misconfiguration
Impact: Code may reference this value assuming it's populated, leading to authentication failures or fallback to insecure behavior. Empty defaults can hide configuration issues
Recommendation: Either make this field required (no default) or add validation that checks if client_secret is actually needed and present before using it

🟡 [MEDIUM] User email from token claims used directly in chunk metadata and page title without sanitization

File: src/knowledge_base/mcp/tools.py:line 371
Category: A08_integrity_failures
Impact: Malicious email addresses could inject newlines, special characters, or SQL-like syntax into metadata that gets stored in Neo4j, potentially causing injection vulnerabilities downstream
Recommendation: Sanitize email address before using in metadata: validate format, remove special characters, or use a hashed user ID instead

🟡 [MEDIUM] User email logged in tool execution logging without rate limiting or audit trail

File: src/knowledge_base/mcp/tools.py:line 297
Category: A09_logging_monitoring
Impact: Repeated logging of user emails at info level creates audit trail that could reveal user patterns and access history
Recommendation: Log user action at debug level only, or use hashed user identifier. Implement proper audit logging separate from application logs

🟡 [MEDIUM] NEO4J_PASSWORD stored as plaintext environment variable in Cloud Run service definition

File: deploy/terraform/cloudrun-mcp.tf:line 153
Category: A05_misconfiguration
Impact: Cloud Run environment variables are visible in GCP Console and Terraform state. If Terraform state is compromised, Neo4j credentials are exposed
Recommendation: Already correctly using Secret Manager (google_secret_manager_secret_version) - good pattern. Ensure tfvars files and Terraform state are not checked into version control

🟡 [MEDIUM] search_knowledge query parameter used directly in search without input validation or sanitization

File: src/knowledge_base/mcp/tools.py:line 413
Category: A03_injection
Impact: Malicious search queries could potentially exploit search backend (Neo4j/Graphiti) if they accept injection vectors. Query operators or special syntax could be abused
Recommendation: Validate search query: limit length (e.g., max 500 chars), escape special characters, or use parameterized search API if available

🟡 [MEDIUM] Dev mode bypass is visible in production code without clear enforcement mechanism to prevent accidental enablement

File: src/knowledge_base/mcp/oauth/resource_server.py:line 93
Category: A04_insecure_design
Impact: Environment variable MCP_DEV_MODE=true disables all OAuth. If automation scripts or monitoring tools set this for testing, production could run in dev mode
Recommendation: Add explicit check: raise startup error if MCP_DEV_MODE=true and ENVIRONMENT=production. Add prominent startup logging warning

🟢 [LOW] CORS middleware allows origins=['*'] with allow_credentials=True

File: src/knowledge_base/mcp/server.py:line 149
Category: A05_misconfiguration
Impact: This combination is technically not harmful (browsers won't allow credentials with wildcard origin) but indicates overly permissive CORS configuration
Recommendation: Restrict CORS to specific origins (e.g., Claude.AI domains) or remove allow_credentials if not needed

🟢 [LOW] TEST_USER_EMAIL environment variable used in dev mode without documentation

File: src/knowledge_base/mcp/server.py:line 166
Category: A09_logging_monitoring
Impact: Undocumented env var could be accidentally set in production, causing all requests to be attributed to the test email
Recommendation: Document this env var clearly, validate it's only used when MCP_DEV_MODE=true, or use a hardcoded dev email

🔵 [INFO] Google ID token validation does not verify azp (authorized party) claim is present as part of required claims

File: src/knowledge_base/mcp/oauth/token_validator.py:line 153
Category: A07_auth_failures
Impact: azp claim is checked only if present in token, but best practice is to require it for Google tokens to prevent token substitution attacks
Recommendation: Add 'azp' to the required claims list if Google OAuth is used exclusively, or add conditional requirement based on token issuer

OWASP Top 10 Checklist

Category	Status
A01 Access Control	❌ FAIL
A02 Crypto Failures	✅ PASS
A03 Injection	❌ FAIL
A04 Insecure Design	❌ FAIL
A05 Misconfiguration	❌ FAIL
A06 Vulnerable Components	➖ N/A
A07 Auth Failures	❌ FAIL
A08 Integrity Failures	❌ FAIL
A09 Logging Monitoring	❌ FAIL
A10 Ssrf	➖ N/A

🤖 Security review powered by Claude

MCP spec requires the server to act as OAuth AS (or proxy to one). Claude.AI discovers /authorize, /token, /register endpoints and uses them for the OAuth flow. - GET /.well-known/oauth-authorization-server: RFC 8414 metadata - GET /authorize: redirects to Google OAuth, maps scopes - POST /token: proxies token exchange to Google - POST /register: dynamic client registration (RFC 7591) - MCP_OAUTH_CLIENT_SECRET now required (needed for token exchange) - 10 new unit tests for all OAuth AS endpoints

Cloud Run terminates TLS at the load balancer, so FastAPI sees HTTP. Use X-Forwarded-Proto header to construct correct external URLs.

github-actions

LGTM! This PR introduces a robust MCP (Managed Component Protocol) server with OAuth 2.1 integration for Claude.AI, which is a significant step forward for the AI Knowledge Base system. The changes are well-structured, follow architectural patterns, and address critical security considerations.

Key strengths observed:

Intent Fulfillment: The PR description accurately reflects the implemented features, including the new MCP server, OAuth 2.1 (Google) authentication, tool definitions, scope-based access control, and dedicated Cloud Run services for staging and production.
Architectural Alignment: The extraction of core Q&A logic into core/qa.py is an excellent refactoring, promoting reusability and maintainability. The use of async Python, Graphiti/Neo4j, and Google Cloud services (Cloud Run, Secret Manager) is consistent with the project's established patterns.
Security: The OAuth implementation is sound, correctly proxying to Google, handling scope mapping, and enforcing domain-based access control (@keboola.com for write access) within the extract_user_context logic. The use of Secret Manager for OAuth client credentials in Terraform is highly commendable, avoiding hardcoded secrets and ensuring proper IAM access for the Cloud Run service account. The allUsers invoker role on Cloud Run is appropriate as authentication is handled at the application level.
Terraform Quality: The cloudrun-mcp.tf file is well-written, with clear resource naming, appropriate scaling configurations for staging and production, correct image tagging, and thoughtful environment variable management. The ignore_changes = [secret_data] lifecycle block for secrets is crucial and correctly applied.
Test Coverage: The new unit tests (test_mcp_tools.py, test_mcp_server.py, test_mcp_oauth.py) demonstrate thorough coverage of tool functionality, server endpoints, and OAuth flows, including dev mode and scope filtering.

No significant bugs or architectural misalignments were found. The code is clean, well-commented where necessary, and demonstrates a high standard of engineering.

github-actions

🔒 Security Review

Security Score: D | 5 CRITICAL, 3 HIGH, 2 LOW, 5 MEDIUM

Security Review Summary: MCP Server OAuth Integration

Decision: REQUEST_CHANGES

Security Score: D (HIGH severity findings present)

Critical Issues (5 findings)

Credential and PII Logging: The OAuth token validator and resource server log user email addresses and exception details at info/warning levels, directly violating the security context requirement that user PII should not be logged at these levels. Exception messages in lines 226, 229 (resource_server.py) and 156, 244 (token_validator.py) may contain sensitive information.

Access Control Bypass: Dev mode (line 145, resource_server.py) completely bypasses OAuth validation if enabled, granting full access without authentication. The mode is controlled via environment variable with no production safeguard.

High Severity Issues (4 findings)

Email Verification Bypass: Token validation at line 130 (token_validator.py) logs a warning but accepts unverified emails, conflicting with security intent.

Terraform Configuration: Secret placeholders (cloudrun-mcp.tf) could be deployed if not manually updated, causing authentication failures.

CORS Misconfiguration: Wildcard CORS allows any origin to query metadata endpoints, aiding reconnaissance.

Dev Mode Logging: Test user configuration is logged at info level in development mode.

Medium Severity Issues (2 findings)

User Identity Handling: User email is used in URLs and logs without sanitization (tools.py, server.py).

OAuth Client ID Exposure: Client ID logged at startup, potentially aiding token forgery attacks.

Recommendations

Immediately remove all email/PII logging from warning and info levels in token validation code. Use debug level only or sanitized identifiers (hashes).
Add production safeguard for dev mode with runtime validation that prevents enabling dev mode outside development environments.
Enforce email verification in token validation by raising exceptions for unverified emails, not just logging warnings.
Restrict CORS origins to known Claude.AI domains instead of wildcard.
Validate secret initialization in Terraform or require pre-creation of secrets outside IaC.
Use sub claim instead of email for audit logging to reduce PII in logs.
Sanitize user-supplied data (email addresses) when used in URIs or returned to clients.

The MCP server implementation shows good foundational security (proper OAuth validation, secret management via Cloud Secret Manager), but logging practices need immediate remediation to comply with the organization's data protection requirements.

Findings

🔴 [CRITICAL] Email addresses (PII) logged at warning level in token validation error handler: `logger.warning(f"Token validation failed: {e}")`

File: src/knowledge_base/mcp/oauth/resource_server.py:line 226
Category: A09 Logging & Monitoring
Impact: Exception messages may contain user email addresses and token validation errors, exposing PII in logs. This violates the security requirement that Slack user PII and email addresses should not be leaked in logs.
Recommendation: Log only the error type, not the full exception message which may contain PII. Use: logger.warning(f"Token validation failed: {type(e).__name__}") or sanitize the error message to exclude email/identifying information.

🔴 [CRITICAL] Full exception details logged at error level in token validation handler: `logger.error(f"Unexpected error during token validation: {e}")`

File: src/knowledge_base/mcp/oauth/resource_server.py:line 229
Category: A09 Logging & Monitoring
Impact: Unvalidated exception details may contain sensitive information including token content, email addresses, or internal system details. This can expose credentials or PII in error logs.
Recommendation: Log only the exception type and sanitized message without full exception details. Use: logger.error(f"Unexpected error during token validation: {type(e).__name__}") to avoid exposing sensitive data in stack traces.

🔴 [CRITICAL] User email address (PII) logged at warning level when email not verified: `logger.warning(f"Google token email not verified: {claims.get('email')}`

File: src/knowledge_base/mcp/oauth/token_validator.py:line 156
Category: A09 Logging & Monitoring
Impact: Directly logs user email addresses in application logs at warning level. This violates the security policy that email addresses should not be logged at info level or above.
Recommendation: Remove the email from the log message or hash it. Use: logger.debug(f"Google token email verification failed") if logging is needed, and move this to debug level only.

🔴 [CRITICAL] User email address (PII) logged at info level during token validation: `logger.info(f"Google access token validated for: {normalized.get('email')}`

File: src/knowledge_base/mcp/oauth/token_validator.py:line 244
Category: A09 Logging & Monitoring
Impact: Logs user email addresses at INFO level in normal operation. This violates the security context requirement that email addresses should not be logged at info level or higher.
Recommendation: Remove email from the log or use debug level. Use: logger.debug(f"Google access token validated") instead, or if logging at info level is needed, use a hash: logger.info(f"Google access token validated for: {hashlib.sha256(email.encode()).hexdigest()[:8]}").

🔴 [CRITICAL] Full exception details logged at warning level during token validation: `logger.warning(f"Token validation failed: {type(e).name}: {e}")`

File: src/knowledge_base/mcp/server.py:line 149
Category: A09 Logging & Monitoring
Impact: Exception message {e} may contain sensitive token information, email addresses, or internal system details. This exposes PII and credentials in warning-level logs.
Recommendation: Log only the exception type without the full exception message. Use: logger.warning(f"Token validation failed: {type(e).__name__}") to avoid exposing details in logs.

🟠 [HIGH] Informational logging of user email for scope decisions: Multiple logs at info level log email addresses when granting scopes. Line 205-206: `logger.info(f"Google OAuth: granted default scopes for {email}")` and `logger.info(f"Google OAuth: granted write scope for internal user {email}")`

File: src/knowledge_base/mcp/oauth/resource_server.py:line 219
Category: A09 Logging & Monitoring
Impact: While these are at info level (acceptable per security context for normal operations), they still log PII during normal operation which could accumulate in log storage and be searchable.
Recommendation: Consider moving to debug level only, or if info level is required for audit purposes, use a hashed identifier instead of plain email: hashlib.sha256(email.encode()).hexdigest()[:8].

🟠 [HIGH] Dev mode bypasses all OAuth token validation: When `dev_mode=True`, the middleware skips token validation entirely and grants all scopes without verification.

File: src/knowledge_base/mcp/oauth/resource_server.py:line 145
Category: A01 Broken Access Control
Impact: If dev mode is accidentally enabled in production (via environment variable), all access control is completely bypassed. Any unauthenticated request will be granted full read/write access to the knowledge base.
Recommendation: Add runtime validation to ensure dev mode is never enabled in production. In the middleware init or at application startup, check: if self.dev_mode and not os.getenv('ENVIRONMENT') == 'development': raise RuntimeError('Dev mode is only allowed in development')

🟠 [HIGH] MCP OAuth secrets use placeholder values 'REPLACE_ME' in Terraform with `lifecycle { ignore_changes = [secret_data] }`. This means if the secret is never actually populated, the placeholder could be deployed.

File: deploy/terraform/cloudrun-mcp.tf
Category: A04 Insecure Design
Impact: If operators forget to manually set the secret values in GCP Secret Manager, the MCP server will attempt to use 'REPLACE_ME' as the client secret, causing authentication failures and potentially exposing the secret value in error messages.
Recommendation: Use sensitive = true on the secret versions and add a precondition check that validates secrets are not placeholder values at deployment time. Alternatively, require secrets to be pre-created outside of Terraform and reference them only.

🟡 [MEDIUM] User email address from OAuth context is used directly in `url` metadata without validation: `url=f"mcp://user/{user_email}"`. User email is extracted from claims without additional validation.

File: src/knowledge_base/mcp/tools.py:line 316
Category: A08 Data Integrity Failures
Impact: If a user email contains special characters or is malformed, it could create invalid URIs or be used in injection attacks when the URL metadata is later processed. Additionally, the user identity is assumed to be from the email claim without verifying claim authenticity.
Recommendation: Validate and sanitize user email before using in URL construction. Use: from urllib.parse import quote; url=f"mcp://user/{quote(user_email)}"

🟡 [MEDIUM] MCP server logs OAuth configuration including the audience (client_id) at startup: `logger.info(f"OAuth audience: {_get_oauth_audience()}")`

File: src/knowledge_base/mcp/server.py:line 81
Category: A09 Logging & Monitoring
Impact: The Google OAuth client ID is logged at startup, which could be captured in server logs and used by attackers to forge tokens or perform OAuth flow attacks. The client ID is sensitive when combined with knowledge of the resource server.
Recommendation: Do not log the client ID directly. Use: logger.info("OAuth audience configured") without the actual value. Log only hashes or truncated values if auditing is required.

🟡 [MEDIUM] CORS middleware configured with `allow_origins=["*"]` (all origins allowed) without additional protection on sensitive endpoints.

File: src/knowledge_base/mcp/server.py:line 210
Category: A05 Security Misconfiguration
Impact: Any website can make cross-origin requests to the MCP server's endpoints. While the OAuth endpoints require authentication via Bearer token, the metadata endpoints are public and CORS allows them to be queried from any origin. This could aid reconnaissance attacks.
Recommendation: Restrict CORS origins to known Claude.AI domains or remove wildcard. Use: allow_origins=["https://claude.ai", "https://*.claude.ai"] or implement stricter origin validation for metadata endpoints.

🟡 [MEDIUM] Email verification check is logged as warning but does NOT prevent token validation from succeeding: `if not claims.get("email_verified", False): logger.warning(...); return claims`

File: src/knowledge_base/mcp/oauth/token_validator.py:line 130
Category: A07 Authentication Failures
Impact: Tokens with email_verified: false are still accepted and validated. A user with an unverified Google email could potentially gain access to the knowledge base, contradicting the security intent.
Recommendation: Either raise an exception when email is not verified (raise TokenValidationError("Email not verified")) OR change the warning to debug level if this is intentional. The current behavior conflates logging with enforcement.

🟡 [MEDIUM] In dev mode, the test user email is logged: `logger.info(f"MCP dev mode: using email {dev_email}`)

File: src/knowledge_base/mcp/server.py:line 165
Category: A09 Logging & Monitoring
Impact: Dev mode email is logged at info level. While this is development mode, if logs are inadvertently preserved or exported, the dev email could be exposed.
Recommendation: Move to debug level: logger.debug(f"MCP dev mode enabled") without logging the email, or hash it.

🟢 [LOW] MCP configuration requires `MCP_OAUTH_CLIENT_ID` and `MCP_OAUTH_CLIENT_SECRET` with no defaults, but no pydantic validation error message is provided to guide deployment.

File: src/knowledge_base/mcp/config.py
Category: A05 Security Misconfiguration
Impact: If these environment variables are missing, pydantic will raise a generic validation error. While this fails safe (don't start without credentials), the error message could be clearer to help operators diagnose configuration issues.
Recommendation: Add pydantic field descriptions and validators with helpful error messages: MCP_OAUTH_CLIENT_ID: str = Field(..., description="Google OAuth Client ID - obtain from GCP Console")

🟢 [LOW] Tool execution logs the tool name and user email: `logger.info(f"Executing tool: {tool_name}, user: {user.get('email', 'unknown')}")`

File: src/knowledge_base/mcp/tools.py:line 165
Category: A09 Logging & Monitoring
Impact: User emails are logged at info level when tools are executed. This creates an audit trail of user actions, which is generally desirable for audit purposes, but accumulates PII in logs.
Recommendation: Use user sub claim (subject identifier) instead of email for audit logs: logger.info(f"Executing tool: {tool_name}, user: {user.get('sub', 'unknown')}")

OWASP Top 10 Checklist

Category	Status
A01 Access Control	❌ FAIL
A02 Crypto Failures	✅ PASS
A03 Injection	✅ PASS
A04 Insecure Design	❌ FAIL
A05 Misconfiguration	❌ FAIL
A06 Vulnerable Components	✅ PASS
A07 Auth Failures	❌ FAIL
A08 Integrity Failures	❌ FAIL
A09 Logging Monitoring	❌ FAIL
A10 Ssrf	✅ PASS

🤖 Security review powered by Claude

github-actions bot approved these changes Feb 22, 2026

View reviewed changes

github-actions bot requested changes Feb 22, 2026

View reviewed changes

Gemini Agent added 2 commits February 23, 2026 22:08

Fix OAuth AS metadata to use https behind reverse proxy

12fd5ad

Cloud Run terminates TLS at the load balancer, so FastAPI sees HTTP. Use X-Forwarded-Proto header to construct correct external URLs.

github-actions bot approved these changes Feb 23, 2026

View reviewed changes

github-actions bot requested changes Feb 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Add MCP server with OAuth 2.1 for Claude.AI integration#22

Add MCP server with OAuth 2.1 for Claude.AI integration#22
manana2520 wants to merge 3 commits intomainfrom
feature/mcp-server

manana2520 commented Feb 22, 2026

Uh oh!

github-actions bot left a comment

Uh oh!

github-actions bot left a comment

Uh oh!

github-actions bot left a comment

Uh oh!

github-actions bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

manana2520 commented Feb 22, 2026

Summary

User Interaction Scenarios

Setup Required

Test plan

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

🔒 Security Review

Security Review Summary: MCP Server OAuth 2.1 Integration

Critical Issues (5 findings)

High Severity Issues (4 findings)

Medium Severity Issues (6 findings)

Accepted Patterns (Correctly Implemented)

Recommendations Before Merge

Score Justification: D

Findings

🔴 [CRITICAL] Exception details logged in logger.warning() may expose TokenValidationError details including token claims or validation failures

🔴 [CRITICAL] Exception details logged in logger.error() during token validation without sanitization

🔴 [CRITICAL] Email address logged in logger.warning() when email_verified is False

🔴 [CRITICAL] Email address logged in logger.info() during token validation

🔴 [CRITICAL] Exception details including full exception object logged in logger.warning() during token validation

🟠 [HIGH] Dev mode disables OAuth token validation entirely when MCP_DEV_MODE=true

🟠 [HIGH] Scope-based access control logic grants write access based on email domain (@keboola.com) without additional verification

🟠 [HIGH] Google opaque access token validation uses external Google API call (tokeninfo endpoint) without caching or rate limiting

🟡 [MEDIUM] MCP_OAUTH_CLIENT_SECRET has empty string default while marked as optional

🟡 [MEDIUM] User email from token claims used directly in chunk metadata and page title without sanitization

🟡 [MEDIUM] User email logged in tool execution logging without rate limiting or audit trail

🟡 [MEDIUM] NEO4J_PASSWORD stored as plaintext environment variable in Cloud Run service definition

🟡 [MEDIUM] search_knowledge query parameter used directly in search without input validation or sanitization

🟡 [MEDIUM] Dev mode bypass is visible in production code without clear enforcement mechanism to prevent accidental enablement

🟢 [LOW] CORS middleware allows origins=['*'] with allow_credentials=True

🟢 [LOW] TEST_USER_EMAIL environment variable used in dev mode without documentation

🔵 [INFO] Google ID token validation does not verify azp (authorized party) claim is present as part of required claims

OWASP Top 10 Checklist

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot left a comment

Choose a reason for hiding this comment

🔒 Security Review

Security Review Summary: MCP Server OAuth Integration

Decision: REQUEST_CHANGES

Critical Issues (5 findings)

High Severity Issues (4 findings)

Medium Severity Issues (2 findings)

Recommendations

Findings

🔴 [CRITICAL] Email addresses (PII) logged at warning level in token validation error handler: logger.warning(f"Token validation failed: {e}")

🔴 [CRITICAL] Full exception details logged at error level in token validation handler: logger.error(f"Unexpected error during token validation: {e}")

🔴 [CRITICAL] User email address (PII) logged at warning level when email not verified: logger.warning(f"Google token email not verified: {claims.get('email')}

🔴 [CRITICAL] User email address (PII) logged at info level during token validation: logger.info(f"Google access token validated for: {normalized.get('email')}

🔴 [CRITICAL] Full exception details logged at warning level during token validation: logger.warning(f"Token validation failed: {type(e).__name__}: {e}")

🟠 [HIGH] Informational logging of user email for scope decisions: Multiple logs at info level log email addresses when granting scopes. Line 205-206: logger.info(f"Google OAuth: granted default scopes for {email}") and logger.info(f"Google OAuth: granted write scope for internal user {email}")

🟠 [HIGH] Dev mode bypasses all OAuth token validation: When dev_mode=True, the middleware skips token validation entirely and grants all scopes without verification.

🟠 [HIGH] MCP OAuth secrets use placeholder values 'REPLACE_ME' in Terraform with lifecycle { ignore_changes = [secret_data] }. This means if the secret is never actually populated, the placeholder could be deployed.

🟡 [MEDIUM] User email address from OAuth context is used directly in url metadata without validation: url=f"mcp://user/{user_email}". User email is extracted from claims without additional validation.

🟡 [MEDIUM] MCP server logs OAuth configuration including the audience (client_id) at startup: logger.info(f"OAuth audience: {_get_oauth_audience()}")

🟡 [MEDIUM] CORS middleware configured with allow_origins=["*"] (all origins allowed) without additional protection on sensitive endpoints.

🟡 [MEDIUM] Email verification check is logged as warning but does NOT prevent token validation from succeeding: if not claims.get("email_verified", False): logger.warning(...); return claims

🟡 [MEDIUM] In dev mode, the test user email is logged: logger.info(f"MCP dev mode: using email {dev_email})

🟢 [LOW] MCP configuration requires MCP_OAUTH_CLIENT_ID and MCP_OAUTH_CLIENT_SECRET with no defaults, but no pydantic validation error message is provided to guide deployment.

🟢 [LOW] Tool execution logs the tool name and user email: logger.info(f"Executing tool: {tool_name}, user: {user.get('email', 'unknown')}")

OWASP Top 10 Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

🔴 [CRITICAL] Email addresses (PII) logged at warning level in token validation error handler: `logger.warning(f"Token validation failed: {e}")`

🔴 [CRITICAL] Full exception details logged at error level in token validation handler: `logger.error(f"Unexpected error during token validation: {e}")`

🔴 [CRITICAL] User email address (PII) logged at warning level when email not verified: `logger.warning(f"Google token email not verified: {claims.get('email')}`

🔴 [CRITICAL] User email address (PII) logged at info level during token validation: `logger.info(f"Google access token validated for: {normalized.get('email')}`

🔴 [CRITICAL] Full exception details logged at warning level during token validation: `logger.warning(f"Token validation failed: {type(e).name}: {e}")`

🟠 [HIGH] Dev mode bypasses all OAuth token validation: When `dev_mode=True`, the middleware skips token validation entirely and grants all scopes without verification.

🟠 [HIGH] MCP OAuth secrets use placeholder values 'REPLACE_ME' in Terraform with `lifecycle { ignore_changes = [secret_data] }`. This means if the secret is never actually populated, the placeholder could be deployed.

🟡 [MEDIUM] User email address from OAuth context is used directly in `url` metadata without validation: `url=f"mcp://user/{user_email}"`. User email is extracted from claims without additional validation.

🟡 [MEDIUM] MCP server logs OAuth configuration including the audience (client_id) at startup: `logger.info(f"OAuth audience: {_get_oauth_audience()}")`

🟡 [MEDIUM] CORS middleware configured with `allow_origins=["*"]` (all origins allowed) without additional protection on sensitive endpoints.

🟡 [MEDIUM] Email verification check is logged as warning but does NOT prevent token validation from succeeding: `if not claims.get("email_verified", False): logger.warning(...); return claims`

🟡 [MEDIUM] In dev mode, the test user email is logged: `logger.info(f"MCP dev mode: using email {dev_email}`)

🟢 [LOW] MCP configuration requires `MCP_OAUTH_CLIENT_ID` and `MCP_OAUTH_CLIENT_SECRET` with no defaults, but no pydantic validation error message is provided to guide deployment.

🟢 [LOW] Tool execution logs the tool name and user email: `logger.info(f"Executing tool: {tool_name}, user: {user.get('email', 'unknown')}")`