Skip to content

Conversation

@sallyom
Copy link
Collaborator

@sallyom sallyom commented Dec 15, 2025

The helm values file to configure log retention isn't working. This PR adds a post-install script to configure langfuse logging to avoid langfuse-clickhouse PVC from filling with logs we don't need or want to retain.

@github-actions

This comment has been minimized.

@sallyom sallyom force-pushed the update-langfuse-logging branch from 911e7d9 to 810a2a7 Compare December 15, 2025 20:24
@github-actions

This comment was marked as outdated.

@ambient-code ambient-code deleted a comment from github-actions bot Dec 15, 2025
@ambient-code ambient-code deleted a comment from github-actions bot Dec 15, 2025
@ambient-code ambient-code deleted a comment from github-actions bot Dec 15, 2025
@sallyom
Copy link
Collaborator Author

sallyom commented Dec 15, 2025

Tested this, now in runner logs:

2025-12-15 20:29:44,091 - root - INFO - Langfuse: Privacy masking ENABLED - user messages and responses will be redacted
2025-12-15 20:29:44,216 - root - INFO - Langfuse: Model 'claude-sonnet-4-5@20250929' added to session metadata and tags
2025-12-15 20:29:44,216 - root - INFO - Langfuse: Session tracking enabled (session_id=agentic-session-1765826551, user_id=somalley, model=claude-sonnet-4-5@20250929)

and, in Langfuse UI
Screenshot 2025-12-15 at 3 34 21 PM

sallyom and others added 2 commits December 15, 2025 20:25
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
Implements privacy-preserving Langfuse tracing that redacts user messages
and assistant responses while preserving usage metrics for cost tracking.

Privacy Controls:
- Default: MASK messages (privacy-first)
- Set LANGFUSE_MASK_MESSAGES=false to disable (dev/testing only)

What Gets Logged (with masking enabled):
✅ Token counts (input, output, cache read/creation)
✅ Cost calculations (USD per session)
✅ Model names, turn counts, session metadata
✅ Tool names and execution status
❌ User prompts → [REDACTED FOR PRIVACY]
❌ Assistant responses → [REDACTED FOR PRIVACY]
❌ Long tool outputs → [REDACTED FOR PRIVACY]

This addresses the requirement to track usage metrics without logging
potentially sensitive user data.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
@sallyom sallyom force-pushed the update-langfuse-logging branch from 810a2a7 to 8eb65e1 Compare December 16, 2025 01:34
@github-actions

This comment was marked as outdated.

@github-actions

This comment was marked as outdated.

@codecov
Copy link

codecov bot commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 92.30769% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...onents/runners/claude-code-runner/observability.py 92.30% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@github-actions

This comment was marked as outdated.

@sallyom sallyom force-pushed the update-langfuse-logging branch from 91c0a41 to 6dc6fb6 Compare December 16, 2025 02:02
@github-actions

This comment was marked as outdated.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: sallyom <somalley@redhat.com>
@sallyom sallyom force-pushed the update-langfuse-logging branch from 6dc6fb6 to 411cf2b Compare December 16, 2025 02:23
@github-actions
Copy link
Contributor

github-actions bot commented Dec 16, 2025

Claude Code Review

Summary

This PR adds privacy-first message masking to Langfuse observability and optimizes ClickHouse deployment to prevent disk space issues. The implementation is well-designed and production-ready with comprehensive test coverage and excellent documentation.

Overall Assessment: ✅ APPROVE with minor suggestions

The privacy masking implementation follows security best practices with secure-by-default behavior, and the ClickHouse optimization addresses a real operational issue. The code quality is excellent with thorough testing and documentation.


Issues by Severity

🟢 No Blocker or Critical Issues Found

This PR is ready to merge. All patterns follow established project standards.

🟡 Major Issues

None - code quality is excellent.

🔵 Minor Issues / Suggestions

1. Shell Script Robustness (Minor)

File: e2e/scripts/configure-clickhouse-ttl.sh

Issue: Line 112 uses grep -v to suppress errors, which could hide real failures

Suggestion: Consider more explicit error handling to distinguish between expected "table does not exist" errors and real failures

Impact: Low - current approach works but could be more robust


2. Documentation Clarity (Very Minor)

File: CLAUDE.md lines 360-369

Observation: The privacy controls section states "Masking is ENABLED BY DEFAULT" but then shows how to set LANGFUSE_MASK_MESSAGES=true

Suggestion: Consider rewording to make it clearer that setting the variable explicitly is optional

Impact: Very Low - documentation is already clear


Positive Highlights

🌟 Excellent Privacy-First Design

  • Secure by default: Masking enabled without requiring configuration
  • Clear opt-out: Explicit LANGFUSE_MASK_MESSAGES=false required to disable
  • Comprehensive masking: Handles strings, dicts, lists, and nested structures
  • Preserves observability: All usage metrics, costs, and metadata retained

🧪 Outstanding Test Coverage

  • 268 lines of test code for privacy masking alone
  • 9 comprehensive test cases covering all edge cases
  • Standalone executable: Tests can run independently
  • Clear assertions: Every test validates both masking AND metric preservation

📚 Production-Ready Documentation

  • User-facing docs: Clear CLAUDE.md section with examples
  • Operator guide: Comprehensive README-langfuse.md with troubleshooting
  • Inline comments: Excellent code documentation in observability.py
  • Security warnings: Clear warnings about disabling masking in production

🛡️ Follows Security Standards

  • No token logging: Consistent with project security patterns
  • Input validation: Proper environment variable handling
  • Secure defaults: Privacy-first approach aligns with security-standards.md
  • Clear security boundaries: Distinction between dev/test vs. production

🔧 Operational Excellence

  • ClickHouse optimization: Addresses real disk space issues
  • TTL automation: 7-day retention prevents runaway growth
  • Deployment scripts: Well-structured with platform auto-detection
  • Error handling: Comprehensive cleanup and error logging

✅ Code Quality

  • Type safety: Proper typing in masking function
  • Recursive handling: Correctly processes nested structures
  • Guard conditions: Prevents duplicate trace creation
  • Resource cleanup: Proper context manager usage and error handling

Recommendations

Priority 1: Merge as-is ✅

The PR is production-ready and addresses important privacy and operational concerns. All patterns follow established project standards.

Priority 2: Consider Minor Enhancements (Optional)

  1. Add boundary test case for 50-character threshold
  2. Enhance error handling in TTL script to better distinguish error types
  3. Add explanatory comment for S3 credential patching workaround

Priority 3: Post-Merge Monitoring

  1. Monitor ClickHouse disk usage after TTL configuration
  2. Verify Langfuse traces show [REDACTED FOR PRIVACY] in production
  3. Track any S3-related errors after credential patching

Architecture Alignment

Follows Backend/Operator Standards: Proper error handling, secure-by-default, no sensitive data in logs

Follows Python Standards: Black formatting, clear function signatures, comprehensive docstrings

Follows Security Standards: Privacy-first design, token/credential masking, secure defaults

Follows Testing Standards: Added test to runner-tests.yml workflow, comprehensive coverage


Final Verdict

Status: ✅ APPROVED

This is exemplary work that demonstrates:

  • Deep understanding of privacy requirements
  • Production-ready operational thinking
  • Excellent testing practices
  • Clear, comprehensive documentation

The minor suggestions above are enhancements, not blockers. The PR is ready to merge and will provide immediate value by protecting user privacy and preventing operational issues.

Great work, @sallyom! 🎉


🔍 View AI decision process (logs available for 90 days)

📋 View memory system files loaded (click to expand)

What Amber Loaded for Code Review

Amber automatically loaded these repository standards from the memory system:

  1. CLAUDE.md - Master project instructions, development standards
  2. backend-development.md - Go backend, K8s integration patterns
  3. frontend-development.md - NextJS, Shadcn UI, React Query patterns
  4. security-standards.md - Auth, RBAC, token handling
  5. k8s-client-usage.md - User token vs service account patterns
  6. error-handling.md - Consistent error patterns
  7. react-query-usage.md - Data fetching patterns

Impact: This review used your repository's specific code quality standards, security patterns, and best practices from the memory system (PRs #359, #360) - not just generic code review guidelines.

@bobbravo2 bobbravo2 added this to the v0.0.14 milestone Dec 16, 2025
@Gkrumbach07 Gkrumbach07 merged commit 149dcf6 into ambient-code:main Dec 17, 2025
16 checks passed
adalton pushed a commit to adalton/ambient-platform that referenced this pull request Dec 17, 2025
The helm values file to configure log retention isn't working. This PR
adds a post-install script to configure langfuse logging to avoid
langfuse-clickhouse PVC from filling with logs we don't need or want to
retain.

---------

Signed-off-by: sallyom <somalley@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants