-
Notifications
You must be signed in to change notification settings - Fork 25
only APPLICATION_BUCKET as a bucket #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
- removed other bucket env variables and references
📝 WalkthroughWalkthroughConsolidates per-cloud and per-feature bucket settings into a single Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
wavefront/server/modules/knowledge_base_module/knowledge_base_module/controllers/knowledge_base_document_controller.py (1)
304-312: Critical: Inconsistent bucket reference for signed URLs.While the upload operation at Line 126 correctly uses
config['storage']['application_bucket'], this section still uses provider-specific bucket logic (config['gcp']['gcp_asset_storage_bucket']orconfig['aws']['aws_asset_storage_bucket']) for generating signed URLs. This inconsistency will cause the signed URL to point to a different bucket than where the file was uploaded, resulting in broken URLs and 404 errors.🔎 Proposed fix to use unified bucket configuration
if signed_url: - provider = config['cloud_config']['cloud_provider'] - bucket = ( - config['gcp']['gcp_asset_storage_bucket'] - if provider.lower() == 'gcp' - else config['aws']['aws_asset_storage_bucket'] - ) + bucket = config['storage']['application_bucket'] presigned_url = cloude_storage_manager.generate_presigned_url( bucket, existing_document.file_path, 'GET' )
🧹 Nitpick comments (2)
docker-compose.sample.yml (1)
172-190: Clarify cloud provider configuration to prevent confusion.Both AWS and GCP sections define
APPLICATION_BUCKET(lines 178 and 190) within the same environment block. While users should only configure one cloud provider, having duplicate environment variable definitions may cause confusion about which value is active (the last one will override).Consider one of these approaches:
- Add clear comments indicating users should remove or comment out the unused provider's section
- Document that
CLOUD_PROVIDERdetermines which bucket is used, but the environment variable duplication is harmless💡 Suggested clarification with comments
# AWS Configuration (if CLOUD_PROVIDER=aws) + # NOTE: Configure only the section matching your CLOUD_PROVIDER setting - AWS_ACCESS_KEY_ID=<YOUR_AWS_ACCESS_KEY> - AWS_SECRET_ACCESS_KEY=<YOUR_AWS_SECRET_KEY> - AWS_REGION=<YOUR_AWS_REGION> - AWS_KMS_ARN=<YOUR_AWS_KMS_ARN> - AWS_QUEUE_URL=<YOUR_AWS_QUEUE_URL> - APPLICATION_BUCKET=<YOUR_APPLICATION_BUCKET> # GCP Configuration (if CLOUD_PROVIDER=gcp) + # NOTE: If using GCP, the APPLICATION_BUCKET below will override the AWS one above - GCP_PROJECT_ID=<YOUR_GCP_PROJECT_ID> - GCP_LOCATION=<YOUR_GCP_LOCATION> - GOOGLE_APPLICATION_CREDENTIALS=/app/credentials/gcp-service-account.json - GCP_KMS_KEY_RING=<YOUR_KEY_RING> - GCP_KMS_CRYPTO_KEY=<YOUR_CRYPTO_KEY> - GCP_KMS_CRYPTO_KEY_VERSION=<YOUR_GCP_KMS_CRYPTO_KEY_VERSION> - GCP_GOLD_TOPIC_ID=<YOUR_GOLD_TOPIC> - GCP_EMAIL_TOPIC_ID=<YOUR_EMAIL_TOPIC> - WORKFLOW_WORKER_TOPIC=<YOUR_WORKFLOW_TOPIC> - APPLICATION_BUCKET=<YOUR_APPLICATION_BUCKET>wavefront/server/modules/insights_module/insights_module/insights_container.py (1)
62-68: Consider data organization strategy within the unified bucket.Both
transcript_bucket_nameandaudio_bucket_namenow point toconfig.storage.application_bucket. While consolidating to a single bucket simplifies configuration, consider implementing a clear folder/prefix structure within the bucket to maintain logical separation (e.g.,application-bucket/transcripts/andapplication-bucket/audio/).Benefits of maintaining logical separation:
- Granular IAM policies using bucket prefixes
- Different S3 lifecycle rules per data type
- Easier cost tracking and analysis
- Simplified data discovery and management
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
DOCKER_SETUP.mddocker-compose.sample.ymlwavefront/server/apps/floware/floware/config.iniwavefront/server/apps/inference_app/inference_app/config.iniwavefront/server/background_jobs/workflow_job/workflow_job/config.iniwavefront/server/modules/agents_module/agents_module/agents_container.pywavefront/server/modules/gold_module/gold_module/gold_container.pywavefront/server/modules/inference_module/inference_module/controllers/inference_controller.pywavefront/server/modules/insights_module/insights_module/insights_container.pywavefront/server/modules/knowledge_base_module/knowledge_base_module/controllers/knowledge_base_document_controller.pywavefront/server/modules/plugins_module/plugins_module/plugins_container.pywavefront/server/modules/voice_agents_module/voice_agents_module/voice_agents_container.py
🧰 Additional context used
🧬 Code graph analysis (1)
wavefront/server/modules/knowledge_base_module/knowledge_base_module/controllers/knowledge_base_document_controller.py (1)
wavefront/client/server.cjs (1)
config(24-28)
🔇 Additional comments (13)
wavefront/server/apps/inference_app/inference_app/config.ini (1)
1-2: LGTM! Configuration consolidation looks clean.The new unified
[storage]section withapplication_bucketsimplifies the configuration by removing provider-specific bucket references. This aligns well with the PR's objective to consolidate bucket environment variables.wavefront/server/background_jobs/workflow_job/workflow_job/config.ini (1)
10-11: LGTM! Consistent configuration consolidation.The new
[storage]section effectively replaces multiple provider-specific bucket configurations, maintaining consistency with the project-wide bucket consolidation.wavefront/server/modules/voice_agents_module/voice_agents_module/voice_agents_container.py (1)
89-89: LGTM! Correctly updated to use unified bucket configuration.The change from
config.voice_agents.voice_agent_buckettoconfig.storage.application_bucketaligns with the project-wide bucket consolidation.wavefront/server/modules/inference_module/inference_module/controllers/inference_controller.py (1)
39-39: LGTM! Simplified bucket selection logic.The change from provider-based conditional logic to a unified
config['storage']['application_bucket']simplifies the code while maintaining functionality. This assumesAPPLICATION_BUCKETis properly configured for the active cloud provider.wavefront/server/modules/gold_module/gold_module/gold_container.py (1)
19-19: LGTM! Image services now use unified bucket configuration.Both
AWSImageServiceandGCPImageServicenow referenceconfig.storage.application_bucket, ensuring consistent storage configuration across cloud providers.Also applies to: 26-26
wavefront/server/modules/knowledge_base_module/knowledge_base_module/controllers/knowledge_base_document_controller.py (1)
126-126: LGTM! Upload operation uses unified bucket.The document upload correctly uses
config['storage']['application_bucket']for storing files.wavefront/server/modules/plugins_module/plugins_module/plugins_container.py (1)
50-50: LGTM! Plugin services updated to use unified bucket.Both
DynamicQueryServiceandMessageProcessorServicenow correctly referenceconfig.storage.application_bucket, maintaining consistency with the project-wide bucket consolidation.Also applies to: 57-57
docker-compose.sample.yml (1)
407-417: Configuration follows consistent pattern with floware service.The inference_app service uses the same dual-definition pattern for
APPLICATION_BUCKETas the floware service. While this creates the same potential for confusion noted in the floware section review, the consistency is good.wavefront/server/modules/insights_module/insights_module/insights_container.py (1)
69-75: GCP configuration consistent with AWS approach.The GCP services follow the same unified bucket pattern as AWS. The operational advice regarding data organization within the bucket (mentioned in the AWS section review) applies equally here.
wavefront/server/modules/agents_module/agents_module/agents_container.py (1)
37-72: Unified bucket configuration applied consistently.All three services (AgentCrudService, WorkflowCrudService, WorkflowInferenceService) now correctly reference
config.storage.application_bucket. The implementation is clean and aligns with the PR's consolidation objective.As with the insights module, consider organizing agent YAMLs, workflow definitions, and inference data into distinct prefixes within the shared bucket for better data management.
DOCKER_SETUP.md (3)
156-164: Documentation clearly describes unified bucket usage.The updated description for
APPLICATION_BUCKETaccurately reflects that it now stores all application data types. The enumeration (agents, transcripts, audio, models, etc.) helps users understand the bucket's scope.
166-177: GCP documentation consistent with AWS.The GCP
APPLICATION_BUCKETdescription matches the AWS description, providing consistency across cloud providers. Users will clearly understand the bucket serves the same unified purpose on both platforms.
377-414: Bucket creation instructions accurately updated.The bucket creation sections for both AWS (lines 377-380) and GCP (lines 411-414) have been correctly simplified to create a single
APPLICATION_BUCKET. Section headers changed from plural "Buckets" to singular "Bucket" appropriately reflects the new architecture.
| [storage] | ||
| application_bucket=${APPLICATION_BUCKET} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify config.ini changes and bucket reference updates
echo "=== Checking config.ini lines 104-105 ==="
cat -n wavefront/server/apps/floware/floware/config.ini | sed -n '100,110p'
echo ""
echo "=== Searching for removed bucket configuration references ==="
echo ""
# Search for old bucket references
echo "agent_yaml_bucket:"
rg -n "agent_yaml_bucket" --type py 2>/dev/null | head -20
echo ""
echo "voice_agent_bucket:"
rg -n "voice_agent_bucket" --type py 2>/dev/null | head -20
echo ""
echo "reference_images_bucket:"
rg -n "reference_images_bucket" --type py 2>/dev/null | head -20
echo ""
echo "aws_asset_storage_bucket:"
rg -n "aws_asset_storage_bucket" --type py 2>/dev/null | head -20
echo ""
echo "gcp_asset_storage_bucket:"
rg -n "gcp_asset_storage_bucket" --type py 2>/dev/null | head -20
echo ""
echo "model_storage_bucket:"
rg -n "model_storage_bucket" --type py 2>/dev/null | head -20
echo ""
echo "transcript_bucket_name:"
rg -n "transcript_bucket_name" --type py 2>/dev/null | head -20
echo ""
echo "audio_bucket_name:"
rg -n "audio_bucket_name" --type py 2>/dev/null | head -20
echo ""
echo "=== Checking application_bucket usage ==="
rg -n "application_bucket" --type py 2>/dev/null | head -30Repository: rootflo/wavefront
Length of output: 9624
Update bucket configuration references in multiple modules to use new [storage] section.
The config.ini changes are correct, but verification reveals that several code locations still reference the old bucket configuration paths that have been removed:
aws_asset_storage_bucket: knowledge_base_document_controller.py:308, outlook_service.py:258gcp_asset_storage_bucket: knowledge_base_document_controller.py:306, outlook_service.py:256, floware/config_service.py:25/47/74model_storage_bucket: inference_app/inference_controller.py:67-70
These references must be updated to use config['storage']['application_bucket'] (or config.storage.application_bucket). Some modules like insights, voice_agents, and plugins have been properly updated, but knowledge_base, auth, and inference_app modules have not.
Additionally, test configuration files contain old bucket references that should be cleaned up for consistency.
🤖 Prompt for AI Agents
In wavefront/server/apps/floware/floware/config.ini around lines 104-105 the
bucket was moved into the [storage] section
(application_bucket=${APPLICATION_BUCKET}); update all code locations still
reading the old config keys to reference the new path: replace uses of
aws_asset_storage_bucket, gcp_asset_storage_bucket, and model_storage_bucket
with config['storage']['application_bucket'] (or
config.storage.application_bucket) — specifically change
knowledge_base_document_controller.py lines ~306-308, outlook_service.py lines
~256-258, floware/config_service.py lines ~25/47/74, and
inference_app/inference_controller.py lines ~67-70 to read from
config['storage']['application_bucket']; also remove or update old bucket keys
in test config files so tests use the new [storage].application_bucket key for
consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
wavefront/server/apps/floware/tests/conftest.py (1)
179-184: LGTM! Configuration consolidation correctly applied.The storage configuration has been properly updated to use the unified
application_bucket. Note that this test uses'test-bucket'while other test files use'test_bucket'(underscore). While both are valid for testing, consider standardizing for consistency.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
wavefront/server/apps/floware/floware/services/config_service.pywavefront/server/apps/floware/tests/conftest.pywavefront/server/apps/inference_app/inference_app/controllers/inference_controller.pywavefront/server/modules/auth_module/auth_module/services/outlook_service.pywavefront/server/modules/inference_module/tests/conftest.pywavefront/server/modules/knowledge_base_module/knowledge_base_module/controllers/knowledge_base_document_controller.pywavefront/server/modules/knowledge_base_module/tests/conftest.pywavefront/server/modules/user_management_module/tests/conftest.py
🚧 Files skipped from review as they are similar to previous changes (1)
- wavefront/server/modules/knowledge_base_module/knowledge_base_module/controllers/knowledge_base_document_controller.py
🧰 Additional context used
🧬 Code graph analysis (1)
wavefront/server/modules/auth_module/auth_module/services/outlook_service.py (1)
wavefront/client/server.cjs (1)
config(24-28)
🔇 Additional comments (7)
wavefront/server/apps/floware/floware/services/config_service.py (2)
54-54: LGTM!The change correctly uses
application_bucketfrom the credentials dict, which is populated via the merge in_get_gcp_credentials.
81-81: LGTM!The change correctly uses
application_bucketfrom the credentials dict, which is populated via the merge in_get_gcp_credentials.wavefront/server/modules/user_management_module/tests/conftest.py (1)
167-169: LGTM! Configuration consolidation correctly applied.The addition of
storage.application_bucketto the test configuration aligns with the PR objective of consolidating bucket references into a single unified source.wavefront/server/modules/inference_module/tests/conftest.py (1)
148-148: LGTM! Test configuration properly updated.The test configuration correctly uses the unified
storage.application_bucketstructure, aligning with the PR's consolidation objective.wavefront/server/modules/knowledge_base_module/tests/conftest.py (1)
144-144: LGTM! Test configurations consistently updated.Both test configuration dictionaries have been properly updated to use the unified
storage.application_bucketstructure. The consistent use of'test_bucket'across both configurations is appropriate.Also applies to: 218-220
wavefront/server/modules/auth_module/auth_module/services/outlook_service.py (1)
255-255: Good consolidation - configuration is complete.The direct lookup of
config['storage']['application_bucket']removes provider-specific branching logic, aligning with the PR objective. The[storage]section withapplication_bucketis defined in the mainfloware/config.iniand the pattern is consistently used across multiple modules (knowledge_base, voice_agents, plugins, insights, inference, gold, agents). No old provider-specific bucket configurations remain in the codebase.wavefront/server/apps/inference_app/inference_app/controllers/inference_controller.py (1)
66-66: Good simplification - configuration structure verified.The direct lookup of
config['storage']['application_bucket']correctly simplifies the code by removing cloud provider branching. The configuration file has been properly updated with thestorage.application_bucketkey, and no remnants of old provider-specific bucket configurations remain in the codebase. Exception handling (lines 65-150) catches any potential KeyError from missing config keys.
| storage_config = self.config.get('storage') | ||
| if not isinstance(config_credentials, dict): | ||
| raise HTTPException(status_code=500, detail='GCP configuration is missing') | ||
| if not config_credentials.get( | ||
| 'gcp_asset_storage_bucket' | ||
| ) or not config_credentials.get('config_file_name'): | ||
| if not isinstance(storage_config, dict): | ||
| raise HTTPException( | ||
| status_code=500, detail='Storage configuration is missing' | ||
| ) | ||
| if not storage_config.get('application_bucket') or not config_credentials.get( | ||
| 'config_file_name' | ||
| ): | ||
| raise HTTPException(status_code=500, detail='Incomplete GCP configuration') | ||
| # Merge storage config into credentials dict for backward compatibility | ||
| config_credentials['application_bucket'] = storage_config['application_bucket'] | ||
| return config_credentials |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the full file to understand the context
fd -t f "config_service.py" -x wc -l {}Repository: rootflo/wavefront
Length of output: 570
🏁 Script executed:
fd -t f "config_service.py" -x cat -n {}Repository: rootflo/wavefront
Length of output: 42738
🏁 Script executed:
# Search for old bucket variable references across the codebase
rg "gcp_asset_storage_bucket|GCP_ASSET_STORAGE_BUCKET" -n -C2Repository: rootflo/wavefront
Length of output: 43
🏁 Script executed:
# Check for other bucket-related configuration references
rg "application_bucket|APPLICATION_BUCKET" -n --type=py -C2Repository: rootflo/wavefront
Length of output: 15331
🏁 Script executed:
# Look for any config usage patterns that might indicate breaking changes
rg "\.get\('.*bucket" -n --type=py -C1Repository: rootflo/wavefront
Length of output: 396
Update error message and clarify internal implementation detail.
The refactoring correctly validates the new storage configuration and retrieves application_bucket from it. Two clarifications are needed:
-
Error message (line 32) needs to reflect both validations: The error states "Incomplete GCP configuration" but the method now validates both
gcp.config_file_nameandstorage.application_bucket. Update to:"Incomplete GCP or storage configuration". -
Misleading comment on line 33: The phrase "for backward compatibility" is incorrect. This merge is an internal implementation detail that allows the method to return a flat dictionary containing both credentials and bucket. The codebase migration to
application_bucketis complete—no old references remain. Either clarify the comment to describe what the merge actually does (e.g., "to consolidate GCP credentials and storage bucket for callers"), or remove it.
🤖 Prompt for AI Agents
In wavefront/server/apps/floware/floware/services/config_service.py around lines
22 to 35, update the HTTPException detail message that currently reads
"Incomplete GCP configuration" to "Incomplete GCP or storage configuration" to
reflect that both gcp.config_file_name and storage.application_bucket are
validated, and replace the misleading comment "for backward compatibility" with
a concise description of the merge purpose such as "consolidate GCP credentials
and storage bucket into a single dict for callers" (or remove the comment
entirely).
Summary by CodeRabbit
Documentation
Chores
Refactor
Tests
✏️ Tip: You can customize this high-level summary in your review settings.