Conversation

@StarpTech (Contributor) commented Aug 11, 2025

Summary by CodeRabbit

  • New Features
    • Option to use individual object deletes for S3-compatible storage; auto-detects Google Cloud Storage endpoints to enable compatibility mode.
  • Configuration
    • New S3_USE_INDIVIDUAL_DELETES environment flag and BuildConfig option to override delete strategy.
  • Performance/Reliability
    • Paginated deletions with batching and bounded concurrency (default cap), accurate deletion counts, abort support, and error handling for bulk deletes.
  • Tests
    • Added tests for Google Cloud Storage endpoint detection.

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

@coderabbitai bot commented Aug 11, 2025

Walkthrough

Adds configurable S3 deletion strategies to S3BlobStorage (bulk vs. per-object) with paging, bounded concurrency, abort support, and a new S3BlobStorageConfig; propagates the option via BuildConfig, an env flag, index wiring, and GCS endpoint auto-detection; and adds tests for the GCS URL detector.

Changes

Cohort / File(s) | Summary of changes

• S3 deletion strategy & helpers (controlplane/src/core/blobstorage/s3.ts)
  Exported S3BlobStorageConfig (useIndividualDeletes?); updated the S3BlobStorage constructor to accept the config; added a bounded-concurrency helper (executeWithConcurrency) plus deleteObjectsBulk and deleteObjectsIndividually. removeDirectory now pages with ListObjectsV2 (ContinuationToken), deletes each page using the configured strategy, supports abort signals, aggregates the deleted count, and checks DeleteObjects responses for errors.
• Build config wiring & GCS detection (controlplane/src/core/build-server.ts)
  Added useIndividualDeletes?: boolean to BuildConfig.s3Storage; imported isGoogleCloudStorageUrl; resolves useIndividualDeletes from explicit config or auto-enables it for GCS endpoints; passes { useIndividualDeletes } into new S3BlobStorage(...).
• Environment schema (controlplane/src/core/env.schema.ts)
  Added an optional S3_USE_INDIVIDUAL_DELETES env variable parsed from string → boolean (val === 'true'), with documentation.
• Utility detector (controlplane/src/core/util.ts)
  Added an exported isGoogleCloudStorageUrl(s: string): boolean that detects GCS via the storage.googleapis.com host or the gs: protocol; it parses input safely and returns false on invalid values (sketched below).
• Index wiring (controlplane/src/index.ts)
  Reads S3_USE_INDIVIDUAL_DELETES from the env schema and propagates it into the s3Storage.useIndividualDeletes option passed to the build config.
• Tests (controlplane/test/utils.test.ts)
  Exported and tested isGoogleCloudStorageUrl with multiple positive GCS URL forms and the gs:// protocol, plus negative cases; existing util tests are unchanged aside from the import addition.
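
To make the wiring concrete, the following minimal sketch assembles the pieces described above into one file. It is an assumption-level sketch, not the verbatim source: the zod usage, the s3Storage shape, and every name not listed in the table are illustrative only.

```ts
import { z } from 'zod';

// env.schema.ts (sketch): optional flag parsed from string to boolean.
const S3_USE_INDIVIDUAL_DELETES = z
  .string()
  .optional()
  .transform((val) => val === 'true');

// util.ts (sketch): detect GCS via the storage.googleapis.com host or the gs:
// protocol; invalid input returns false instead of throwing.
export function isGoogleCloudStorageUrl(s: string): boolean {
  try {
    const url = new URL(s);
    const host = url.hostname.toLowerCase();
    return url.protocol === 'gs:' || host === 'storage.googleapis.com' || host.endsWith('.storage.googleapis.com');
  } catch {
    return false;
  }
}

// build-server.ts (sketch): an explicit config value wins; otherwise the
// compatibility mode is auto-enabled for GCS endpoints. The shape of
// s3Storage here is hypothetical.
const s3Storage: { url: string; useIndividualDeletes?: boolean } = {
  url: 'https://storage.googleapis.com/my-bucket',
};
const useIndividualDeletes = s3Storage.useIndividualDeletes ?? isGoogleCloudStorageUrl(s3Storage.url);
```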

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a578b9 and 73bd80e.

📒 Files selected for processing (1)
  • controlplane/src/core/blobstorage/s3.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • controlplane/src/core/blobstorage/s3.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: build_test
  • GitHub Check: build_push_image


@wilsonrivera wilsonrivera marked this pull request as ready for review August 12, 2025 12:23

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (1)
controlplane/src/core/blobstorage/s3.ts (1)

93-106: Accept 200 or 204 for DeleteObject to be provider-agnostic

Some S3-compatible providers return 200 instead of 204 for DeleteObject. Strictly requiring 204 turns those successful deletes into spurious failures.

Apply this diff:

-    if (resp.$metadata.httpStatusCode !== 204) {
+    if (![200, 204].includes(resp.$metadata.httpStatusCode ?? 0)) {
       throw new Error(`Failed to delete object from S3: ${resp}`);
     }
🧹 Nitpick comments (2)
controlplane/src/core/blobstorage/s3.ts (2)

12-26: Make concurrency and delay configurable via S3BlobStorageConfig

Hard-coded maxConcurrency and batchDelayMs limit tuning across providers (MinIO, Cloudflare R2, GCS). Exposing them in config enables safer rollout and performance tuning.

Proposed interface change and defaults:

-const maxConcurrency = 10; // Maximum number of concurrent operations (conservative for GCS compatibility)
-const batchDelayMs = 200; // Delay between batches in milliseconds (balance between throughput and rate limiting)
+const defaultMaxConcurrency = 10;
+const defaultBatchDelayMs = 200;
+
 export interface S3BlobStorageConfig {
   /**
    * Use individual delete operations instead of bulk delete.
    * Set to true for GCS compatibility, false for better S3 performance.
    * @default false
    */
   useIndividualDeletes?: boolean;
+  /**
+   * Max number of concurrent individual delete operations.
+   * @default 10
+   */
+  maxConcurrency?: number;
+  /**
+   * Delay between batches of individual deletes in ms.
+   * @default 200
+   */
+  batchDelayMs?: number;
 }

And store them in the class:

 export class S3BlobStorage implements BlobStorage {
   private readonly useIndividualDeletes: boolean;
+  private readonly maxConcurrency: number;
+  private readonly batchDelayMs: number;

Constructor:

-  ) {
-    this.useIndividualDeletes = config.useIndividualDeletes ?? false;
-  }
+  ) {
+    this.useIndividualDeletes = config.useIndividualDeletes ?? false;
+    this.maxConcurrency = config.maxConcurrency ?? defaultMaxConcurrency;
+    this.batchDelayMs = config.batchDelayMs ?? defaultBatchDelayMs;
+  }

Use fields:

-        await this.sleep(batchDelayMs);
+        await this.sleep(this.batchDelayMs);
-    const deletedCounts = await this.executeWithConcurrency(deleteTasks, maxConcurrency);
+    const deletedCounts = await this.executeWithConcurrency(deleteTasks, this.maxConcurrency);
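
If these knobs were exposed, a call site could tune both per provider. A hypothetical example (the constructor arguments besides the config object are assumptions):

```ts
// Hypothetical wiring; argument order and names outside the config object are illustrative.
const storage = new S3BlobStorage(s3Client, bucketName, {
  useIndividualDeletes: true, // e.g. a GCS S3-compatible endpoint
  maxConcurrency: 5, // stay under stricter provider rate limits
  batchDelayMs: 500, // longer pause between batches
});
```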

138-164: Consider counting actual deletions when using bulk API

Returning objectsToDelete.length assumes all were deleted. When Quiet: false, you can use deleted.Deleted?.length ?? 0 for accuracy. Not required functionally, but improves observability and correctness.

Example change:

-    return objectsToDelete.length;
+    return deleted.Deleted?.length ?? 0;

Or simply:

-    return objectsToDelete.length;
+    return deleted.Deleted?.length ?? objectsToDelete.length;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de123ba and c172eb1.

📒 Files selected for processing (5)
  • controlplane/src/core/blobstorage/s3.ts (2 hunks)
  • controlplane/src/core/build-server.ts (3 hunks)
  • controlplane/src/core/env.schema.ts (1 hunks)
  • controlplane/src/core/util.ts (1 hunks)
  • controlplane/src/index.ts (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
controlplane/src/core/build-server.ts (2)
controlplane/src/core/blobstorage/s3.ts (1)
  • S3BlobStorage (30-215)
controlplane/src/core/util.ts (1)
  • isGoogleCloudStorageUrl (405-418)
controlplane/src/core/blobstorage/s3.ts (1)
controlplane/src/core/blobstorage/index.ts (1)
  • BlobStorage (18-53)
🔇 Additional comments (3)
controlplane/src/index.ts (1)

49-50: Wiring env flag into BuildConfig looks correct

The new S3_USE_INDIVIDUAL_DELETES is properly parsed and passed through to BuildConfig.s3Storage.useIndividualDeletes.

Also applies to: 132-133

controlplane/src/core/build-server.ts (1)

40-40: Import of GCS detector is appropriate and localized

Pulling isGoogleCloudStorageUrl from util centralizes detection logic and keeps concerns separated.

controlplane/src/core/blobstorage/s3.ts (1)

185-214: Good: paginated listing and strategy selection per page

Listing with ContinuationToken and switching between bulk vs. individual deletes per page is correct and memory-safe. Abort signals are also properly propagated.
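
For readers outside the diff, the paging pattern referred to here looks roughly like the sketch below. The AWS SDK v3 (@aws-sdk/client-s3) calls are real; the surrounding function shape and names are assumptions:

```ts
import { S3Client, ListObjectsV2Command } from '@aws-sdk/client-s3';

// Sketch: page through a prefix and delete each page with the configured
// strategy (bulk or per-object), propagating the abort signal to every call.
async function removePrefix(
  s3Client: S3Client,
  bucket: string,
  prefix: string,
  deletePage: (keys: string[]) => Promise<number>, // configured delete strategy
  abortSignal?: AbortSignal,
): Promise<number> {
  let deleted = 0;
  let continuationToken: string | undefined;
  do {
    const page = await s3Client.send(
      new ListObjectsV2Command({ Bucket: bucket, Prefix: prefix, ContinuationToken: continuationToken }),
      { abortSignal },
    );
    const keys = (page.Contents ?? []).flatMap((o) => (o.Key ? [o.Key] : []));
    if (keys.length > 0) {
      deleted += await deletePage(keys);
    }
    continuationToken = page.NextContinuationToken;
  } while (continuationToken);
  return deleted;
}
```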


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1564fe8 and 947e839.

📒 Files selected for processing (1)
  • controlplane/test/utils.test.ts (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build_test
  • GitHub Check: build_push_image
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
controlplane/test/utils.test.ts (2)

2-2: Import addition looks correct

Importing isGoogleCloudStorageUrl from util.js is consistent with existing ESM import style in this test suite.


33-40: Positive GCS URL cases covered well

Good mix of host variations, including a case-insensitive host check.
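
A hypothetical shape for those cases (actual URLs, import path, and test framework details may differ):

```ts
import { describe, expect, test } from 'vitest';
import { isGoogleCloudStorageUrl } from '../src/core/util.js'; // path is an assumption

describe('isGoogleCloudStorageUrl', () => {
  test('detects GCS hosts and the gs:// protocol', () => {
    expect(isGoogleCloudStorageUrl('https://storage.googleapis.com/my-bucket')).toBe(true);
    expect(isGoogleCloudStorageUrl('https://STORAGE.GOOGLEAPIS.COM/my-bucket')).toBe(true); // case-insensitive host
    expect(isGoogleCloudStorageUrl('gs://my-bucket/object')).toBe(true);
  });

  test('rejects non-GCS and invalid values', () => {
    expect(isGoogleCloudStorageUrl('https://s3.amazonaws.com/my-bucket')).toBe(false);
    expect(isGoogleCloudStorageUrl('not a url')).toBe(false);
  });
});
```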

@StarpTech StarpTech enabled auto-merge (squash) August 13, 2025 20:03

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (1)
controlplane/src/core/blobstorage/s3.ts (1)

56-79: AbortSignal is ignored by putObject; align signature with interface and pass it to AWS SDK

BlobStorage.putObject supports abortSignal, but the S3 implementation currently drops it, so cancellations are silently ignored and behavior diverges from other providers.

Apply this diff to accept and propagate abortSignal:

   async putObject<Metadata extends Record<string, string>>({
     key,
+    abortSignal,
     body,
     contentType,
     metadata,
   }: {
     key: string;
+    abortSignal?: AbortSignal;
     body: Buffer;
     contentType: string;
     version?: string;
     metadata?: Metadata;
   }): Promise<void> {
     const command = new PutObjectCommand({
       Bucket: this.bucketName,
       Key: key,
       Body: body,
       ContentType: contentType,
       Metadata: metadata,
     });
-    const resp = await this.s3Client.send(command);
+    const resp = await this.s3Client.send(command, { abortSignal });
     if (resp.$metadata.httpStatusCode !== 200) {
-      throw new Error(`Failed to put object to S3: ${resp}`);
+      throw new Error(`Failed to put object to S3: ${JSON.stringify(resp)}`);
     }
   }

Note: switched error interpolation to JSON.stringify for meaningful diagnostics (see similar suggestion below).

🧹 Nitpick comments (4)
controlplane/src/core/blobstorage/s3.ts (4)

12-13: Consider making deletion concurrency configurable instead of hardcoding 10

Different backends (AWS S3, GCS S3-compat, MinIO) and deployments have varying rate limits. Making this configurable avoids throttling or underutilization.

If you want a minimal change, extend the config and default here:

-const maxConcurrency = 10; // Maximum number of concurrent operations
+const DEFAULT_MAX_CONCURRENCY = 10; // Maximum number of concurrent operations (override via config)

Outside this hunk:

  • Add maxConcurrency?: number to S3BlobStorageConfig.
  • Store this.maxConcurrency = config.maxConcurrency ?? DEFAULT_MAX_CONCURRENCY; in the constructor.
  • Use this.maxConcurrency at the call site.

40-54: Docstring says “delays between batches” but no delays are implemented

The implementation batches promises with bounded concurrency, but there’s no delay. Fix the comment to avoid confusion.

Apply this diff to the docstring:

   /**
-   * Execute promises with limited concurrency and delays between batches
+   * Execute tasks with bounded concurrency in batches (no artificial delay between batches)
    * Retries are handled by AWS SDK internally using exponential backoff. Default 3 retries.
    */
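
A minimal batch-based helper consistent with the corrected docstring could look like this sketch (not the project's actual implementation):

```ts
// Run tasks in sequential batches; each batch has at most maxConcurrency
// tasks in flight. Rejections propagate via Promise.all (fail-fast).
async function executeWithConcurrency<T>(tasks: Array<() => Promise<T>>, maxConcurrency: number): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < tasks.length; i += maxConcurrency) {
    const batch = tasks.slice(i, i + maxConcurrency).map((task) => task());
    results.push(...(await Promise.all(batch)));
  }
  return results;
}
```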

156-171: Individual deletes: consider allSettled to aggregate failures instead of failing the whole batch early

With Promise.all, one failure aborts the entire batch, which may mask partial successes. Using allSettled allows reporting exactly which keys failed.

Apply this diff to aggregate errors without losing partial progress:

-    const deletedCounts = await this.executeWithConcurrency(deleteTasks, maxConcurrency);
-    return deletedCounts.reduce((sum: number, count: number) => sum + count, 0);
+    const settled = await this.executeWithConcurrency(
+      deleteTasks.map((task) => async () => {
+        try {
+          return await task();
+        } catch (e) {
+          return e as Error;
+        }
+      }),
+      maxConcurrency
+    );
+    const errors = settled.filter((r): r is Error => r instanceof Error);
+    const successes = settled.filter((r) => typeof r === 'number') as number[];
+    if (errors.length > 0) {
+      // Surface actionable details about the failed keys; successful deletes have already completed.
+      throw new Error(`Failed to delete ${errors.length} object(s): ${errors.map((e) => e.message).join('; ')}`);
+    }
+    return successes.reduce((sum, count) => sum + count, 0);

If keeping the fail-fast behavior is intentional, ignore this suggestion.


188-196: Nit: rephrase comment for clarity (GCS vs S3 wording)

The individual delete path is for S3-compatible backends that don't implement DeleteObjects (not “S3 implementation without DeleteObjectsCommand”).

Apply this diff:

-          // Use individual deletes for S3 implementation without DeleteObjectsCommand
+          // Use per-object deletes for backends that don't support DeleteObjects (e.g., some GCS S3-compatible endpoints)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e213b54 and 3a578b9.

📒 Files selected for processing (1)
  • controlplane/src/core/blobstorage/s3.ts (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
controlplane/src/core/blobstorage/s3.ts (1)
controlplane/src/core/blobstorage/index.ts (1)
  • BlobStorage (18-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
controlplane/src/core/blobstorage/s3.ts (2)

14-24: Config option for per-object deletes looks good

The config shape and defaulting behavior are clear and align with the PR objective.


30-38: Constructor config injection and defaulting are correct

Storing useIndividualDeletes with a sensible default is straightforward and maintains backward compatibility.

@StarpTech StarpTech merged commit d8bac2a into main Aug 14, 2025
10 checks passed
@StarpTech StarpTech deleted the dustin/eng-7781-gcs-does-not-support-bulk-objects-deletions branch August 14, 2025 13:01
@Noroth Noroth mentioned this pull request Sep 30, 2025