Skip to content

improvement(uploads): add multipart upload + batching + retries#938

Merged
Sg312 merged 8 commits intostagingfrom
improvement/file-upload-retries
Aug 13, 2025
Merged

improvement(uploads): add multipart upload + batching + retries#938
Sg312 merged 8 commits intostagingfrom
improvement/file-upload-retries

Conversation

@Sg312
Copy link
Contributor

@Sg312 Sg312 commented Aug 12, 2025

Summary

  • Multipart file uploads
  • File upload retries
  • Batched file uploads
  • Add auth to file upload routes
  • Upload error logging
  • Upload progress ui

Fixes #(issue)

Type of Change

  • Other: Upload improvements + increased logging

Testing

Manual testing

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link

vercel bot commented Aug 12, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Project Deployment Preview Comments Updated (UTC)
sim Ready Preview Comment Aug 12, 2025 11:44pm
1 Skipped Deployment
Project Deployment Preview Comments Updated (UTC)
docs ⬜️ Skipped Aug 12, 2025 11:44pm

@vercel vercel bot temporarily deployed to Preview – docs August 12, 2025 18:00 Inactive
@Sg312 Sg312 marked this pull request as ready for review August 12, 2025 18:01
@Sg312 Sg312 changed the title File upload retries + multipart uploads improvement(uploads): add multipart upload + batching + retries Aug 12, 2025
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR implements a comprehensive file upload system with retry logic, multipart uploads, and enhanced UI feedback for the knowledge base feature. The changes span three key areas:

Core Upload Logic (use-knowledge-upload.ts): The hook now supports three upload strategies - direct uploads via presigned URLs, multipart uploads for large files (>50MB), and API fallback when cloud storage isn't configured. Key additions include exponential backoff retry logic with configurable timeouts, concurrent batch processing with race condition prevention, and detailed progress tracking per file. The implementation uses atomic state updates to handle complex concurrent upload scenarios while providing real-time status updates.

Multipart Upload API (multipart/route.ts): A new API endpoint enables S3 multipart uploads through four actions: initiate, get-part-urls, complete, and abort. This follows AWS S3's standard multipart upload pattern, allowing large files to be uploaded in smaller chunks that can be retried independently. The endpoint is currently restricted to S3 storage providers and hardcoded for knowledge-base uploads.

Enhanced UI (upload-modal.tsx): The upload modal now displays individual file progress bars, status icons (loading, success, failure), and detailed error messages. The interface shows granular progress information including current upload stage and file completion counts. The modal height was increased to accommodate the expanded status display.

These changes work together to solve reliability issues with large file uploads while providing much better user feedback during the upload process. The architecture supports various file sizes and storage configurations while handling edge cases like network failures and concurrent uploads.

PR Description Notes:

  • The PR description is a template with no actual content filled in
  • Missing description of changes, issue reference, testing details, and type of change selection

Confidence score: 2/5

  • This PR requires careful review due to security concerns and missing validation in critical upload functionality
  • Score lowered due to authentication bypass, input validation gaps, and complex state management without comprehensive error handling
  • Pay close attention to apps/sim/app/api/files/multipart/route.ts for security issues and apps/sim/app/workspace/[workspaceId]/knowledge/hooks/use-knowledge-upload.ts for race conditions

3 files reviewed, 5 comments

Edit Code Review Bot Settings | Greptile

@vercel vercel bot temporarily deployed to Preview – docs August 12, 2025 23:36 Inactive
@Sg312 Sg312 marked this pull request as ready for review August 12, 2025 23:37
@vercel vercel bot temporarily deployed to Preview – docs August 12, 2025 23:38 Inactive
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This review covers only the changes made since the last review (commit c444a5d), not the entire PR.

The most recent changes focus on adding authentication to the file upload system across three API endpoints. The changes add session-based authentication checks to the /api/files/upload, /api/files/presigned, and /api/files/multipart routes using the getSession() function from the auth library.

In the upload route, the authentication check simply blocks unauthenticated users from uploading files. The presigned URL endpoint goes further by replacing the client-provided userId parameter with a sessionUserId extracted from the authenticated session, eliminating the security risk of trusting user-provided identity data. The multipart upload endpoint also includes the same authentication pattern.

These changes address a critical security vulnerability where the file upload endpoints were previously accessible to unauthenticated users, which could lead to abuse, unauthorized storage costs, and potential malicious file uploads. The implementation follows established patterns used throughout the codebase's API endpoints and maintains backward compatibility while strengthening security posture.

Confidence score: 4/5

  • This authentication addition is a crucial security improvement with minimal risk of breaking existing functionality
  • Score reflects solid implementation following established patterns, but the multipart endpoint still has underlying issues noted in previous reviews
  • Pay close attention to the multipart endpoint which still lacks proper validation, rate limiting, and error handling mechanisms

3 files reviewed, no comments

Edit Code Review Bot Settings | Greptile

@waleedlatif1 waleedlatif1 deleted the improvement/file-upload-retries branch August 15, 2025 17:04
arenadeveloper02 pushed a commit to arenadeveloper02/p2-sim that referenced this pull request Sep 19, 2025
…tudioai#938)

* File upload retries + multipart uploads

* Lint

* FIle uploads

* File uploads 2

* Lint

* Fix file uploads

* Add auth to file upload routes

* Lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant