improvement(kb): encode non-ASCII headers for kb uploads#1595
Merged
waleedlatif1 merged 3 commits intostagingfrom Oct 10, 2025
Merged
improvement(kb): encode non-ASCII headers for kb uploads#1595waleedlatif1 merged 3 commits intostagingfrom
waleedlatif1 merged 3 commits intostagingfrom
Conversation
Contributor
There was a problem hiding this comment.
Greptile Overview
Summary
This PR fixes knowledge base upload issues with non-ASCII filenames by implementing RFC 5987 encoding in the Content-Disposition header. The core fix adds encodeFilenameForHeader() which properly encodes international characters using UTF-8 encoding with a fallback to ASCII-safe names.
Key changes:
- Added RFC 5987 encoding for Content-Disposition headers to support non-ASCII filenames (e.g., Chinese, Japanese, Arabic characters)
- Updated file serve endpoint to use hybrid authentication (session/API key/internal JWT) instead of session-only
- Added internal JWT authentication for document processor to serve API communication
- Increased KB processing timeouts from 300s to 600s and embedding batch size from 50 to 500 for better performance
- Fixed OCR file handling to exclude internal serve paths from external HTTPS detection
Confidence Score: 4/5
- This PR is generally safe to merge with proper testing of the authentication changes
- The core filename encoding fix is solid and follows RFC 5987 standards. However, the authentication changes from session-only to hybrid auth in the serve endpoint represent a significant security model change that needs verification. The internal JWT implementation appears sound, but the broader implications of allowing internal JWT access to all files (not just KB files) should be validated. The timeout and batch size increases are reasonable performance improvements.
- Pay close attention to
apps/sim/app/api/files/serve/[...path]/route.ts- verify that hybrid auth correctly restricts access and that internal JWT authentication is properly scoped
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| apps/sim/app/api/files/utils.ts | 5/5 | Added encodeFilenameForHeader function to properly encode non-ASCII characters in Content-Disposition headers using RFC 5987 encoding, fixing KB upload issues with international characters |
| apps/sim/app/api/files/serve/[...path]/route.ts | 4/5 | Switched from session-only auth to hybrid auth (session/API key/internal JWT), removed execution file logging - changes improve auth flexibility but may have broader implications |
| apps/sim/lib/knowledge/documents/document-processor.ts | 5/5 | Added internal token authentication for internal file serve requests, fixed OCR detection to exclude internal serve paths from external HTTPS check |
Sequence Diagram
sequenceDiagram
participant C as Client
participant U as Upload Route
participant S as Storage
participant F as Serve Route
participant P as Doc Processor
participant A as Auth
C->>U: Upload file
U->>S: Store file
S-->>U: File path
U-->>C: Success
P->>P: Start processing
P->>F: Fetch file
F->>A: Verify
A-->>F: OK
F->>S: Get file
S-->>F: Buffer
F->>F: Encode header
F-->>P: File data
P->>P: Process
7 files reviewed, 2 comments
f3bbadb to
2957b19
Compare
Collaborator
Author
|
1 Job Failed: CI / Test and Build / Test and Build failed on "Run tests with coverage" |
waleedlatif1
added a commit
that referenced
this pull request
Oct 11, 2025
* improvement(kb): encode non-ASCII headers for kb uploads * cleanup * increase timeouts to match trigger
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
encode non-ASCII headers for kb uploads
Type of Change
Testing
Tested manually.
Checklist