fix(kb-uploads): created knowledge, chunks, tags services and use redis for queueing docs in kb#1143

Merged
waleedlatif1 merged 18 commits into staging from fix/kb-uploads
Aug 27, 2025

Conversation

@waleedlatif1
Collaborator

@waleedlatif1 waleedlatif1 commented Aug 27, 2025

Summary

Created knowledge, chunks, and tags services, and switched to Redis for queueing documents in the knowledge base (KB), since queueing them in memory in a serverless environment was not the right approach.

Type of Change

  • Bug fix
  • New feature

Testing

Tested manually (see the recordings below). Added/updated unit tests.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

Screen.Recording.2025-08-26.at.6.00.25.PM.mov
Screen.Recording.2025-08-26.at.10.11.54.PM.mov

@vercel

vercel bot commented Aug 27, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| docs | Ready | Preview | Comment | Aug 27, 2025 6:00am |
| sim | Ready | Preview | Comment | Aug 27, 2025 6:00am |

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR implements a comprehensive refactoring of the knowledge base system, moving from in-memory document processing to a Redis-based queueing architecture for better serverless compatibility. The changes create dedicated service layers for knowledge base operations, chunks, and tags, following proper separation of concerns principles.

Key architectural improvements:

  • Service Layer Architecture: Extracted complex database operations from API routes into dedicated service modules (@/lib/knowledge/service.ts, @/lib/knowledge/documents/service.ts, @/lib/knowledge/chunks/service.ts, @/lib/knowledge/tags/service.ts)
  • Redis Queue Implementation: Added DocumentProcessingQueue class with Redis-backed job queuing and fallback to in-memory processing for document processing workflows
  • File Type Support Expansion: Added parsers for DOC, TXT, and MD files with comprehensive UTF-8 sanitization across all parsers to prevent PostgreSQL encoding issues
  • Multipart Upload Support: Implemented batch presigned URL generation and multipart uploads for both S3 and Azure Blob storage to handle large file uploads efficiently
  • Enhanced Upload UX: Added real-time progress tracking, file-specific status indicators, and improved error handling in the knowledge base creation modal
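
The Redis-backed queue with an in-memory fallback described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `DocumentProcessingQueue`: the names `QueueBackend`, `InMemoryBackend`, and `FallbackQueue` are hypothetical, and the Redis client is abstracted behind an interface so the sketch stays self-contained.

```typescript
// Hypothetical names throughout; none of these are taken from the PR's code.
interface Job<T> {
  id: string
  payload: T
}

// Minimal contract a queue backend must satisfy. A Redis-backed
// implementation would wrap a Redis client behind these two methods.
interface QueueBackend<T> {
  push(job: Job<T>): Promise<void>
  pop(): Promise<Job<T> | undefined>
}

// Process-local FIFO backend, used only when the primary is unavailable.
class InMemoryBackend<T> implements QueueBackend<T> {
  private jobs: Job<T>[] = []

  async push(job: Job<T>): Promise<void> {
    this.jobs.push(job)
  }

  async pop(): Promise<Job<T> | undefined> {
    return this.jobs.shift()
  }
}

class FallbackQueue<T> {
  private primary: QueueBackend<T> | null
  private fallback: QueueBackend<T>

  constructor(primary: QueueBackend<T> | null, fallback?: QueueBackend<T>) {
    this.primary = primary
    this.fallback = fallback ?? new InMemoryBackend<T>()
  }

  // Try the primary backend first; on any error, store the job locally.
  // Returns which backend accepted the job so callers can log it.
  async enqueue(job: Job<T>): Promise<'primary' | 'fallback'> {
    if (this.primary) {
      try {
        await this.primary.push(job)
        return 'primary'
      } catch {
        // Primary unreachable: fall through to the in-memory backend.
      }
    }
    await this.fallback.push(job)
    return 'fallback'
  }

  // Drain the primary first, then any jobs that landed in the fallback.
  async dequeue(): Promise<Job<T> | undefined> {
    if (this.primary) {
      try {
        const job = await this.primary.pop()
        if (job) return job
      } catch {
        // Primary unreachable: read from the in-memory backend instead.
      }
    }
    return this.fallback.pop()
  }
}
```

Note that jobs held by the in-memory fallback live only as long as the process does, which is the limitation the PR summary cites for serverless deployments.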

Code organization improvements:

  • Moved document processing utilities from @/lib/documents/ to @/lib/knowledge/documents/ for better domain organization
  • Consolidated embedding utilities into @/lib/embeddings/utils
  • Added comprehensive TypeScript interfaces for all knowledge base operations
  • Implemented proper validation for file uploads with centralized file type checking

Infrastructure changes:

  • Added word-extractor dependency for DOC file parsing
  • Updated upload strategies to use batch processing (reduced from 15 to 8 concurrent files)
  • Implemented retry mechanisms with exponential backoff for external API calls
  • Added comprehensive UTF-8 text sanitization utilities
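
As an illustration of the retry mechanism mentioned in the list above, here is a minimal sketch of retrying an async call with capped exponential backoff. The helper names (`retryWithBackoff`, `backoffDelay`) and the option fields are assumptions made for this example; the PR's actual retry utility and its defaults are not shown on this page.

```typescript
// Hypothetical option names; the PR's real configuration is not shown here.
interface RetryOptions {
  maxRetries: number // retries after the first attempt
  baseDelayMs: number // delay before the first retry
  maxDelayMs: number // upper bound on any single delay
}

// Delay doubles with each attempt (base, 2*base, 4*base, ...) up to the cap.
function backoffDelay(attempt: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * Math.pow(2, attempt), opts.maxDelayMs)
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms))

// Runs fn, retrying on rejection. Rethrows the last error once
// maxRetries retries have been exhausted.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  opts: RetryOptions
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      if (attempt === opts.maxRetries) break
      await sleep(backoffDelay(attempt, opts))
    }
  }
  throw lastError
}
```

A production version would usually add random jitter to the delay and retry only on errors known to be transient (for example rate limits or timeouts); both are omitted here to keep the sketch short.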

The refactoring maintains API compatibility while improving the scalability, maintainability, and reliability of knowledge base operations in serverless environments.
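
The UTF-8 sanitization mentioned in the summary can be illustrated with a small sketch. It assumes the problematic input consists of NUL characters (which PostgreSQL text columns do not accept) and unpaired UTF-16 surrogates (which have no valid UTF-8 encoding). The function name `sanitizeForUtf8` is hypothetical, and what the PR's own sanitizer actually strips is not shown on this page.

```typescript
// Hypothetical helper: drops NUL characters and unpaired surrogates,
// leaving all other text (including valid surrogate pairs) untouched.
function sanitizeForUtf8(text: string): string {
  let out = ''
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i)

    // NUL is rejected by PostgreSQL text columns.
    if (code === 0) continue

    // High surrogate: keep it only when a low surrogate follows.
    if (code >= 0xd800 && code <= 0xdbff) {
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += text.charAt(i) + text.charAt(i + 1)
        i++ // skip the low surrogate we just copied
      }
      continue
    }

    // Low surrogate with no preceding high surrogate: drop it.
    if (code >= 0xdc00 && code <= 0xdfff) continue

    out += text.charAt(i)
  }
  return out
}
```

Applying a function like this at the end of every parser, before text reaches the database layer, keeps the cleanup in one place rather than scattered across file-type handlers.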

Confidence score: 3/5

  • This PR requires careful review due to significant architectural changes and complex Redis queue implementation
  • Score reflects concerns about error handling in async streaming logic, potential memory leaks in infinite loops, and some missing test coverage for new service layers
  • Pay close attention to the Redis queue implementation, service layer error handling, and the removal of some existing test files without replacement

66 files reviewed, 25 comments

@vercel vercel bot temporarily deployed to Preview – docs August 27, 2025 05:47 Inactive
emir-karabeg and others added 7 commits August 26, 2025 22:47
…1146)

* fix(condition-block): edges not following blocks, duplicate issues

* add subblock update to setActiveWorkflow

* Update apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/workflow-block/components/sub-block/components/condition-input.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
…ad code & consolidate other copilot files (#1147)

* cleanup

* support azure blob image upload

* imports cleanup

* PR comments

* ack PR comments

* fix key validation
#1136)

* added forwarding for outlook

* lint

* improved excel sheet read

* addressed greptile

* fixed bodytext getting truncated

* fixed any type

* added html func

---------

Co-authored-by: Adam Gough <adamgough@Mac.attlocal.net>
@vercel vercel bot temporarily deployed to Preview – docs August 27, 2025 05:50 Inactive
@vercel vercel bot temporarily deployed to Preview – sim August 27, 2025 05:53 Inactive
@waleedlatif1 waleedlatif1 merged commit 51b1e97 into staging Aug 27, 2025
3 of 4 checks passed
@waleedlatif1 waleedlatif1 deleted the fix/kb-uploads branch August 27, 2025 05:55
arenadeveloper02 pushed a commit to arenadeveloper02/p2-sim that referenced this pull request Sep 19, 2025
…is for queueing docs in kb (simstudioai#1143)

* improvement(kb): created knowledge, chunks, tags services and use redis for queueing docs in kb

* moved directories around

* cleanup

* bulk create document records after upload is completed

* fix(copilot): send api key to sim agent (simstudioai#1142)

* Fix api key auth

* Lint

* ack PR comments

* added sort by functionality for headers in kb table

* updated

* test fallback from redis, fix styling

* cleanup copilot, fixed tooltips

* feat: local auto layout (simstudioai#1144)

* feat: added llms.txt and robots.txt (simstudioai#1145)

* fix(condition-block): edges not following blocks, duplicate issues (simstudioai#1146)

* fix(condition-block): edges not following blocks, duplicate issues

* add subblock update to setActiveWorkflow

* Update apps/sim/app/workspace/[workspaceId]/w/[workflowId]/components/workflow-block/components/sub-block/components/condition-input.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix dependency array

* fix(copilot-cleanup): support azure blob upload in copilot, remove dead code & consolidate other copilot files (simstudioai#1147)

* cleanup

* support azure blob image upload

* imports cleanup

* PR comments

* ack PR comments

* fix key validation

* improvement(forwarding+excel): added forwarding and improve excel read (simstudioai#1136)

* added forwarding for outlook

* lint

* improved excel sheet read

* addressed greptile

* fixed bodytext getting truncated

* fixed any type

* added html func

---------

Co-authored-by: Adam Gough <adamgough@Mac.attlocal.net>

* revert agent const

* update docs

---------

Co-authored-by: Siddharth Ganesan <33737564+Sg312@users.noreply.github.com>
Co-authored-by: Emir Karabeg <78010029+emir-karabeg@users.noreply.github.com>
Co-authored-by: Vikhyath Mondreti <vikhyathvikku@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Vikhyath Mondreti <vikhyath@simstudio.ai>
Co-authored-by: Adam Gough <77861281+aadamgough@users.noreply.github.com>
Co-authored-by: Adam Gough <adamgough@Mac.attlocal.net>

5 participants