Feature/knowledge connector s3 #831
base: master
Pull request overview
This PR adds an S3 knowledge connector feature to the AWS extension, enabling extraction and processing of documents from S3 buckets into knowledge sources. It also updates deprecated APIs and dependencies.
Key changes:
- Implements S3 connector with support for multiple file types (txt, pdf, docx, csv, json, jsonl, md, pptx)
- Integrates LangChain for document loading and text chunking with configurable chunk sizes
- Updates deprecated Buffer constructor and context API methods
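The "configurable chunk sizes" item can be illustrated with a minimal sketch. This is not the PR's `text_chunker.ts`, which reportedly relies on LangChain's splitters; `chunkText`, `ChunkOptions`, `chunkSize`, and `chunkOverlap` are hypothetical names used only to show the idea of fixed-size chunks with overlap.

```typescript
// Hypothetical options object; the real connector's configuration may differ.
interface ChunkOptions {
  chunkSize: number;    // maximum characters per chunk
  chunkOverlap: number; // characters shared between consecutive chunks
}

// Splits text into fixed-size character windows that overlap by chunkOverlap.
function chunkText(text: string, { chunkSize, chunkOverlap }: ChunkOptions): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }

  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // how far the window advances each time

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window already reaches the end of the text.
    if (start + chunkSize >= text.length) {
      break;
    }
  }
  return chunks;
}

const exampleChunks = chunkText("abcdefghij", { chunkSize: 4, chunkOverlap: 1 });
```

A character-based window is the simplest possible variant; splitters that respect sentence or paragraph boundaries generally produce chunks that read better, at the cost of less predictable sizes.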
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| extensions/aws/tsconfig.json | Added compiler options for better module compatibility |
| extensions/aws/src/nodes/lambdaInvoke.ts | Fixed deprecated Buffer constructor and updated context API call |
| extensions/aws/src/module.ts | Registered S3 connector and commented out existing nodes |
| extensions/aws/src/knowledge-connectors/s3Connector.ts | Main connector implementation for processing S3 files into knowledge sources |
| extensions/aws/src/knowledge-connectors/helpers/text_extractor.ts | Document loader using LangChain for various file types |
| extensions/aws/src/knowledge-connectors/helpers/text_chunker.ts | Text splitting logic with configurable chunk sizes |
| extensions/aws/src/knowledge-connectors/helpers/new_utils.ts | S3 file download and chunk extraction utilities |
| extensions/aws/src/knowledge-connectors/helpers/list_files.ts | S3 bucket listing functionality |
| extensions/aws/src/knowledge-connectors/helpers/utils/*.ts | Utility functions for text processing and configuration |
| extensions/aws/src/knowledge-connectors/helpers/creds.env | Environment variable template for AWS credentials |
| extensions/aws/package.json | Updated dependencies to support LangChain and newer AWS SDK |
| extensions/aws/.npmrc | Added legacy peer deps flag for dependency resolution |
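For context on the `lambdaInvoke.ts` row: Node.js deprecated the `new Buffer(...)` constructor in favor of `Buffer.from`, `Buffer.alloc`, and `Buffer.allocUnsafe`. The sketch below shows the general shape of such a migration. The helper names `encodePayload` and `decodePayload` are hypothetical and are not taken from the PR, whose actual change is not shown on this page.

```typescript
// Deprecated form, shown only as a comment:
//   const body = new Buffer(JSON.stringify(payload));

// Hypothetical helper: serialize a payload with the non-deprecated API.
function encodePayload(payload: unknown): Buffer {
  return Buffer.from(JSON.stringify(payload), "utf8");
}

// Hypothetical helper: decode raw bytes (for example, a response body) back to JSON.
function decodePayload(raw: Uint8Array): unknown {
  return JSON.parse(Buffer.from(raw).toString("utf8"));
}

const roundTrip = decodePayload(encodePayload({ bucket: "demo", keys: ["a.txt"] }));
```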
Outdated review threads:
- extensions/aws/src/knowledge-connectors/helpers/text_extractor.ts (1 thread)
- extensions/aws/src/knowledge-connectors/helpers/text_chunker.ts (5 threads)
- extensions/aws/src/knowledge-connectors/helpers/utils/removeUnnecessaryChars.ts (1 thread)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
✅ Snyk checks have passed. No issues have been found so far.
Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.
Outdated review threads:
- extensions/aws/src/knowledge-connectors/helpers/utils/config.ts (2 threads)
Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.
Outdated review threads:
- extensions/aws/src/knowledge-connectors/helpers/text_extractor.ts (1 thread)
…y, remove seperators
Copilot reviewed 14 out of 16 changed files in this pull request and generated 4 comments.
Outdated review threads:
- extensions/aws/src/knowledge-connectors/helpers/text_chunker.ts (1 thread)
- extensions/aws/src/knowledge-connectors/helpers/utils/logger.ts (1 thread)
Copilot reviewed 14 out of 16 changed files in this pull request and generated 3 comments.
Outdated review threads:
- extensions/aws/src/knowledge-connectors/helpers/text_chunker.ts (1 thread)
Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.
Copilot reviewed 14 out of 16 changed files in this pull request and generated 1 comment.
…removed converttoutf8
…xtensions into feature/knowledge-connector-s3
Copilot reviewed 11 out of 13 changed files in this pull request and generated 2 comments.
Outdated review threads:
- extensions/aws/src/knowledge-connectors/helpers/text_chunker.ts (2 threads)
Copilot reviewed 11 out of 13 changed files in this pull request and generated 1 comment.
Copilot reviewed 11 out of 13 changed files in this pull request and generated no new comments.
Hi Falak. I have fixed the chunking size; it is not exactly identical to the one from the manual upload, but it is a lot better. Thanks for your time! :)