Allows specifying chunk size and overlap with /learn #267

Merged
3coins merged 6 commits into jupyterlab:main on Jul 18, 2023


3coins
Collaborator

@3coins 3coins commented Jul 14, 2023

Summary

The current implementation of /learn uses fixed values for chunk size and chunk overlap, which cannot suit every document type and model. Adding options for these attributes lets power users quickly experiment and pick the values that work best for their document types and selected model. This PR adds the new options to the /learn command and stores their values in the metadata, so that relearning works when the embedding model is switched.
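To make the metadata point concrete, here is a minimal sketch of how per-directory chunk settings could be persisted and replayed after an embedding model switch. It is not the code from this PR: the names `LearnedDir`, `IndexMetadata`, `save_metadata`, `load_metadata`, and `relearn_plan` are hypothetical, and JSON is assumed as the storage format purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


# Hypothetical record of one learned directory (not the PR's actual class).
@dataclass
class LearnedDir:
    path: str
    chunk_size: int
    chunk_overlap: int


# Hypothetical container for everything needed to relearn from scratch.
@dataclass
class IndexMetadata:
    dirs: List[LearnedDir] = field(default_factory=list)


def save_metadata(meta: IndexMetadata) -> str:
    """Serialize the metadata to JSON (assumed format) so it survives a restart."""
    return json.dumps(asdict(meta))


def load_metadata(raw: str) -> IndexMetadata:
    """Rebuild the metadata object from its JSON form."""
    data = json.loads(raw)
    return IndexMetadata(dirs=[LearnedDir(**d) for d in data["dirs"]])


def relearn_plan(meta: IndexMetadata) -> List[str]:
    """List the /learn commands to replay with each directory's original settings."""
    return [
        f"/learn -c {d.chunk_size} -o {d.chunk_overlap} {d.path}"
        for d in meta.dirs
    ]
```

Because the chunk size and overlap travel with each directory entry, a relearn can reuse the values the user originally chose instead of silently falling back to the defaults.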

Usage

# default chunk size of 2000, and chunk overlap of 100
/learn <directory>

# chunk size of 500, and chunk overlap of 50
/learn -c 500 -o 50 <directory>

# chunk size of 1000, and chunk overlap of 200
/learn --chunk-size 1000 --chunk-overlap 200 <directory>
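The option handling shown above can be sketched with Python's standard `argparse` module. This is an illustration under assumptions, not the handler from this PR: `build_learn_parser` and `parse_learn` are hypothetical helper names, and only the flags and defaults are taken from the usage examples.

```python
import argparse
import shlex


def build_learn_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags listed in the usage examples."""
    parser = argparse.ArgumentParser(prog="/learn")
    # Defaults mirror the documented ones: chunk size 2000, chunk overlap 100.
    parser.add_argument("-c", "--chunk-size", type=int, default=2000)
    parser.add_argument("-o", "--chunk-overlap", type=int, default=100)
    parser.add_argument("path")
    return parser


def parse_learn(message: str) -> argparse.Namespace:
    """Parse a chat message such as '/learn -c 500 -o 50 docs' (hypothetical helper)."""
    # Drop the leading '/learn' token, then parse the remaining arguments.
    tokens = shlex.split(message)[1:]
    return build_learn_parser().parse_args(tokens)


args = parse_learn("/learn --chunk-size 1000 --chunk-overlap 200 docs")
print(args.chunk_size, args.chunk_overlap, args.path)
```

`argparse` maps `--chunk-size` to the attribute `chunk_size`, so the short and long forms of each flag populate the same field.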

@3coins 3coins added the enhancement New feature or request label Jul 14, 2023
@3coins 3coins self-assigned this Jul 14, 2023
@JasonWeill
Collaborator

We need to document chunk size and overlap, including the default, in both the user docs and the user interface. As a user, why would I need to modify these values?

@3coins
Collaborator Author

3coins commented Jul 14, 2023

@JasonWeill
These defaults won't work for every document and model. As a user, I was experimenting to find which chunk size and overlap values work best, and without these options I would have to edit the code to do that.
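As a rough illustration of why these values matter, the sketch below splits text with a simple sliding window. It is not the splitter that /learn uses: `split_text` is a hypothetical function, and measuring chunk size in characters is an assumption made for this example.

```python
from typing import List


def split_text(text: str, chunk_size: int = 2000, chunk_overlap: int = 100) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


document = "a" * 5000
print(len(split_text(document)))           # default settings
print(len(split_text(document, 500, 50)))  # smaller chunks, smaller overlap
```

Smaller chunks produce more, finer-grained pieces to embed, while a larger overlap repeats more context across neighbouring chunks. Which trade-off retrieves best depends on the documents and the embedding model, which is why being able to vary both values from the command is useful.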

@3coins 3coins marked this pull request as ready for review July 14, 2023 18:58
@JasonWeill JasonWeill added this to the 0.10.0 Release milestone Jul 17, 2023
@JasonWeill
Collaborator

Can you please add something about these new options to the docs for the /learn command? I can help write or edit this info. There's no help option for the chat UI's commands yet (#91), so we need to keep our written docs up to date.

@3coins 3coins merged commit 3275f30 into jupyterlab:main Jul 18, 2023
3 checks passed
dbelgrod pushed a commit to dbelgrod/jupyter-ai that referenced this pull request Jun 10, 2024
* Allows specifying chunk size and overlap with /learn

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactored as per PR review comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Documents -c and -o options

* Update docs/source/users/index.md

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jason Weill <jweill@amazon.com>
Marchlak pushed a commit to Marchlak/jupyter-ai that referenced this pull request Oct 28, 2024