Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

community: add sqlite-vec vectorstore #25003

Merged
merged 9 commits into from
Sep 26, 2024

Conversation

abhiaagarwal
Copy link
Contributor

Description:

Adds a vector store integration with sqlite-vec, the successor to sqlite-vss that is a single C file with no external dependencies.

Pretty straightforward, just copy-pasted the sqlite-vss integration and made a few tweaks and added integration tests. Only question is whether all documentation should be directed away from sqlite-vss if it is defacto deprecated (cc @asg017).

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Aug 2, 2024
Copy link

vercel bot commented Aug 2, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
langchain ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 26, 2024 5:37pm

@dosubot dosubot bot added community Related to langchain-community Ɑ: vector store Related to vector store module 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features labels Aug 2, 2024
metadatas: Optional list of metadatas associated with the texts.
kwargs: vectorstore specific parameters
"""
max_id = self._connection.execute(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would a RETURNING clause make more sense here, instead of max_id? Just to avoid multiple SQL statements, and would avoid subtle bugs with SQLite autoincrement values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't even aware of RETURNING (I copied this impl directly from the sqlite-vss), I'll try that! That also means I can remove the trigger and avoid having two copies of the embedding in the database.

Copy link
Contributor

@philippe2803 philippe2803 Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was completely unaware of the RETURNING clause as well. Unfortunately, this does not seem to be working with the sqlite3 executemany method as pointed here: python/cpython#100021. I tried and got the error described on this issue.

@asg017
Copy link

asg017 commented Aug 5, 2024

Hey thanks for putting this up!

I will be deprecating sqlite-vss very soon, sqlite-vec is an improvement in nearly every single way and having two similar extensions doesn't make much sense. I'll just need to put together a migration guide before archiving the repo.

Also to note: I'll be adding proper metadata columns + filtering in the near future, which should simplify a lot of code here, but that would be for a little longer

Happy to take a closer look if needed

@philippe2803
Copy link
Contributor

@abhiaagarwal I actually submitted a PR doing the actual same thing here: #25024 and did not pay attention someone else had already done it less than a day before me! I will close my PR then. Congratulation for beating me to it!

@liunux4odoo
Copy link
Contributor

Excellent work.
An advice: it could be problematic if use raw sqlite connection directly, maybe it's better to use sqlalchemy as langchain-postgres does?
Also langchain-postgres would be a good reference that supports async methods and metadata filtering,

@abhiaagarwal
Copy link
Contributor Author

@liunux4odoo I'm totally open to that — I've been trying to figure out an async client and it's been not fun. Let me try the sqlalchemy approach!

@philippe2803
Copy link
Contributor

@abhiaagarwal I feel your PR should go forward as it is for now, it's perfectly reasonable as it is. Sqlite-vec is a major vector store alternative that needs to be reflected in Langchain. I don't think we should wait.

@liunux4odoo I like the idea of integrating it with SQLAlchemy but this feels like a major disruption for the purpose of this PR. I don't think the need was ever expressed with sqlite-vss to have SQLAlchemy, so how about we just go ahead without it for now, and if the need comes along we do a refacto for it?

@asg017 I was completely unaware of the RETURNING feature (thanks for bringing that up). So I did try to implement it, but unfortunately got some error. It seems that RETURNING is not available (yet?) when using cursor.executemany method.

@philippe2803
Copy link
Contributor

@baskaryan could this PR be looked at and merged soon?

@abhiaagarwal
Copy link
Contributor Author

@philippe2803 yep, I discovered the RETURNING thing as well with executemany, I can actually solve it I believe by just doing multiple executes with RETURNING. I haven't benchmarked the performance characteristics yet, but it's probably not worth it.

@efriis
Copy link
Member

efriis commented Aug 28, 2024

Howdy! Looks good other than the doc - could you rewrite to match the specified format? There's a cli command that bootstraps it here: #24800

@philippe2803
Copy link
Contributor

@abhiaagarwal would you be OK to give me permission to the sqlite-vec branch on your langchain fork? I have made the changes (I believe) necessary for the docs build to pass. This would get this PR merged shortly.

@abhiaagarwal
Copy link
Contributor Author

@philippe2803 absolutely, I just did it, let me know if it works for you!

@philippe2803
Copy link
Contributor

@efriis looks like this is all passing now. I believe this PR can now be merged.

@philippe2803
Copy link
Contributor

@efriis sorry for being pushy, but could this PR be merged? It's been pending for quite some time now, and every GHA are currently passing. At your disposal if I can help in any ways.

@philippe2803
Copy link
Contributor

@baskaryan @efriis could we please move forward with this PR please? it has been hanging for over a week now, although everything has passed. Happy to help in any ways

@DecisionNerd
Copy link

@efriis anything the community can do to push this forward?

@efriis
Copy link
Member

efriis commented Sep 22, 2024

Hey team! Linking the review process - the status is currently "needs support" so can give it an upvote to get it through!

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Sep 26, 2024
@efriis efriis enabled auto-merge (squash) September 26, 2024 17:30
@efriis efriis merged commit 696114e into langchain-ai:master Sep 26, 2024
20 checks passed
@ajram23
Copy link

ajram23 commented Sep 26, 2024

@efriis Thank you! Appreciate the approval!
one quick question: Will this package be available for langchain 2.X or will it require to migrate to 3.X.

@efriis
Copy link
Member

efriis commented Sep 26, 2024

will require 0.3 because the current langchain-community library depends on pydantic v2 in other areas!

@efriis
Copy link
Member

efriis commented Sep 26, 2024

https://python.langchain.com/docs/versions/v0_3/

Sheepsta300 pushed a commit to Sheepsta300/langchain that referenced this pull request Oct 1, 2024
**Description**:

Adds a vector store integration with
[sqlite-vec](https://alexgarcia.xyz/sqlite-vec/), the successor to
sqlite-vss that is a single C file with no external dependencies.

Pretty straightforward, just copy-pasted the sqlite-vss integration and
made a few tweaks and added integration tests. Only question is whether
all documentation should be directed away from sqlite-vss if it is
defacto deprecated (cc @asg017).

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: philippe-oger <philippe.oger@adevinta.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Related to langchain-community 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features lgtm PR looks good. Use to confirm that a PR is ready for merging. size:XL This PR changes 500-999 lines, ignoring generated files. Ɑ: vector store Related to vector store module
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants