-
Notifications
You must be signed in to change notification settings - Fork 2.9k
feat(openai): encoding_format support for embeddings #8256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Skipped Deployment
|
|
@hntrl please check |
hntrl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only thing I'll note is that base64 outputs aren't going to be consistent with the types of the base embeddings class which probably isn't what we want (it indicates that embeddings are represented in num arrays, but oAI allows base64 outputs). CI will probably pass just because their latest types are reflective of the encoding format and we haven't bumped the sdk version yet. So a couple of things would need some attention before we could merge this:
(1) Bump the openai sdk version to latest to properly surface param types
(2) Take another look at the base class to see if we can better manage output types (i'd be amenable to introducting a generic w/ a default on the base class that determines output type)
|
will rerun when gcp reverts 🙈 |
…45/langchainjs into openai-embeddings-encoding
|
@anadi45 Happy to merge this if we give some thought to what I described above. Out of curiosity what's the use case you have for base64 embeddings? |
|
Hi @hntrl, I've updated the OpenAI SDK to the latest version, but it appears that the type definitions have not been updated accordingly within the package. Regarding the alternative approach of updating the base embedding class to use generic types — I did explore that direction. However, I encountered a number of dependencies, particularly around the vector store classes, which made me think if they break existing features.. |
|
@hntrl I saw this as an open issue, and thought of solving it. I personally don't have a use case for this. |
|
@anadi45 gotcha, I missed the linked issue! Thanks for taking the time to contribute! I'll call out that the sdk version is up to date on main (we had some other things we needed from the latest openAI version). I imagine if the types haven't updated in the oAI sdk but it's reflected in the docs then it's just a change that got missed/ is waiting to land (there's more of those than you think). Maybe it's possible to have an infill type in the class signature that accepts the proper encoding format, then we can just nix it whenever openAI gets around to adding it? If we make it a generic with a default then I imagine that won't be a breaking change. Could you give more insight into why you think it would be breaking? - export interface EmbeddingsInterface {
+ export interface EmbeddingsInterface<TOutput = number[]> {
...
- embedDocuments(documents: string[]): Promise<number[][]>;
+ embedDocuments(documents: string[]): Promise<TOutput[]>;
...
- embedQuery(document: string): Promise<number[]>;
+ embedQuery(document: string): Promise<TOutput>;
} |
|
@hntrl have added generic type to base class |
|
apologies, this got buried! I've rebased and continued this on in #8916 |
Fixes # (#8221)