Manually specify openai-compat format and parse it #4463
Conversation
Hey @dflatline, I reviewed and tested your PR; it does solve the issue, but I am not sure why. I tried passing the dimensions directly in the request, and that did not solve the issue either.
While there's nothing wrong with your PR, do you think you could try to find out why exactly the current implementation is failing?
Let me know what you think!
```typescript
console.log(`[OpenAI-Compatible Embedder] After mapping - embedding length: ${embeddings[0]?.length}`)
if (embeddings[0]) {
	console.log(
		`[OpenAI-Compatible Embedder] First 5 values after mapping:`,
		embeddings[0].slice(0, 5),
	)
}
```
I think we can probably remove these.
```diff
@@ -111,10 +111,37 @@ export class OpenAICompatibleEmbedder implements IEmbedder {
 const response = await this.embeddingsClient.embeddings.create({
 	input: batchTexts,
 	model: model,
+	encoding_format: "base64", // Use base64 to protect embedding dimensions from openai sabotage
```
I am not sure whether this is actually sabotage, but we probably want to remove the comment.
I too tried quite a number of different ways of providing the dimension to the API, to no effect. Everything was fine on the wire, with and without a dimension param; I monitored the local interface via Wireshark. The bug definitely seems to be happening in the OpenAI library, which has a number of abstractions around parsing and marshaling that I could not easily get to the bottom of... That's why I just declared it sabotage, put on the Beastie Boys, and worked around it. 😅

It does feel like a bug should be filed with OpenAI. The best thing to do there would be either a unit test or a minimal standalone Node app that triggers the behavior. I am not sure of the best way to extract a minimal test case, as I am not a TypeScript or Node developer. Do you have any suggestions for a minimal way to exercise this for the purpose of reporting to OpenAI?
Hrmm... It looks like the OpenAI parsing code has had bugs before. Roo is pinned to 4.78.1 of the OpenAI package, but there was a fix for this embedding API in 4.92: openai/openai-node#1448. I don't yet know if that is the same bug, though. Frankly, I am new to npm and TypeScript... I got to this side-quest from trying to improve codebase search for a Python project of mine. So I am still open to suggestions for the best way to make a test case for this bug, to determine if it is indeed fixed in later OpenAI package versions.
Ok, I built with OpenAI v4.103 and confirmed it has the fix from openai/openai-node#1448, but it does not fix the issue for the ollama and LM Studio openai endpoints. Interestingly, I can no longer get llama.cpp endpoints to exhibit the issue with any version, but I am seeing other strange behavior there. I have a test script for the OpenAI package at #4462 (comment).
Hey @dflatline, thank you for looking into this. I think we can use this workaround for the time being.
I left a suggestion regarding the testing, it might be a good idea to test the base64 path in case something changes in the future.
Let me know what you think!
To ensure the new base64 decoding logic is fully validated, could we update the mock response in this test file? If the mock returned a base64 string instead of a float array, we could confirm the decoding path is exercised correctly.
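For example, the mock's `data` field could carry a base64 string built from a known vector. This is only a sketch: the field names follow the general OpenAI embeddings response shape, and the tolerance check accounts for float32 rounding.

```typescript
// Sketch: build a mock embeddings response whose payload is a base64 string,
// so the decode path (base64 -> Float32Array) is actually exercised in tests.
const knownVector = [0.1, 0.2, 0.3]
const base64Data = Buffer.from(new Float32Array(knownVector).buffer).toString("base64")

const mockResponse = {
	data: [{ object: "embedding", index: 0, embedding: base64Data }],
	model: "test-model",
	usage: { prompt_tokens: 1, total_tokens: 1 },
}

// A test can then decode the mocked embedding and compare it against
// knownVector with a small tolerance, since float32 cannot hold 0.1 exactly.
const buf = Buffer.from(mockResponse.data[0].embedding, "base64")
const decoded = Array.from(new Float32Array(buf.buffer, buf.byteOffset, buf.byteLength / 4))
const ok = decoded.every((v, i) => Math.abs(v - knownVector[i]) < 1e-6)
console.log(ok) // true
```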
Ok I added three tests for this.
I also added a test to verify that the OpenAI package does not parse the embeddings when you request base64 format, and that they then arrive as expected. That way, we will know if someone "fixes" this behavior so that the openai package parses whatever format you request for you.
All of these tests were generated with RooCode and Claude 4. That OpenAI one was a real pain to get right: it kept wanting to mock at the RooCode API level rather than the post inside the OpenAI package. In fact, Claude 4 would frequently revert this internal mocking of OpenAI back to the Roo package level while fixing other, unrelated issues with the tests... So while handy for the anti-fragility of this workaround, the OpenAI test might be "vibe-fragile" itself, in that some future vibe-coded tests might "fix" this OpenAI mocking again and move it back to the wrong, higher level. We live in interesting times!
Force-pushed "Improve comment" from 3917811 to 17401b2.
Hey @dflatline, Thank you for your contribution!
I made some minor improvements to the typing.
I tested it and it works perfectly!
Is the test failure here legit?
@dflatline do you have discord?
* Manually specify openai-compat format and parse it
* fixup! Manually specify openai-compat format and parse it
* Expect base64 in embedding test arguments
* fixup! Manually specify openai-compat format and parse it (Remove debug logs)
* fixup! Manually specify openai-compat format and parse it (Improve comment)
* fixup! Manually specify openai-compat format and parse it
* Add tests to exercise base64 decode of embeddings
* Add tests to verify openai base64 and brokenness behavior
* feat: improve typing
* refactor: switch from jest to vitest for mocking in tests

Co-authored-by: Dixie Flatline <dflatline>
Co-authored-by: Daniel Riccio <ricciodaniel98@gmail.com>
Related GitHub Issue
Closes: #4462
Description
This PR explicitly specifies base64 as the encoding format to the OpenAI library. This causes it to return the response directly, without subjecting it to dimensionality reduction or integer truncation (it does both).
We then parse it directly ourselves, preserving the correct embedding dimension without accuracy loss.
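The manual decode can be sketched roughly as follows. This is an illustrative helper, not the PR's actual function name, and it assumes the API returns the raw little-endian float32 bytes of the embedding encoded as base64:

```typescript
// Hypothetical helper: decode a base64-encoded embedding (as returned when
// encoding_format is "base64") into a plain number array.
function decodeBase64Embedding(data: string): number[] {
	const buf = Buffer.from(data, "base64")
	// Reinterpret the bytes as float32 values; byteOffset matters because
	// Buffer.from may return a view into a larger pooled ArrayBuffer.
	const floats = new Float32Array(buf.buffer, buf.byteOffset, buf.byteLength / 4)
	return Array.from(floats)
}

// Round-trip example: encode a known vector to base64, then decode it back.
const original = new Float32Array([0.5, -1.25, 3])
const b64 = Buffer.from(original.buffer).toString("base64")
console.log(decodeBase64Embedding(b64)) // [ 0.5, -1.25, 3 ]
```

Because every value passes through `Float32Array`, the full embedding dimension is preserved with float32 precision rather than being truncated.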
Test Procedure
In order to work around ollama/ollama#6262, you also need to kill your ollama daemon and launch it manually from the shell like this:

```shell
OLLAMA_NUM_PARALLEL=1 OLLAMA_MAX_QUEUE=1000000 ollama serve
```
Type of Change

- Changes are limited to `src` or test files.

Pre-Submission Checklist

- Linting passes (`npm run lint`).
- Debug code (`console.log`) has been removed.
- Tests pass (`npm test`).
- Branch is up to date with `main`.
- `npm run changeset` was run if this PR includes user-facing changes or dependency updates.

Screenshots / Videos
Not needed.
Documentation Updates
Not really needed? Though maybe we should document ollama/ollama#6262 in the main docs for ollama embedding generally (it is the source of the `OLLAMA_NUM_PARALLEL=1 OLLAMA_MAX_QUEUE=1000000 ollama serve` workaround).
Get in Touch
Important
Specifies base64 encoding in `OpenAICompatibleEmbedder` to prevent dimensionality reduction, and parses embeddings to preserve dimensions.

- Specifies `base64` as `encoding_format` in `OpenAICompatibleEmbedder` to prevent dimensionality reduction and integer truncation.
- Parses the base64 response into `Float32Array` to preserve embedding dimensions.
- Changes are in the `_embedBatchWithRetries()` function.

This description was created automatically for 4b04411.