router: send the input as chunks to the backend #1981
Conversation
Force-pushed from 78bb569 to 99d4c9e
Very nice.
This PR doesn't touch the shard side, I'm guessing you're leaving that as a follow-up?
Yeah #1985 will do that.
Force-pushed from 99d4c9e to 6c5598e
router/client/src/lib.rs
Outdated
// We don't create empty chunks, probably better to be robust
// than unreachable!().
None => {}
Just wanted to make sure you see this. Since we are creating the chunks in the router, this case should never occur. But we could also have unreachable!()
or bubble up an error. I don't have a strong opinion.
Either is fine.
Seems to be an artifact of the protobuf definition.
unreachable!() makes things hard-crash, so it's easier to debug but less robust; doing nothing is harder to debug should anything go wrong (if even visible), but more robust.
I don't have a strong opinion to be honest. Instinctively I'd make it unreachable!() or equivalent; at the very least, in the silent path I would add a warn log.
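For reference, a minimal sketch of the two options being weighed. The `Chunk` enum and `handle_chunk` function below are hypothetical stand-ins for the protobuf-generated types, and `eprintln!` stands in for whatever warn-level logging the router actually uses:

```rust
// Hypothetical stand-in for the protobuf-generated chunk enum; the real
// variant names and payload types may differ.
enum Chunk {
    Text(String),
    Image(Vec<u8>),
}

fn handle_chunk(chunk: Option<Chunk>) {
    match chunk {
        Some(Chunk::Text(text)) => println!("text chunk: {text}"),
        Some(Chunk::Image(data)) => println!("image chunk: {} bytes", data.len()),
        // Option 1: hard-crash. Easiest to debug if the router ever does
        // produce an empty chunk, but takes the whole request down with it.
        // None => unreachable!("the router never creates empty chunks"),
        // Option 2: stay robust, but leave a trace so the silent path is at
        // least visible in the logs.
        None => eprintln!("warning: received an empty input chunk, ignoring it"),
    }
}

fn main() {
    handle_chunk(Some(Chunk::Text("hello".to_string())));
    handle_chunk(None);
}
```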
Force-pushed from 6c5598e to fc52ba6
LGTM
Before this change, the generation input was sent to the backend as a single string, encoding images as Base64 and packing them in Markdown-style links. This change adds a new chunked input representation that separates text chunks from image chunks. Image chunks contain binary data (for smaller message sizes) and the image's MIME type. The stringly-typed inputs are still sent to support backends that do not support chunked inputs yet.
Force-pushed from fc52ba6 to f92411a
What does this PR do?
Inputs are currently sent to the backend as a single string, encoding images as Base64 and packing them in Markdown-style links. We would like to switch to chunked inputs, with different chunk types corresponding to different modalities. This reduces the amount of input parsing that needs to be done and increases safety by properly typing inputs. This change can be broken up into several steps.
This change implements the first step: adding a new chunked input representation that separates text chunks from image chunks. Image chunks contain binary data (for smaller message sizes) and the image's MIME type.
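To make the shape of that representation concrete, here is a rough sketch. The real types are generated from the protobuf definitions, so the names and fields below are illustrative assumptions rather than the actual code:

```rust
/// Illustrative sketch of a chunked input; the actual types come from the
/// protobuf schema and may differ in names and layout.
pub enum InputChunk {
    /// A plain text segment of the prompt.
    Text(String),
    /// Raw image bytes plus their MIME type, instead of a Base64 string
    /// packed into a Markdown-style link.
    Image { data: Vec<u8>, mimetype: String },
}

pub struct Request {
    /// Properly typed chunks, in prompt order.
    pub input_chunks: Vec<InputChunk>,
    /// Legacy stringly-typed input, still sent so that backends without
    /// chunked-input support keep working.
    pub inputs: String,
}

fn main() {
    // Hypothetical construction of a mixed text/image request.
    let request = Request {
        input_chunks: vec![
            InputChunk::Text("What is in this picture? ".to_string()),
            InputChunk::Image {
                data: vec![0x89, 0x50, 0x4E, 0x47],
                mimetype: "image/png".to_string(),
            },
        ],
        inputs: "What is in this picture? ![](data:image/png;base64,...)".to_string(),
    };
    println!("{} chunks", request.input_chunks.len());
}
```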
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.