Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router: send the input as chunks to the backend #1981

Merged
merged 1 commit into from
Jun 3, 2024

Conversation

danieldk
Copy link
Member

What does this PR do?

Inputs are currently sent to the backend as a single string, encoding images as Base64 and packing them in Markdown-style links. We would like to switch to chunked inputs, with different chunk types corresponding to different modalities. This reduces the amount of input-parsing that needs to be done and increases safety by properly typing inputs. This change can be broken up in various steps:

  1. Update the router to send chunks along with 'stringly-typed' inputs.
  2. Update the backend(s) to switch to processing chunks rather than strings.
  3. Update the router to directly use chunked inputs for APIs that support submitting chunks (e.g. OpenAI), rather than first joining chunks and splitting them again in the input preparation.
  4. Deprecate stringly-typed inputs at some point, by removing support from the router.

This change implements the first step, adding a new chunked input representation that separates text chunks from images chunks. Image chunks contain binary data (for smaller message sizes) and the image's MIME type.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@danieldk danieldk force-pushed the feature/chunked-input branch from 78bb569 to 99d4c9e Compare May 30, 2024 12:31
@danieldk danieldk marked this pull request as ready for review May 30, 2024 15:09
@danieldk danieldk mentioned this pull request May 31, 2024
5 tasks
Copy link
Collaborator

@Narsil Narsil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice.

This PR doesn't touch shard side, I'm guessing you're leaving that as a follow-up ?

router/client/src/client.rs Outdated Show resolved Hide resolved
router/src/validation.rs Outdated Show resolved Hide resolved
router/client/src/lib.rs Outdated Show resolved Hide resolved
router/client/src/client.rs Outdated Show resolved Hide resolved
@danieldk
Copy link
Member Author

This PR doesn't touch shard side, I'm guessing you're leaving that as a follow-up ?

Yeah #1985 will do that.

@danieldk danieldk force-pushed the feature/chunked-input branch from 99d4c9e to 6c5598e Compare May 31, 2024 13:04
Comment on lines 74 to 76
// We don't create empty chunks, probably better to be robust
// than unreachable!().
None => {}
Copy link
Member Author

@danieldk danieldk May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to make sure you see this. Since we are creating the chunks in the router, this case should never occur. But we could also have unreachable!() or bubble up an error. I don't have a strong opinion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either is fine.
Seems to be an artifact of the protobuf definition.

  • unreachable makes thing hard crash so easier to debug, but less robust
  • do nothing, harder to debug should anything go wrong (if even visible) but more robust.

I don't a strong opinion to be honest. Insinctively I'd make it a unreachable or equivalent, at the very least in the silent path I would add a warn log.

@danieldk danieldk requested a review from Narsil May 31, 2024 13:36
@danieldk danieldk force-pushed the feature/chunked-input branch from 6c5598e to fc52ba6 Compare June 3, 2024 07:27
Narsil
Narsil previously approved these changes Jun 3, 2024
Copy link
Collaborator

@Narsil Narsil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Before this change, the generation input was sent to the backend as a
single string, encoding images as Base64 and packing them in
Markdown-style links.

This change adds a new chunked input representation that separates text
chunks from images chunks. Image chunks contain binary data (for smaller
message sizes) and the image's MIME type.

The stringly-typed inputs are still sent to support backends that do not
support chunked inputs yet.
@danieldk danieldk force-pushed the feature/chunked-input branch from fc52ba6 to f92411a Compare June 3, 2024 14:29
@danieldk danieldk merged commit df71aaf into main Jun 3, 2024
5 checks passed
@danieldk danieldk deleted the feature/chunked-input branch June 3, 2024 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants