router: send the input as chunks to the backend #1981

danieldk · 2024-05-30T12:04:48Z

What does this PR do?

Inputs are currently sent to the backend as a single string, encoding images as Base64 and packing them in Markdown-style links. We would like to switch to chunked inputs, with different chunk types corresponding to different modalities. This reduces the amount of input parsing that needs to be done and increases safety by properly typing inputs. This change can be broken up into the following steps:

1. Update the router to send chunks along with 'stringly-typed' inputs.
2. Update the backend(s) to switch to processing chunks rather than strings.
3. Update the router to directly use chunked inputs for APIs that support submitting chunks (e.g. OpenAI), rather than first joining chunks and splitting them again in the input preparation.
4. Deprecate stringly-typed inputs at some point, by removing support from the router.

This change implements the first step, adding a new chunked input representation that separates text chunks from image chunks. Image chunks contain binary data (for smaller message sizes) and the image's MIME type.
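
To make the two representations concrete, here is a minimal sketch of a typed chunk and of joining chunks back into the legacy single-string form. The names `Chunk`, `base64_encode`, and `to_legacy_string` are hypothetical and are not taken from the router's code. The exact link syntax (`![](data:<mime>;base64,<payload>)`) is also an assumption about what "Markdown-style links" looks like on the wire.

```rust
/// One piece of a generation input. Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Chunk {
    /// Plain text, passed through unchanged.
    Text(String),
    /// Raw image bytes plus their MIME type (no Base64 in the typed form).
    Image { data: Vec<u8>, mimetype: String },
}

const B64: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard padded Base64, hand-rolled so the sketch has no dependencies.
pub fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for group in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit value, zero-filling the tail.
        let b0 = group[0] as u32;
        let b1 = *group.get(1).unwrap_or(&0) as u32;
        let b2 = *group.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[((n >> 18) & 63) as usize] as char);
        out.push(B64[((n >> 12) & 63) as usize] as char);
        if group.len() > 1 {
            out.push(B64[((n >> 6) & 63) as usize] as char);
        } else {
            out.push('=');
        }
        if group.len() > 2 {
            out.push(B64[(n & 63) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Joins chunks into a single string: text verbatim, images as
/// Markdown-style links carrying a Base64 data URI (assumed format).
pub fn to_legacy_string(chunks: &[Chunk]) -> String {
    let mut out = String::new();
    for chunk in chunks {
        match chunk {
            Chunk::Text(text) => out.push_str(text),
            Chunk::Image { data, mimetype } => {
                out.push_str(&format!(
                    "![](data:{};base64,{})",
                    mimetype,
                    base64_encode(data)
                ));
            }
        }
    }
    out
}
```

In the typed form the image stays as raw bytes, which is where the smaller message size comes from: Base64 inflates the payload by roughly a third, and the backend no longer has to find and decode links inside the text.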

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Narsil

Very nice.

This PR doesn't touch the shard side; I'm guessing you're leaving that as a follow-up?

router/client/src/client.rs

router/src/validation.rs

router/client/src/lib.rs

router/client/src/client.rs

danieldk · 2024-05-31T12:04:45Z

> This PR doesn't touch the shard side; I'm guessing you're leaving that as a follow-up?

Yeah #1985 will do that.

danieldk · 2024-05-31T13:10:10Z

router/client/src/lib.rs

+            // We don't create empty chunks, probably better to be robust
+            // than unreachable!().
+            None => {}


Just wanted to make sure you see this. Since we are creating the chunks in the router, this case should never occur. But we could also have unreachable!() or bubble up an error. I don't have a strong opinion.

Either is fine.
Seems to be an artifact of the protobuf definition.

- `unreachable!()` makes things hard-crash, so it is easier to debug but less robust.
- Doing nothing is harder to debug should anything go wrong (if it is even visible), but more robust.

I don't have a strong opinion, to be honest. Instinctively I'd make it an `unreachable!()` or equivalent; at the very least, in the silent path I would add a warn log.
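
For reference, the options discussed in this thread could look roughly like the sketch below. `InputChunk`, `Chunk`, `EmptyChunkError`, and the three `collect_*` functions are hypothetical names. The `Option` wrapper imitates how a protobuf `oneof` usually surfaces in generated Rust code, and `eprintln!` stands in for whatever logging macro the project uses.

```rust
/// Hypothetical chunk payload, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Chunk {
    Text(String),
    Image { data: Vec<u8>, mimetype: String },
}

/// Wrapper with an optional payload, imitating a generated `oneof` field.
#[derive(Debug, Clone, PartialEq)]
pub struct InputChunk {
    pub chunk: Option<Chunk>,
}

/// Robust path: skip empty chunks, but leave a trace in the logs.
pub fn collect_lenient(chunks: Vec<InputChunk>) -> Vec<Chunk> {
    let mut out = Vec::new();
    for (index, wrapper) in chunks.into_iter().enumerate() {
        match wrapper.chunk {
            Some(chunk) => out.push(chunk),
            // Stand-in for a warn-level log call.
            None => eprintln!("warning: input chunk {index} is empty, skipping"),
        }
    }
    out
}

/// Error returned by the strict path; records which chunk was empty.
#[derive(Debug, PartialEq)]
pub struct EmptyChunkError {
    pub index: usize,
}

/// Strict path: bubble up an error on the first empty chunk.
pub fn collect_strict(chunks: Vec<InputChunk>) -> Result<Vec<Chunk>, EmptyChunkError> {
    chunks
        .into_iter()
        .enumerate()
        .map(|(index, wrapper)| wrapper.chunk.ok_or(EmptyChunkError { index }))
        .collect()
}

/// Hard-crash path: treat an empty chunk as a broken invariant.
pub fn collect_or_panic(chunks: Vec<InputChunk>) -> Vec<Chunk> {
    chunks
        .into_iter()
        .map(|wrapper| match wrapper.chunk {
            Some(chunk) => chunk,
            None => unreachable!("the router never creates empty chunks"),
        })
        .collect()
}
```

The lenient variant matches the "do nothing, plus a warn log" suggestion, the strict variant corresponds to bubbling up an error, and the last one is the `unreachable!()` option.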

Narsil

LGTM

Before this change, the generation input was sent to the backend as a single string, encoding images as Base64 and packing them in Markdown-style links. This change adds a new chunked input representation that separates text chunks from image chunks. Image chunks contain binary data (for smaller message sizes) and the image's MIME type. The stringly-typed inputs are still sent to support backends that do not support chunked inputs yet.
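
As a rough illustration of the transition period, a request could carry both forms, and a chunk-aware backend could prefer the chunks while older senders or receivers keep relying on the string. Everything in this sketch (`Request`, `BackendInput`, `select_input`, and the field names) is hypothetical and only shows the fallback idea, not the actual message definition.

```rust
/// Hypothetical chunk payload, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub enum Chunk {
    Text(String),
    Image { data: Vec<u8>, mimetype: String },
}

/// Hypothetical request carrying both representations during the migration.
#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    /// Legacy form: one string with images embedded as Base64 links.
    pub inputs: String,
    /// Typed form; `None` when the sender does not produce chunks yet.
    pub input_chunks: Option<Vec<Chunk>>,
}

/// Which representation the receiving side ends up processing.
#[derive(Debug, PartialEq)]
pub enum BackendInput<'a> {
    Chunked(&'a [Chunk]),
    Legacy(&'a str),
}

/// Prefer chunks when they are present and non-empty; otherwise fall back
/// to the stringly-typed input.
pub fn select_input(request: &Request) -> BackendInput<'_> {
    match request.input_chunks.as_deref() {
        Some(chunks) if !chunks.is_empty() => BackendInput::Chunked(chunks),
        _ => BackendInput::Legacy(&request.inputs),
    }
}
```

Once every backend reads the chunked field, the legacy field and the fallback arm can be dropped, which is the deprecation step described in the PR description.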

danieldk force-pushed the feature/chunked-input branch from 78bb569 to 99d4c9e Compare May 30, 2024 12:31

danieldk marked this pull request as ready for review May 30, 2024 15:09

danieldk mentioned this pull request May 31, 2024 in server: use chunked inputs #1985 (merged)

Narsil reviewed May 31, 2024

danieldk force-pushed the feature/chunked-input branch from 99d4c9e to 6c5598e Compare May 31, 2024 13:04

danieldk commented May 31, 2024

danieldk requested a review from Narsil May 31, 2024 13:36

danieldk force-pushed the feature/chunked-input branch from 6c5598e to fc52ba6 Compare June 3, 2024 07:27

Narsil previously approved these changes Jun 3, 2024

danieldk dismissed Narsil’s stale review via f92411a June 3, 2024 14:29

danieldk force-pushed the feature/chunked-input branch from fc52ba6 to f92411a Compare June 3, 2024 14:29

danieldk merged commit df71aaf into main Jun 3, 2024
5 checks passed

danieldk deleted the feature/chunked-input branch June 3, 2024 15:02
