Chunk text along UTF-8 boundaries #620

Merged (3 commits) on Mar 8, 2025

Conversation

ladvoc
Contributor

@ladvoc ladvoc commented Mar 7, 2025

This PR addresses a limitation in the current data streams implementation, noted by @lukasIO: text is not chunked along UTF-8 boundaries.
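
For readers unfamiliar with the problem, here is a minimal sketch of what chunking along UTF-8 boundaries can look like. It is not the code from this PR: the function name `chunk_utf8` and the `max_bytes` parameter are hypothetical, and Python is used only for illustration. The idea is to cut the encoded bytes at the size limit, then back up while the cut would land on a continuation byte (`0b10xxxxxx`), so no code point is split across chunks.

```python
def chunk_utf8(text: str, max_bytes: int) -> list[bytes]:
    """Split text into UTF-8 chunks of at most max_bytes each,
    never cutting through the middle of a code point.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if max_bytes < 4:
        # A single code point can take up to 4 bytes in UTF-8.
        raise ValueError("max_bytes must be at least 4")

    data = text.encode("utf-8")
    chunks: list[bytes] = []
    start = 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # If the byte at the cut position is a continuation byte,
        # the cut is inside a code point: move it back to the lead byte.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end])
        start = end
    return chunks


# Usage: every chunk decodes on its own, and the pieces rejoin losslessly.
sample = "héllo wörld 🎉"
pieces = chunk_utf8(sample, 5)
decoded = [piece.decode("utf-8") for piece in pieces]
```

Note that this sketch respects code point boundaries only. A grapheme cluster made of several code points (for example, an emoji with a skin-tone modifier) may still be split across chunks; that is harmless as long as the receiver concatenates the chunks before display.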


ilo-nanpa bot commented Mar 7, 2025

it seems like you haven't added any nanpa changeset files to this PR.

if this pull request includes changes to code, make sure to add a changeset, by writing a file to .nanpa/<unique-name>.kdl:

minor type="added" "Introduce frobnication algorithm"

refer to the manpage for more information.

@ladvoc ladvoc force-pushed the ladvoc/text-chunking branch from 27870fe to af371aa Compare March 7, 2025 20:01
@ladvoc ladvoc requested a review from hiroshihorie March 8, 2025 03:14
Member

@hiroshihorie hiroshihorie left a comment

LGTM ✅

@hiroshihorie hiroshihorie merged commit 5b49e07 into livekit:main Mar 8, 2025
14 of 16 checks passed
3 participants