Member

@coyotte508 coyotte508 commented Sep 2, 2025

Related to huggingface/huggingface.js#1718

We'll want to edit parts of a file while loading the old data's dedup info.

In those cases we don't always want to load dedup info for the first chunk (since it may not be at the beginning of the file).

So setting is_dedup = true for the first chunk is handled client-side.
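
To illustrate the idea, here is a minimal sketch of what the client-side handling could look like. It is not the actual huggingface.js code: the `Chunk` shape, the `dedup` field, the function name `markFirstChunkForDedup`, and the `startsAtFileBeginning` flag are all hypothetical names chosen for this example. The assumption is that the chunker no longer forces the flag on the first chunk it emits, so the caller sets it only when the data really starts at the beginning of the file.

```typescript
// Hypothetical chunk shape; the real field names may differ.
interface Chunk {
  hash: string;
  length: number;
  // True when dedup info should be looked up for this chunk.
  dedup: boolean;
}

// Returns a copy of the chunks in which the first chunk is flagged for dedup,
// but only when the chunked data starts at the beginning of the file.
// Flags already set by the chunker on other chunks are preserved.
function markFirstChunkForDedup(
  chunks: Chunk[],
  startsAtFileBeginning: boolean
): Chunk[] {
  return chunks.map((chunk, index) => ({
    ...chunk,
    dedup: chunk.dedup || (index === 0 && startsAtFileBeginning),
  }));
}

// Usage: a full upload flags the first chunk, a partial edit does not.
const exampleChunks: Chunk[] = [
  { hash: "a", length: 10, dedup: false },
  { hash: "b", length: 20, dedup: true },
  { hash: "c", length: 5, dedup: false },
];
const fullUpload = markFirstChunkForDedup(exampleChunks, true);
const partialEdit = markFirstChunkForDedup(exampleChunks, false);
```

Keeping the decision in the caller means the same chunker can serve both whole-file uploads and edits that begin partway through a file.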

Contributor

@assafvayner assafvayner left a comment

LGTM

@assafvayner
Contributor

> So the is_dedup = true for first chunk is handled client side

Best to ensure this is still done: the first chunk is the best chance to dedupe large parts of files.

@coyotte508
Member Author

@coyotte508 coyotte508 merged commit 3ff4eb2 into main Sep 6, 2025
6 checks passed
@coyotte508 coyotte508 deleted the dedup-first-is-client-side-logic branch September 6, 2025 07:32