coyotte508
Member

@coyotte508 coyotte508 commented Sep 2, 2025

Fix #1705

```ts
// Edit a file by adding prefix & suffix
// (assumes `originalFile` is a Blob, whose byte length is `.size`)
await commit({
  repo,
  accessToken: "hf_...",
  operations: [{
    type: "edit",
    originalContent: originalFile,
    edits: [{
      start: 0,
      end: 0,
      content: new Blob(["prefix"])
    }, {
      start: originalFile.size,
      end: originalFile.size,
      content: new Blob(["suffix"])
    }]
  }]
})
// Edit first kB of file
await commit({
  repo,
  accessToken: "hf_...",
  operations: [{
    type: "edit",
    originalContent: originalFile,
    edits: [{
      start: 0,
      end: 1000,
      content: new Blob(["blablabla"])
    }]
  }]
})
```
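
To make the `start`/`end` semantics concrete, here is a minimal sketch of how such edits could be applied locally to get the resulting file. `applyEdits` and `BlobEdit` are hypothetical names used for illustration, not part of the library, and the sketch assumes offsets refer to the original content, `end` is exclusive, and edits do not overlap.

```ts
// Hypothetical helper (not the library's implementation): shows the
// byte-range semantics of `edits` as used in the examples above.
interface BlobEdit {
  start: number; // inclusive offset into the ORIGINAL content
  end: number; // exclusive offset into the ORIGINAL content
  content: Blob; // bytes that replace original[start, end)
}

function applyEdits(original: Blob, edits: BlobEdit[]): Blob {
  // Apply edits in order of position; overlapping edits are rejected.
  const sorted = [...edits].sort((a, b) => a.start - b.start);
  const parts: Blob[] = [];
  let cursor = 0;
  for (const edit of sorted) {
    if (edit.start < cursor || edit.end < edit.start || edit.end > original.size) {
      throw new RangeError(`Invalid or overlapping edit [${edit.start}, ${edit.end})`);
    }
    parts.push(original.slice(cursor, edit.start)); // untouched bytes before the edit
    parts.push(edit.content); // replacement bytes
    cursor = edit.end; // skip the replaced range
  }
  parts.push(original.slice(cursor)); // untouched tail
  return new Blob(parts);
}

// Same shape as the prefix & suffix example: zero-length ranges insert bytes.
const sampleFile = new Blob(["0123456789"]);
const withPrefixAndSuffix = applyEdits(sampleFile, [
  { start: 0, end: 0, content: new Blob(["prefix"]) },
  { start: sampleFile.size, end: sampleFile.size, content: new Blob(["suffix"]) },
]);
// Same shape as the "first kB" example: a non-empty range is replaced.
const withReplacedStart = applyEdits(sampleFile, [
  { start: 0, end: 5, content: new Blob(["ab"]) },
]);
```

Note that the suffix edit uses the original file's size even though a prefix is inserted earlier, since offsets are relative to the original content in this sketch.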

cc @mishig25 @assafvayner @jsulz

Also in this PR:

  • fallback to LFS for non-xet repos (even if useXet is true)
  • remove invalid Accepts header

How it works under the hood

  • we load dedup info for first chunk of original file content if it's changed
  • we upload the blob as normal

Todo

Currently the blob is processed twice: once for sha256 and once for hashing. The file should be processed only once (maybe after #1704, which uses workers for different processes).


@coyotte508 coyotte508 marked this pull request as ready for review September 2, 2025 14:08
@coyotte508 coyotte508 merged commit bd3c428 into main Sep 2, 2025
4 of 6 checks passed
@coyotte508 coyotte508 deleted the edit-file-chunk branch September 2, 2025 15:58
coyotte508 added a commit to huggingface/xet-core that referenced this pull request Sep 6, 2025
…481)

Related to huggingface/huggingface.js#1718

We'll want to edit parts of a file while loading the old data's dedup info.

In those cases we don't always want to load dedup info for the first
chunk (since it may not be at the beginning of the file).

So setting `is_dedup = true` for the first chunk is handled client-side.
@mishig25 mishig25 self-requested a review September 8, 2025 07:56
mishig25 added a commit that referenced this pull request Sep 11, 2025
### Description 

This PR introduces functions that serialize GGUF metadata/header into a
Uint8Array, so that I can use
#1718 to update GGUF
metadata on hf.co

* `serializeTypedMetadata()` - Serialize GGUF metadata to binary format
* `serializeGgufHeader()` - Create complete GGUF headers with metadata +
tensor info + alignment
* Enhanced `gguf()` function - Now returns `littleEndian` property for
endianness detection
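
For context on what serializing to the binary format involves, here is a minimal sketch of encoding a single string-valued metadata entry following the GGUF layout (uint64 length-prefixed UTF-8 strings, a uint32 value type, where type 8 is a string). The helper names below are hypothetical and this is not the code added by this PR; it only illustrates why the `littleEndian` flag matters.

```ts
const GGUF_TYPE_STRING = 8; // value type id for strings in the GGUF format

// Encode a non-negative safe integer as 8 bytes without relying on BigInt.
function serializeUint64(value: number, littleEndian: boolean): Uint8Array {
  const out = new Uint8Array(8);
  const view = new DataView(out.buffer);
  const low = value >>> 0; // lower 32 bits
  const high = Math.floor(value / 0x100000000); // upper 32 bits
  view.setUint32(littleEndian ? 0 : 4, low, littleEndian);
  view.setUint32(littleEndian ? 4 : 0, high, littleEndian);
  return out;
}

function concatBytes(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((sum, p) => sum + p.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// GGUF string: uint64 byte length followed by UTF-8 bytes (no terminator).
function serializeGgufString(value: string, littleEndian = true): Uint8Array {
  const utf8 = new TextEncoder().encode(value);
  return concatBytes([serializeUint64(utf8.length, littleEndian), utf8]);
}

// Metadata entry with a string value: key, uint32 value type, then the value.
function serializeStringKV(key: string, value: string, littleEndian = true): Uint8Array {
  const type = new Uint8Array(4);
  new DataView(type.buffer).setUint32(0, GGUF_TYPE_STRING, littleEndian);
  return concatBytes([
    serializeGgufString(key, littleEndian),
    type,
    serializeGgufString(value, littleEndian),
  ]);
}

const nameEntry = serializeStringKV("general.name", "my-model");
```

Because changing a string value changes the entry's byte length, the whole header generally has to be re-serialized and re-aligned, which is what makes a byte-range edit of the start of the file useful here.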

### Usage example

```ts
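// Edit first kB of file
// `originalGgufHeader`: bytes of the original GGUF header
// `newGgufHeader`: Uint8Array from serializeGgufHeader() for the header with updated metadata
await commit({
  repo,
  accessToken: "hf_...",
  operations: [{
    type: "edit",
    // Blob takes an array of parts, so wrap the bytes in []
    originalContent: new Blob([originalGgufHeader]),
    edits: [{
      start: 0,
      end: 1000,
      content: new Blob([newGgufHeader])
    }]
  }]
})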
```
mishig25 added a commit that referenced this pull request Sep 13, 2025
…oken and related functions for better handling of pull requests (#1746)

I was getting an error when trying to create a pull request on repos that I
do **not** own:
```
Error: Forbidden: pass `create_pr=1` as a query parameter to create a Pull Request. URL: https://huggingface.co/api/models/reach-vb/TinyLlama-1.1B-Chat-v1.0-q4_k_m-GGUF/xet-write-token/main
```

I was using the new API from
#1718

```ts
// Edit first kB of file
await commit({
  repo,
  accessToken: "hf_...",
  operations: [{
    type: "edit",
    originalContent: originalFile,
    edits: [{
      start: 0,
      end: 1000,
      content: new Blob(["blablabla"])
    }]
  }]
})
```

Let me know if this PR is the right way to handle this issue.
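
As an illustration of what the error message asks for, here is a minimal sketch of building the write-token URL with the `create_pr=1` query parameter. The function name and its parameters are hypothetical; only the URL shape and the query parameter come from the error message above, and the actual fix in this PR may be structured differently.

```ts
// Hypothetical helper: builds the xet write-token URL seen in the error,
// adding `create_pr=1` when the commit is meant to open a pull request.
function xetWriteTokenUrl(
  hubUrl: string,
  repoName: string, // e.g. "user/model"; the "/" must stay unescaped
  rev: string,
  isPullRequest: boolean
): string {
  const url = new URL(
    `/api/models/${repoName}/xet-write-token/${encodeURIComponent(rev)}`,
    hubUrl
  );
  if (isPullRequest) {
    url.searchParams.set("create_pr", "1");
  }
  return url.toString();
}

const prTokenUrl = xetWriteTokenUrl(
  "https://huggingface.co",
  "reach-vb/TinyLlama-1.1B-Chat-v1.0-q4_k_m-GGUF",
  "main",
  true
);
```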
Development

Successfully merging this pull request may close these issues.

xet upload: allow changing only the beginning of the file