
Conversation

Collaborator

@jackzhxng jackzhxng commented Aug 1, 2025

Summary

Pin bumps

  • Bump torch libraries to more recent versions, since nightlies are only hosted for the last two months; the oldest nightly we can now use is 20250601
  • Bump transformers to 4.54.1
  • Bump torchao

Code changes

Includes changes to absorb the huggingface/transformers#39106 KV cache refactor introduced by the transformers upgrade, which specifies KV cache attributes per layer. After that PR, cache_config is also no longer a CacheConfig instance but a dict, so we switch to using .get()
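
As a rough illustration of the access-pattern change (key names and defaults here are hypothetical, not the PR's actual code):

```python
# Hypothetical sketch of the cache_config change described above: after the
# transformers KV cache refactor, cache_config arrives as a plain dict
# rather than a CacheConfig instance, so attribute access is replaced
# with dict-style .get(). Key names and defaults are illustrative only.

def read_cache_settings(cache_config):
    # Before: cache_config.max_cache_len (attribute access on CacheConfig)
    # After: dict lookup with a fallback default
    max_cache_len = cache_config.get("max_cache_len", 128)
    batch_size = cache_config.get("batch_size", 1)
    return max_cache_len, batch_size

print(read_cache_settings({"max_cache_len": 1024}))  # (1024, 1)
```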

Infra changes

Remove Mac tests; see #122 for more details. This also lets us iterate more quickly by cutting unnecessary CI, since there is no need to run export tests on Mac when the Linux tests already cover them. Mac tests with larger runners are enabled reciprocally for major LLM models in ExecuTorch in pytorch/executorch#13400.

Known failures

  • T5
  • Whisper
  • Granite - runs out of disk space

@jackzhxng jackzhxng force-pushed the jz/bump-transformers branch from 89ed1c5 to 3a960bb on August 1, 2025 00:55
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jackzhxng jackzhxng changed the title Bump transformers to 4.54.1 Bump transformers and torch Aug 1, 2025
@jackzhxng jackzhxng changed the title Bump transformers and torch [WIP] Bump transformers and torch Aug 1, 2025
@jackzhxng jackzhxng requested a review from guangy10 August 1, 2025 19:33
@jackzhxng jackzhxng changed the title [WIP] Bump transformers and torch Bump transformers and torch Aug 4, 2025
@jackzhxng jackzhxng force-pushed the jz/bump-transformers branch 2 times, most recently from 867eb8b to 300ccdf on August 4, 2025 19:12
@jackzhxng jackzhxng force-pushed the jz/bump-transformers branch from 300ccdf to bc82841 on August 4, 2025 19:58
@guangy10 guangy10 mentioned this pull request Aug 4, 2025
@jackzhxng jackzhxng force-pushed the jz/bump-transformers branch from d89e18d to 6a26464 on August 5, 2025 00:19
@jackzhxng jackzhxng requested a review from kimishpatel August 6, 2025 21:34

# Create a list of CustomKVCache instances, one per layer
self.kv_cache = torch.nn.ModuleList()
for _ in range(config.num_hidden_layers):
Collaborator

What happened here? Like, config doesn't exist anymore?

Collaborator Author

It still exists; it just feels more idiomatic to iterate over the actual layers.
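
As a torch-free sketch of the two loop styles being discussed (SimpleNamespace stands in for the real config and layer objects; all names are illustrative):

```python
# Hypothetical sketch contrasting the two loop styles from this thread:
# counting with config.num_hidden_layers vs. iterating the layers
# themselves. SimpleNamespace stands in for the real config/layers.
from types import SimpleNamespace

config = SimpleNamespace(num_hidden_layers=4)
layers = [SimpleNamespace(layer_idx=i) for i in range(config.num_hidden_layers)]

# Index-based: only the layer count is used
caches_by_count = [f"cache-{i}" for i in range(config.num_hidden_layers)]

# Layer-based: per-layer attributes are read directly, which matters now
# that the refactor specifies cache attributes per layer
caches_by_layer = [f"cache-{layer.layer_idx}" for layer in layers]

print(caches_by_count == caches_by_layer)  # True
```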

@guangy10 guangy10 mentioned this pull request Aug 7, 2025
@jackzhxng jackzhxng force-pushed the jz/bump-transformers branch from 93cbd54 to 64b41b4 on August 8, 2025 21:46
@jackzhxng jackzhxng force-pushed the jz/bump-transformers branch from b0027f5 to eccf6f0 on August 10, 2025 20:56
@jackzhxng jackzhxng force-pushed the jz/bump-transformers branch from eccf6f0 to 87fe6e3 on August 10, 2025 21:56
@jackzhxng jackzhxng force-pushed the jz/bump-transformers branch from 87fe6e3 to 99805f8 on August 10, 2025 22:00
@jackzhxng jackzhxng force-pushed the jz/bump-transformers branch from fc7b69e to 59778eb on August 14, 2025 09:04
self._temp_dir = None

def __del__(self):
"""Clean up temporary files when the model instance is destroyed."""
Collaborator

Shouldn't this already happen automatically?

Collaborator Author

@jackzhxng jackzhxng Aug 14, 2025

Yeah, probably, but it was added just to be extra sure that it's cleaned up between tests.
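
A minimal, self-contained sketch of this belt-and-braces cleanup (the class and attribute names are illustrative, not the actual model code):

```python
# Hypothetical sketch of explicit temp-dir cleanup in __del__, as an
# extra guarantee between tests. Class and attribute names are
# illustrative only.
import os
import tempfile

class ExportedModel:
    def __init__(self):
        self._temp_dir = tempfile.TemporaryDirectory()

    def __del__(self):
        """Clean up temporary files when the model instance is destroyed."""
        if self._temp_dir is not None:
            self._temp_dir.cleanup()
            self._temp_dir = None

m = ExportedModel()
path = m._temp_dir.name
del m  # triggers __del__ in CPython, removing the directory
print(os.path.exists(path))  # False
```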

@jackzhxng jackzhxng force-pushed the jz/bump-transformers branch from 330ca8d to b252038 on August 15, 2025 16:22
n_heads=self.num_key_value_heads,
head_dim=self.head_dim,
max_batch_size=layer.max_batch_size,
max_context_length=layer.max_cache_len,
Collaborator

Wait, what is happening here? Is this the same as sliding_window_len?

Collaborator Author

@jackzhxng jackzhxng Aug 15, 2025

@jackzhxng jackzhxng merged commit aae1dc7 into huggingface:main Aug 18, 2025
67 of 79 checks passed