Insights: pytorch/torchtune
Overview
8 Pull requests merged by 6 people
- Added evaluation config under llama3_1 dir, updated registry
  #2763 merged Jun 2, 2025
- Update valid torch nightly backends
  #2774 merged Jun 1, 2025
- Improve torch.compile log message in compile_model
  #2777 merged Jun 1, 2025
- [compile] Fix graphbreaks in moe split; scale_grad
  #2771 merged May 30, 2025
- Remove code_llama2
  #2770 merged May 30, 2025
- 1M+ context length (context parallel integration)
  #2668 merged May 30, 2025
- [compile] Fix compile producing NaNs
  #2765 merged May 28, 2025
- Update yaml configs. Add validation dataset
  #2608 merged May 28, 2025
4 Pull requests opened by 4 people
- [VERY WIP] DSV3
  #2764 opened May 27, 2025
- [NOT FOR REVIEW] Full knowledge distillation recipe TP + FP8
  #2767 opened May 28, 2025
- Fix typing in _grad_scaler.py for nightly compatibility
  #2778 opened Jun 2, 2025
- Full lora recipy simplification
  #2780 opened Jun 2, 2025
4 Issues closed by 3 people
- Improve `compile_model` logging output
  #2717 closed Jun 1, 2025
- Could you recommend evaluation benchmark for LLM instruction-tuning? (with alpaca or slimorca dataset)
  #2740 closed May 30, 2025
- Deprecate code_llama2
  #2768 closed May 30, 2025
- Generation does not work
  #2769 closed May 29, 2025
4 Issues opened by 4 people
- Proposal: reuse methods in recipes
  #2779 opened Jun 2, 2025
- Loss is `nan` while training LLama4 via LoRA using torchtune
  #2776 opened Jun 1, 2025
- Torchtune dataset cache memory error despite providing dataset file path
  #2775 opened May 30, 2025
- pip install torchtune , version is 0.61 which does not contrain qwen3
  #2772 opened May 30, 2025
16 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- [llama4] use grouped_mm in moe for sm90
  #2755 commented on Jun 2, 2025 • 17 new comments
- I'm experimenting unexpected loss values when running old recipes on the same datasets
  #2746 commented on May 27, 2025 • 0 new comments
- Running into a torch.distributed.elastic.rendezvous.api.RendezvousConnectionError: The connection to the C10d store has failed.
  #2676 commented on May 28, 2025 • 0 new comments
- TP + FP8 - NotImplementedError for certain operations
  #2629 commented on May 30, 2025 • 0 new comments
- TP + FP8 + Compile: metadata error
  #2682 commented on May 30, 2025 • 0 new comments
- `torchtune` fails to start when the checkpointer dir does not contain the initial model weights
  #2759 commented on May 30, 2025 • 0 new comments
- Feature request: support different input/output formats in the same recipe
  #2732 commented on May 30, 2025 • 0 new comments
- Perf / accuracy metrics comparison with Nemo for SFT / reasoning distillation scenario
  #2760 commented on May 30, 2025 • 0 new comments
- Implement step based checkpointing
  #2384 commented on Jun 2, 2025 • 0 new comments
- GRPO LoRA Single Device
  #2467 commented on May 31, 2025 • 0 new comments
- [WIP] Gemma3 support.
  #2485 commented on May 28, 2025 • 0 new comments
- Adding EOS Tokens to Qwen Models
  #2512 commented on May 27, 2025 • 0 new comments
- Eval chat template
  #2574 commented on May 28, 2025 • 0 new comments
- [WIP] HuggingFaceModelTokenizer
  #2723 commented on Jun 2, 2025 • 0 new comments
- Add feature ligerceloss
  #2741 commented on May 30, 2025 • 0 new comments
- Add `LRScheduler.state_dict()` to checkpoints
  #2762 commented on Jun 2, 2025 • 0 new comments