Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -344,6 +344,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://huggingface.co/papers/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
1. **[Granite](https://huggingface.co/docs/transformers/main/model_doc/granite)** (from IBM) released with the paper [Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler](https://huggingface.co/papers/2408.13359) by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda.
1. **[GraniteMoeHybrid](https://huggingface.co/docs/transformers/main/model_doc/granitemoehybrid)** (from IBM) released with the blog post [IBM Granite 4.0: hyper-efficient, high performance hybrid models for enterprise](https://www.ibm.com/new/announcements/ibm-granite-4-0-hyper-efficient-high-performance-hybrid-models) by the IBM Granite team.
1. **[Grounding DINO](https://huggingface.co/docs/transformers/model_doc/grounding-dino)** (from IDEA-Research) released with the paper [Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection](https://huggingface.co/papers/2303.05499) by Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://huggingface.co/papers/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Helium](https://huggingface.co/docs/transformers/main/model_doc/helium)** (from the Kyutai Team) released with the blog post [Announcing Helium-1 Preview](https://kyutai.org/2025/01/13/helium.html) by the Kyutai Team.
Expand Down
1 change: 1 addition & 0 deletions docs/snippets/6_supported-models.snippet
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://huggingface.co/papers/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
1. **[Granite](https://huggingface.co/docs/transformers/main/model_doc/granite)** (from IBM) released with the paper [Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler](https://huggingface.co/papers/2408.13359) by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox, Rameswar Panda.
1. **[GraniteMoeHybrid](https://huggingface.co/docs/transformers/main/model_doc/granitemoehybrid)** (from IBM) released with the blog post [IBM Granite 4.0: hyper-efficient, high performance hybrid models for enterprise](https://www.ibm.com/new/announcements/ibm-granite-4-0-hyper-efficient-high-performance-hybrid-models) by the IBM Granite team.
1. **[Grounding DINO](https://huggingface.co/docs/transformers/model_doc/grounding-dino)** (from IDEA-Research) released with the paper [Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection](https://huggingface.co/papers/2303.05499) by Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang.
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://huggingface.co/papers/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Helium](https://huggingface.co/docs/transformers/main/model_doc/helium)** (from the Kyutai Team) released with the blog post [Announcing Helium-1 Preview](https://kyutai.org/2025/01/13/helium.html) by the Kyutai Team.
Expand Down
1 change: 1 addition & 0 deletions src/configs.js
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@ function getNormalizedConfig(config) {
case 'olmo2':
case 'mobilellm':
case 'granite':
case 'granitemoehybrid':
case 'cohere':
case 'mistral':
case 'starcoder2':
Expand Down
8 changes: 8 additions & 0 deletions src/models.js
Original file line number Diff line number Diff line change
Expand Up @@ -4653,6 +4653,12 @@ export class GraniteModel extends GranitePreTrainedModel { }
export class GraniteForCausalLM extends GranitePreTrainedModel { }
//////////////////////////////////////////////////

//////////////////////////////////////////////////
// GraniteMoeHybrid models
export class GraniteMoeHybridPreTrainedModel extends PreTrainedModel { }
export class GraniteMoeHybridModel extends GraniteMoeHybridPreTrainedModel { }
export class GraniteMoeHybridForCausalLM extends GraniteMoeHybridPreTrainedModel { }
//////////////////////////////////////////////////

//////////////////////////////////////////////////
// Cohere models
Expand Down Expand Up @@ -7841,6 +7847,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
['olmo2', ['Olmo2Model', Olmo2Model]],
['mobilellm', ['MobileLLMModel', MobileLLMModel]],
['granite', ['GraniteModel', GraniteModel]],
['granitemoehybrid', ['GraniteMoeHybridModel', GraniteMoeHybridModel]],
['cohere', ['CohereModel', CohereModel]],
['gemma', ['GemmaModel', GemmaModel]],
['gemma2', ['Gemma2Model', Gemma2Model]],
Expand Down Expand Up @@ -7951,6 +7958,7 @@ const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
['olmo2', ['Olmo2ForCausalLM', Olmo2ForCausalLM]],
['mobilellm', ['MobileLLMForCausalLM', MobileLLMForCausalLM]],
['granite', ['GraniteForCausalLM', GraniteForCausalLM]],
['granitemoehybrid', ['GraniteMoeHybridForCausalLM', GraniteMoeHybridForCausalLM]],
['cohere', ['CohereForCausalLM', CohereForCausalLM]],
['gemma', ['GemmaForCausalLM', GemmaForCausalLM]],
['gemma2', ['Gemma2ForCausalLM', Gemma2ForCausalLM]],
Expand Down