The MindSpore Transformers suite aims to provide a full-process development suite for large model pre-training, fine-tuning, evaluation, inference, and deployment. It provides mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), and is designed to help users easily walk through the entire large model development workflow.
Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:
- One-click launch of single-card or multi-card pre-training, fine-tuning, evaluation, inference, and deployment for large models;
- Rich multi-dimensional hybrid parallel capabilities with flexible, easy-to-use personalized configuration;
- System-level deep optimization of large model training and inference, with native support for efficient training and inference on ultra-large-scale clusters and rapid fault recovery;
- Configurable development of task components: any module, including the model network, optimizer, and learning rate policy, can be enabled through a unified configuration (a launch sketch follows this list);
- Real-time visualization of training accuracy and performance monitoring indicators.
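As a rough illustration of the unified-configuration workflow, the sketch below launches a single-card inference task from a YAML file. The config path, flag values, and prompt are placeholders chosen as assumptions; the Quick Launch guide documents the options actually supported by your installed version.

```shell
# Illustrative only: config path, run mode, and prompt are placeholders.
# Consult the Quick Launch guide for the flags supported by your version.
python run_mindformer.py \
  --config configs/llama3_1/predict_llama3_1_8b.yaml \
  --run_mode predict \
  --predict_data "An increasing sequence: one,"
```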
For MindSpore Transformers tutorials and API documentation, see the MindSpore Transformers Documentation. The following are quick links to some key content:
- Quick Launch
- Pre-training
- Fine-Tuning
- Evaluation
- Service-oriented Deployment
If you have any suggestions for MindSpore Transformers, contact us through an issue and we will address them promptly.
The following table lists models supported by MindSpore Transformers.
Model | Specifications | Model Type | Latest Version |
---|---|---|---|
DeepSeek-V3 | 671B | Sparse LLM | In-development version, 1.5.0 |
GLM4 | 9B | Dense LLM | In-development version, 1.5.0 |
Llama3.1 | 8B/70B | Dense LLM | In-development version, 1.5.0 |
Qwen2.5 | 0.5B/1.5B/7B/14B/32B/72B | Dense LLM | In-development version, 1.5.0 |
TeleChat2 | 7B/35B/115B | Dense LLM | In-development version, 1.5.0 |
CodeLlama | 34B | Dense LLM | 1.5.0 |
CogVLM2-Image | 19B | MM | 1.5.0 |
CogVLM2-Video | 13B | MM | 1.5.0 |
DeepSeek-V2 | 236B | Sparse LLM | 1.5.0 |
DeepSeek-Coder-V1.5 | 7B | Dense LLM | 1.5.0 |
DeepSeek-Coder | 33B | Dense LLM | 1.5.0 |
GLM3-32K | 6B | Dense LLM | 1.5.0 |
GLM3 | 6B | Dense LLM | 1.5.0 |
InternLM2 | 7B/20B | Dense LLM | 1.5.0 |
Llama3.2 | 3B | Dense LLM | 1.5.0 |
Llama3.2-Vision | 11B | MM | 1.5.0 |
Llama3 | 8B/70B | Dense LLM | 1.5.0 |
Llama2 | 7B/13B/70B | Dense LLM | 1.5.0 |
Mixtral | 8x7B | Sparse LLM | 1.5.0 |
Qwen2 | 0.5B/1.5B/7B/57B/57B-A14B/72B | Dense/Sparse LLM | 1.5.0 |
Qwen1.5 | 7B/14B/72B | Dense LLM | 1.5.0 |
Qwen-VL | 9.6B | MM | 1.5.0 |
TeleChat | 7B/12B/52B | Dense LLM | 1.5.0 |
Whisper | 1.5B | MM | 1.5.0 |
Yi | 6B/34B | Dense LLM | 1.5.0 |
YiZhao | 12B | Dense LLM | 1.5.0 |
Baichuan2 | 7B/13B | Dense LLM | 1.3.2 |
GLM2 | 6B | Dense LLM | 1.3.2 |
GPT2 | 124M/13B | Dense LLM | 1.3.2 |
InternLM | 7B/20B | Dense LLM | 1.3.2 |
Qwen | 7B/14B | Dense LLM | 1.3.2 |
CodeGeex2 | 6B | Dense LLM | 1.1.0 |
WizardCoder | 15B | Dense LLM | 1.1.0 |
Baichuan | 7B/13B | Dense LLM | 1.0 |
Blip2 | 8.1B | MM | 1.0 |
Bloom | 560M/7.1B/65B/176B | Dense LLM | 1.0 |
Clip | 149M/428M | MM | 1.0 |
CodeGeex | 13B | Dense LLM | 1.0 |
GLM | 6B | Dense LLM | 1.0 |
iFlytekSpark | 13B | Dense LLM | 1.0 |
Llama | 7B/13B | Dense LLM | 1.0 |
MAE | 86M | MM | 1.0 |
Mengzi3 | 13B | Dense LLM | 1.0 |
PanguAlpha | 2.6B/13B | Dense LLM | 1.0 |
SAM | 91M/308M/636M | MM | 1.0 |
Skywork | 13B | Dense LLM | 1.0 |
Swin | 88M | MM | 1.0 |
T5 | 14M/60M | Dense LLM | 1.0 |
VisualGLM | 6B | MM | 1.0 |
Ziya | 13B | Dense LLM | 1.0 |
Bert | 4M/110M | Dense LLM | 0.8 |
The maintenance strategy for each model follows the Life Cycle and Version Matching Strategy of its latest supported version.
Currently, the Atlas 800T A2 training server is supported.
Python 3.11.4 is recommended for the current suite.
MindSpore Transformers | MindSpore | CANN | Driver/Firmware | Image Link |
---|---|---|---|---|
In-development version | In-development version | In-development version | In-development version | Not involved |
Historical version compatibility:
MindSpore Transformers | MindSpore | CANN | Driver/Firmware | Image Link |
---|---|---|---|---|
1.5.0 | 2.6.0-rc1 | 8.1.RC1 | 25.0.RC1 | Link |
1.3.2 | 2.4.10 | 8.0.0 | 24.1.0 | Link |
1.3.0 | 2.4.0 | 8.0.RC3 | 24.1.RC3 | Link |
1.2.0 | 2.3.0 | 8.0.RC2 | 24.1.RC2 | Link |
Currently, MindSpore Transformers can be compiled and installed from source code. Run the following commands to install it:
git clone -b dev https://gitee.com/mindspore/mindformers.git
cd mindformers
bash build.sh
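After compilation, a quick import check can confirm that the package is visible to the current Python environment. This is only a sketch; it assumes the installed package is named mindformers and exposes a __version__ attribute, which may differ between releases.

```shell
# Sanity check after installation (assumes mindformers exposes __version__).
python -c "import mindformers; print(mindformers.__version__)"
```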
MindSpore Transformers supports one-click launch of distributed pre-training, fine-tuning, and inference tasks for large models. Click the link of each model in the Model List to see the corresponding documentation, or refer to Start Tasks to learn how to launch these tasks; a rough sketch follows below.
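For reference, a multi-card launch is typically driven by a launcher script plus the same unified YAML configuration. The sketch below is assumption-laden: the launcher path, config file, and card count are placeholders, and the Start Tasks guide is the authoritative source for the exact invocation.

```shell
# Hypothetical 8-card pre-training launch; script path, config file, and
# card count are placeholders. See the Start Tasks guide for exact usage.
bash scripts/msrun_launcher.sh \
  "run_mindformer.py --config configs/llama2/pretrain_llama2_7b.yaml --run_mode train" 8
```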
For more information about the functions of MindSpore Transformers, please refer to MindSpore Transformers Documentation.
Each MindSpore Transformers version goes through the following five maintenance phases:
Status | Duration | Description |
---|---|---|
Plan | 1-3 months | Plan features. |
Develop | 3 months | Develop features. |
Preserve | 6 months | Incorporate resolved issues and release new versions. |
No Preserve | 0-3 months | Incorporate resolved issues; there is no dedicated maintenance team and no plan to release a new version. |
End of Life (EOL) | N/A | The branch is closed and no longer accepts any modifications. |
Preservation policy for released MindSpore Transformers versions:
MindSpore Transformers Version | Corresponding Label | Current Status | Release Time | Subsequent Status | EOL Date |
---|---|---|---|---|---|
1.5.0 | v1.5.0 | Preserve | 2025/04/29 | No preserve expected from 2025/10/29 | 2026/01/29 |
1.3.2 | v1.3.2 | Preserve | 2024/12/20 | No preserve expected from 2025/06/20 | 2025/09/20 |
1.2.0 | v1.2.0 | End of Life | 2024/07/12 | - | 2025/04/12 |
1.1.0 | v1.1.0 | End of Life | 2024/04/15 | - | 2025/01/15 |
- The examples under the scripts/examples directory are provided as reference examples and do not form part of the commercially released products; they are only for users' reference. If they need to be used, the user is responsible for transforming them into a product suitable for commercial use and for ensuring security protection. MindSpore does not assume security responsibility for any resulting security problems.
- With regard to datasets, MindSpore Transformers only suggests datasets that can be used for training; it does not provide any datasets. If you use these datasets for training, please comply with the licenses of the corresponding datasets. MindSpore Transformers is not responsible for any infringement disputes that may arise from the use of the datasets.
- If you do not want your dataset to be mentioned in MindSpore Transformers, or if you want to update the description of your dataset in MindSpore Transformers, please submit an issue to Gitee, and we will remove or update the description of your dataset according to your issue request. We sincerely appreciate your understanding and contribution to MindSpore Transformers.
We welcome contributions to the community. For details, see MindSpore Transformers Contribution Guidelines.