DeepSparkInference has selected 216 inference models of both small and large sizes. The small models cover fields such as computer vision, natural language processing, and speech recognition; the LLMs involve various frameworks including vLLM, TGI and LMDeploy. This repository is the mirror of Gitee.

Deep-Spark/DeepSparkInference

DeepSparkInference

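The DeepSparkInference inference model zoo is a core project of the DeepSpark open-source community and was officially open-sourced in March 2024. The first release selected 48 inference model examples covering computer vision, natural language processing, speech recognition, and other fields. More AI domains will be added over time.

The models in DeepSparkInference come with inference examples and guidance documents for running on the domestically developed inference engines IGIE or ixRT. Some models also include evaluation results on the domestically developed general-purpose GPU Zhikai 100.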

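IGIE (Iluvatar GPU Inference Engine) is a high-performance, highly general, end-to-end AI inference engine built on the TVM framework. It supports multi-framework model import, quantization, graph optimization, multiple operator libraries, multiple backends, and automatic operator tuning, providing a complete solution for inference scenarios that is easy to deploy, high in throughput, and low in latency.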

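ixRT (Iluvatar CoreX RunTime) is a high-performance inference engine developed in-house by Iluvatar CoreX. It focuses on getting the most out of Iluvatar CoreX general-purpose GPUs to deliver high-performance inference for models in every field. ixRT supports dynamic-shape inference, plugins, and INT8/FP16 inference.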
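INT8 and FP16 appear here and in the Prec. columns of the tables below. As a rough illustration of what INT8 inference involves, the following sketch applies generic symmetric per-tensor quantization in plain Python. It is not the calibration procedure or API of ixRT or IGIE; the function names and sample weights are hypothetical and chosen only for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


# Hypothetical weights, used only to show the round trip.
weights = [0.0, 0.02, -0.75, 1.5]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The round-trip error of each value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

The largest-magnitude value maps to 127 and every other value is rounded to the nearest step of size `scale`, which is why small weights lose relatively more precision than large ones.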

DeepSparkInference will be updated quarterly, gradually adding more model categories and expanding large-model inference.

Model Zoo

Large Language Models (LLM)

Model Engine Supported IXUCA SDK
Baichuan2-7B vLLM 4.3.0
ChatGLM-3-6B vLLM 4.3.0
ChatGLM-3-6B-32K vLLM 4.3.0
CosyVoice2-0.5B PyTorch 4.3.0
CosyVoice2-0.5B ixRT dev-only
DeepSeek-R1-Distill-Llama-8B vLLM 4.3.0
DeepSeek-R1-Distill-Llama-70B vLLM 4.3.0
DeepSeek-R1-Distill-Qwen-1.5B vLLM 4.3.0
DeepSeek-R1-Distill-Qwen-7B vLLM 4.3.0
DeepSeek-R1-Distill-Qwen-14B vLLM 4.3.0
DeepSeek-R1-Distill-Qwen-32B vLLM 4.3.0
DeepSeek-OCR Transformers 4.3.0
DeepSeek-OCR vLLM dev-only
ERNIE-4.5-21B-A3B FastDeploy 4.3.0
ERNIE-4.5-300B-A47B FastDeploy 4.3.0
GLM-4V vLLM 4.3.0
InternLM3 LMDeploy 4.3.0
Llama2-7B vLLM 4.3.0
Llama2-7B TRT-LLM 4.3.0
Llama2-13B TRT-LLM 4.3.0
Llama2-70B TRT-LLM 4.3.0
Llama3-70B vLLM 4.3.0
E5-V vLLM 4.3.0
MiniCPM-o-2 vLLM 4.3.0
MiniCPM-V-2 vLLM 4.3.0
MiniCPM-V-4 vLLM dev-only
NVLM vLLM 4.3.0
Phi3_v vLLM 4.3.0
PaliGemma vLLM 4.3.0
Qwen-7B vLLM 4.3.0
Qwen-VL vLLM 4.3.0
Qwen2-VL vLLM 4.3.0
Qwen2.5-VL vLLM 4.3.0
Qwen1.5-7B vLLM 4.3.0
Qwen1.5-7B TGI 4.3.0
Qwen1.5-14B vLLM 4.3.0
Qwen1.5-32B Chat vLLM 4.3.0
Qwen1.5-72B vLLM 4.3.0
Qwen2-7B Instruct vLLM 4.3.0
Qwen2-72B Instruct vLLM 4.3.0
Qwen3_Moe vLLM dev-only
StableLM2-1.6B vLLM 4.3.0
Step3 vLLM dev-only
Ultravox vLLM 4.3.0
Whisper vLLM 4.3.0
XLMRoberta vLLM 4.3.0
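When scripting against this list, it can help to separate rows that name an IXUCA SDK version from rows marked dev-only. The sketch below is one way to do that, assuming rows are available as plain text in the "model engine sdk" order used above; `ModelEntry`, `parse_row`, and the helper names are hypothetical, and only a handful of rows from the table are included.

```python
from typing import NamedTuple


class ModelEntry(NamedTuple):
    model: str
    engine: str
    sdk: str  # IXUCA SDK column: a version string or "dev-only"


def parse_row(row: str) -> ModelEntry:
    # Model names may contain spaces ("Qwen1.5-32B Chat"), so split from
    # the right: the last two fields are always engine and SDK.
    model, engine, sdk = row.rsplit(maxsplit=2)
    return ModelEntry(model, engine, sdk)


# A few rows copied from the table above.
ROWS = [
    "Baichuan2-7B vLLM 4.3.0",
    "DeepSeek-OCR Transformers 4.3.0",
    "DeepSeek-OCR vLLM dev-only",
    "Qwen1.5-32B Chat vLLM 4.3.0",
    "Qwen3_Moe vLLM dev-only",
]

entries = [parse_row(r) for r in ROWS]


def with_sdk(entries, sdk="4.3.0"):
    """Entries whose IXUCA SDK column equals the given version."""
    return [e for e in entries if e.sdk == sdk]


def dev_only(entries):
    """Entries whose IXUCA SDK column reads dev-only."""
    return [e for e in entries if e.sdk == "dev-only"]


for e in dev_only(entries):
    print(f"{e.model} ({e.engine}) is marked dev-only")
```

Splitting from the right keeps multi-word model names intact without needing a delimiter other than whitespace.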

Computer Vision (CV)

Image Classification

Model Prec. IGIE ixRT IXUCA SDK
AlexNet FP16 4.3.0
INT8 4.3.0
CLIP FP16 4.3.0
Conformer-B FP16 4.3.0
ConvNeXt-Base FP16 4.3.0
ConvNext-S FP16 4.3.0
ConvNeXt-Small FP16 4.3.0
ConvNeXt-Tiny FP16 4.3.0
CSPDarkNet53 FP16 4.3.0
INT8 4.3.0
CSPResNet50 FP16 4.3.0
INT8 4.3.0
CSPResNeXt50 FP16 4.3.0
DeiT-tiny FP16 4.3.0
DenseNet121 FP16 4.3.0
DenseNet161 FP16 4.3.0
DenseNet169 FP16 4.3.0
DenseNet201 FP16 4.3.0
EfficientNet-B0 FP16 4.3.0
INT8 4.3.0
EfficientNet-B1 FP16 4.3.0
INT8 4.3.0
EfficientNet-B2 FP16 4.3.0
EfficientNet-B3 FP16 4.3.0
EfficientNet-B4 FP16 4.3.0
EfficientNet-B5 FP16 4.3.0
EfficientNet-B6 FP16 4.3.0
EfficientNet-B7 FP16 4.3.0
EfficientNetV2 FP16 4.3.0
INT8 4.3.0
EfficientNetv2_rw_t FP16 4.3.0
EfficientNetv2_s FP16 4.3.0
GoogLeNet FP16 4.3.0
INT8 4.3.0
HRNet-W18 FP16 4.3.0
INT8 4.3.0
InceptionV3 FP16 4.3.0
INT8 4.3.0
Inception-ResNet-V2 FP16 4.3.0
INT8 4.3.0
Mixer_B FP16 4.3.0
MNASNet0_5 FP16 4.3.0
MNASNet0_75 FP16 4.3.0
MNASNet1_0 FP16 4.3.0
MNASNet1_3 FP16 4.3.0
MobileNetV2 FP16 4.3.0
INT8 4.3.0
MobileNetV3_Large FP16 4.3.0
MobileNetV3_Small FP16 4.3.0
MViTv2_base FP16 dev-only
RegNet_x_16gf FP16 4.3.0
RegNet_x_1_6gf FP16 4.3.0
RegNet_x_3_2gf FP16 4.3.0
RegNet_x_8gf FP16 4.3.0
RegNet_x_32gf FP16 4.3.0
RegNet_x_400mf FP16 4.3.0
RegNet_x_800mf FP16 4.3.0
RegNet_y_1_6gf FP16 4.3.0
RegNet_y_16gf FP16 4.3.0
RegNet_y_3_2gf FP16 4.3.0
RegNet_y_32gf FP16 4.3.0
RegNet_y_400mf FP16 4.3.0
RepVGG FP16 4.3.0
Res2Net50 FP16 4.3.0
INT8 4.3.0
ResNeSt50 FP16 4.3.0
ResNet101 FP16 4.3.0
INT8 4.3.0
ResNet152 FP16 4.3.0
INT8 4.3.0
ResNet18 FP16 4.3.0
INT8 4.3.0
ResNet34 FP16 4.3.0
INT8 4.3.0
ResNet50 FP16 4.3.0
INT8 4.3.0
ResNetV1D50 FP16 4.3.0
INT8 4.3.0
ResNeXt50_32x4d FP16 4.3.0
ResNeXt101_64x4d FP16 4.3.0
ResNeXt101_32x8d FP16 4.3.0
SEResNet50 FP16 4.3.0
ShuffleNetV1 FP16 4.3.0
ShuffleNetV2_x0_5 FP16 4.3.0
ShuffleNetV2_x1_0 FP16 4.3.0
ShuffleNetV2_x1_5 FP16 4.3.0
ShuffleNetV2_x2_0 FP16 4.3.0
SqueezeNet 1.0 FP16 4.3.0
INT8 4.3.0
SqueezeNet 1.1 FP16 4.3.0
INT8 4.3.0
SVT Base FP16 4.3.0
Swin Transformer FP16 4.3.0
Swin Transformer Large FP16 4.3.0
Twins_PCPVT FP16 4.3.0
VAN_B0 FP16 4.3.0
VGG11 FP16 4.3.0
VGG13 FP16 4.3.0
VGG13_BN FP16 4.3.0
VGG16 FP16 4.3.0
INT8 4.3.0
VGG19 FP16 4.3.0
VGG19_BN FP16 4.3.0
ViT FP16 4.3.0
Wide ResNet50 FP16 4.3.0
INT8 4.3.0
Wide ResNet101 FP16 4.3.0

Object Detection

Model Prec. IGIE ixRT IXUCA SDK
ATSS FP16 4.3.0
CenterNet FP16 4.3.0
DETR FP16 4.3.0
FCOS FP16 4.3.0
FoveaBox FP16 4.3.0
FSAF FP16 4.3.0
GFL FP16 4.3.0
HRNet FP16 4.3.0
PAA FP16 4.3.0
RetinaFace FP16 4.3.0
RetinaNet FP16 4.3.0
RTMDet FP16 4.3.0
RTDETR FP16 dev-only
INT8 dev-only
SABL FP16 4.3.0
SSD FP16 4.3.0
YOLOF FP16 4.3.0
YOLOv3 FP16 4.3.0
INT8 4.3.0
YOLOv4 FP16 4.3.0
INT8 4.3.0
YOLOv5m FP16 4.3.0
INT8 4.3.0
YOLOv5s FP16 4.3.0
INT8 4.3.0
YOLOv6s FP16 4.3.0
INT8 4.3.0
YOLOv7 FP16 4.3.0
INT8 4.3.0
YOLOv8n FP16 4.3.0
INT8 4.3.0
YOLOv8s FP16 4.3.0
INT8 4.3.0
YOLOv9s FP16 4.3.0
INT8 4.3.0
YOLOv10s FP16 4.3.0
YOLOv11n FP16 4.3.0
INT8 4.3.0
YOLOv12n FP16 4.3.0
INT8 4.3.0
YOLOv13n FP16 4.3.0
INT8 4.3.0
YOLOXm FP16 4.3.0
INT8 4.3.0
Model Prec. PaddlePaddle IXUCA SDK
RTDETR FP16 dev-only
Model Prec. PyTorch IXUCA SDK
YOLOv8n FP16 dev-only

Face Recognition

Model Prec. IGIE ixRT IXUCA SDK
FaceNet FP16 4.3.0
INT8 4.3.0

Optical Character Recognition (OCR)

Model Prec. IGIE IXUCA SDK
Kie_layoutXLM FP16 4.3.0
SVTR FP16 4.3.0

Pose Estimation

Model Prec. IGIE ixRT IXUCA SDK
HRNetPose FP16 4.3.0
Lightweight OpenPose FP16 4.3.0
RTMPose FP16 4.3.0

Instance Segmentation

Model Prec. IGIE ixRT IXUCA SDK
Mask R-CNN FP16 4.2.0
SOLOv1 FP16 4.3.0

Semantic Segmentation

Model Prec. IGIE ixRT IXUCA SDK
UNet FP16 4.3.0

Multi-Object Tracking

Model Prec. IGIE ixRT IXUCA SDK
FastReID FP16 4.3.0
DeepSort FP16 4.3.0
INT8 4.3.0
RepNet-Vehicle-ReID FP16 4.3.0

Multimodal

Model Engine Supported IXUCA SDK
Aria vLLM 4.3.0
Chameleon-7B vLLM 4.3.0
CLIP IxFormer 4.3.0
Fuyu-8B vLLM 4.3.0
H2OVL Mississippi vLLM 4.3.0
Idefics3 vLLM 4.3.0
InternVL2-4B vLLM 4.3.0
LLaVA vLLM 4.3.0
LLaVA-Next-Video-7B vLLM 4.3.0
Llama-3.2 vLLM 4.3.0
Pixtral vLLM 4.3.0
Stable Diffusion 1.5 Diffusers 4.3.0
Stable Diffusion 3 Diffusers dev-only

Natural Language Processing (NLP)

Pre-trained Language Models (PLM)

Model Prec. IGIE ixRT IXUCA SDK
ALBERT FP16 4.3.0
BERT Base NER INT8 4.3.0
BERT Base SQuAD FP16 4.3.0
INT8 4.3.0
BERT Large SQuAD FP16 4.3.0
INT8 4.3.0
DeBERTa FP16 4.3.0
RoBERTa FP16 4.3.0
RoFormer FP16 4.3.0
VideoBERT FP16 4.2.0

Speech

Speech Recognition

Model Prec. IGIE ixRT IXUCA SDK
Conformer FP16 4.3.0
Transformer ASR FP16 4.2.0

Others

Recommendation Systems

Model Prec. IGIE ixRT IXUCA SDK
Wide & Deep FP16 4.3.0

Containers

Docker Installer IXUCA SDK Introduction
corex-docker-installer-4.3.0-*-py3.10-x86_64.run 4.3.0 For small-model inference
corex-docker-installer-4.3.0-*-llm-py3.10-x86_64.run 4.3.0 For large-model inference
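The two installer names differ only by the `-llm` segment, and the `*` in the small-model pattern is broad enough to match an LLM installer file as well. A minimal sketch of telling them apart in a setup script, assuming the `*` stands for a build tag; the tag `10.2` used in the example is made up, and `classify_installer` is a hypothetical helper:

```python
from fnmatch import fnmatchcase

# Order matters: the small-model pattern also matches LLM installer names
# (its "*" can absorb "<tag>-llm"), so the more specific pattern goes first.
INSTALLER_PATTERNS = [
    ("llm", "corex-docker-installer-4.3.0-*-llm-py3.10-x86_64.run"),
    ("small", "corex-docker-installer-4.3.0-*-py3.10-x86_64.run"),
]


def classify_installer(filename):
    """Return "llm", "small", or None for an installer file name."""
    for kind, pattern in INSTALLER_PATTERNS:
        if fnmatchcase(filename, pattern):
            return kind
    return None


# Hypothetical file names; the "10.2" build tag is an assumption.
print(classify_installer("corex-docker-installer-4.3.0-10.2-llm-py3.10-x86_64.run"))
print(classify_installer("corex-docker-installer-4.3.0-10.2-py3.10-x86_64.run"))
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform.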

Community

Code of Conduct

Please refer to the DeepSpark Code of Conduct on Gitee or on GitHub.

Contact

Please contact contact@deepspark.org.cn.

Contribution

Please refer to the DeepSparkInference Contributing Guidelines.

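Disclaimer

DeepSparkInference only provides download and preprocessing scripts for public datasets. These datasets do not belong to DeepSparkInference, and DeepSparkInference is not responsible for their quality or maintenance. Please make sure you have permission to use these datasets; models trained on them may be used only for non-commercial research and education.

To dataset owners:

If you do not want your dataset to be published on DeepSparkInference, or you want to update a dataset of yours that is included in DeepSparkInference, please submit an issue on Gitee or GitHub. We will remove or update it according to your issue. Thank you sincerely for your support of and contributions to our community.

License

This project is released under the Apache-2.0 license.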
