-
CUDALibrarySamples Public
Forked from NVIDIA/CUDALibrarySamples
CUDA Library Samples
Cuda Other Updated Nov 12, 2024 -
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ Other Updated Nov 8, 2024 -
CUDA-Learn-Notes Public
Forked from DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: CUDA Cores, Tensor Cores, fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.
Cuda GNU General Public License v3.0 Updated Nov 8, 2024 -
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
C++ Apache License 2.0 Updated Nov 6, 2024 -
Awesome-LLM-Inference Public
Forked from DefTruth/Awesome-LLM-Inference
📖 A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
GNU General Public License v3.0 Updated Nov 1, 2024 -
PaddleCustomDevice Public
Forked from zhaify/PaddleCustomDevice
PaddlePaddle custom device implementation. (Custom hardware integration for PaddlePaddle, "飞桨")
Python Apache License 2.0 Updated Oct 15, 2024 -
Paddle Public
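Forked from PaddlePaddle/Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core PaddlePaddle "飞桨" framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
C++ Apache License 2.0 Updated Sep 25, 2024 -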
xDiT Public
Forked from xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
Python Apache License 2.0 Updated Sep 24, 2024 -
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python Apache License 2.0 Updated Sep 4, 2024 -
optimum-habana Public
Forked from huggingface/optimum-habana
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Python Apache License 2.0 Updated Aug 30, 2024 -
vidur Public
Forked from microsoft/vidur
A large-scale simulation framework for LLM inference
Python MIT License Updated Aug 24, 2024 -
deepseekv2-profile Public
Forked from madsys-dev/deepseekv2-profile -
AISystem Public
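Forked from chenzomi12/AISystem
AISystem mainly covers AI systems: full-stack, low-level AI technologies including AI chips, AI compilers, and AI inference and training frameworks
Jupyter Notebook Apache License 2.0 Updated Aug 18, 2024 -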
cuda-samples Public
Forked from NVIDIA/cuda-samples
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
C Other Updated Jul 25, 2024 -
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python Apache License 2.0 Updated Jul 12, 2024 -
Model-References Public
Forked from HabanaAI/Model-References
TensorFlow and PyTorch Reference models for Gaudi(R)
-
Habana_Custom_Kernel Public
Forked from HabanaAI/Habana_Custom_Kernel
Provides the examples to write and build Habana custom kernels using the HabanaTools
C++ Updated Mar 25, 2024 -
tgi-gaudi Public
Forked from huggingface/tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
Python Other Updated Mar 1, 2024 -
DeepSpeed Public
Forked from HabanaAI/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
-
ChatGLM-6B Public
Forked from THUDM/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Python Apache License 2.0 Updated Sep 14, 2023 -
ChatGLM2-6B Public
Forked from THUDM/ChatGLM2-6B
ChatGLM2-6B: An Open Bilingual Chat LLM
Python Other Updated Sep 14, 2023