Commit: 470 changed files with 276,979 additions and 0 deletions.
@@ -0,0 +1,2 @@
.vscode/
build/
@@ -0,0 +1,43 @@
# LA-llama.cpp

Let's play LLM on LoongArch!

## Overview

The project aims to port and optimize llama.cpp, a C++ LLM inference framework, on LoongArch.
In particular, we want to tackle the following challenges:

* Potential problems when porting the code to the LoongArch platform.
* Inference performance optimization via SIMD, initially targeting the 3A6000 platform.
* LLM evaluation on the LoongArch platform.
* Interesting applications with presentations and demos.
## Plan

Based on the above challenges, the project can be divided into the following 4 stages:

### Porting
- Task: Port llama.cpp to the LoongArch platform.
- Objective: Compile and run llama.cpp on the 3A6000 (see the sketch below).
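As a concrete target for this stage, a minimal smoke test on a 3A6000 machine might look like the following. This is a hedged sketch, not a verified recipe: it assumes a native GCC/G++ toolchain and uses a placeholder model path.

```bash
# Native build on the 3A6000, then a short generation as a smoke test.
make -j$(nproc)
./main -m models/llama-2-7b.Q4_K_M.gguf -p "Hello from LoongArch" -n 32
```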
### Optimization
- Task: Optimize the efficiency of llama.cpp on LoongArch (with a focus on the CPU).
- Objective: Apply programming optimization techniques and document the improvements.

### Evaluation
- Task: Benchmark various LLMs of different sizes.
- Objective: Produce a technical report (see the example commands below).
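For the benchmarking itself, llama.cpp's bundled tools are a natural starting point; for example (model and dataset paths are placeholders):

```bash
# Prompt-processing and generation throughput (512 prompt tokens, 128 generated tokens).
./llama-bench -m models/llama-2-7b.Q4_K_M.gguf -p 512 -n 128
# Model quality via perplexity on a reference text.
./perplexity -m models/llama-2-7b.Q4_K_M.gguf -f wikitext-2-raw/wiki.test.raw
```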
### Application
- Task: Deploy usable LLM applications on LoongArch platforms.
- Objective: Produce well-written deployment documents and visual demos.

## Miscellaneous
- We develop based on release `b2430` of the [original repo](https://github.com/ggerganov/llama.cpp/releases/tag/b2430).

## Progress and TODOs
- [x] Compile original llama.cpp on x86 CPU.
- [ ] Run LLM on x86 CPU.
- [x] Set up QEMU environment for LoongArch.
- [x] Set up cross-compilation tools for LoongArch on x86.
@@ -0,0 +1,21 @@
# Set up the cross-compilation and emulation tools for LoongArch

- Use the officially provided [build tools](https://github.com/loongson/build-tools).
- Download the binaries of the [cross toolchain](https://github.com/loongson/build-tools/releases/download/2023.08.08/CLFS-loongarch64-8.1-x86_64-cross-tools-gcc-glibc.tar.xz) and [QEMU linux-user](https://github.com/loongson/build-tools/releases/download/2023.08.08/qemu-loongarch64).
- For convenience, set the root directories of the build tools and QEMU to $LA_TOOLCHAIN and $LA_QEMU, respectively, and add $LA_TOOLCHAIN/bin and $LA_QEMU/bin to $PATH (see the sketch below).
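A possible shell setup matching the steps above (the install locations are illustrative, not prescribed by this document):

```bash
export LA_TOOLCHAIN=$HOME/opt/cross-tools      # root of the unpacked CLFS cross toolchain
export LA_QEMU=$HOME/opt/qemu-loongarch        # directory whose bin/ contains qemu-loongarch64
export PATH=$LA_TOOLCHAIN/bin:$LA_QEMU/bin:$PATH
```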
## Basic Testing
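Both tests compile a trivial hello-world program. The source files are not part of this document; a minimal `hello_loongarch.c` could be created like this (illustrative only):

```bash
cat > hello_loongarch.c <<'EOF'
#include <stdio.h>

int main(void) {
    printf("Hello, LoongArch!\n");
    return 0;
}
EOF
```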
Test with C:
```bash
loongarch64-unknown-linux-gnu-gcc hello_loongarch.c -o hello_loongarch
qemu-loongarch64 -L $LA_TOOLCHAIN/target/ -E LD_LIBRARY_PATH=$LA_TOOLCHAIN/target/lib64/:$LD_LIBRARY_PATH hello_loongarch
```
Test with C++:
```bash
loongarch64-unknown-linux-gnu-g++ hello_loongarch.cpp -o hello_loongarch
qemu-loongarch64 -L $LA_TOOLCHAIN/target/ -E LD_LIBRARY_PATH=$LA_TOOLCHAIN/loongarch64-unknown-linux-gnu/lib/:$LD_LIBRARY_PATH hello_loongarch
```
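With the same toolchain and QEMU user-mode setup, a first cross-build of llama.cpp itself might look roughly like the sketch below. This is an untested assumption rather than a documented recipe: overriding `UNAME_M` is one way to keep the stock Makefile from injecting x86-specific flags, further patches may still be needed, and the model path is a placeholder.

```bash
cd llama.cpp   # release b2430
make main CC=loongarch64-unknown-linux-gnu-gcc CXX=loongarch64-unknown-linux-gnu-g++ UNAME_M=loongarch64
qemu-loongarch64 -L $LA_TOOLCHAIN/target/ \
    -E LD_LIBRARY_PATH=$LA_TOOLCHAIN/target/lib64/:$LD_LIBRARY_PATH \
    ./main -m models/llama-2-7b.Q4_K_M.gguf -p "Hello from LoongArch" -n 16
```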
@@ -0,0 +1,22 @@
node('x86_runner1'){                       // Running on x86 runner containing latest vector qemu, latest vector gcc and all the necessary libraries
    stage('Cleanup'){
        cleanWs()                          // Cleaning previous CI build in workspace
    }
    stage('checkout repo'){
        retry(5){                          // Retry if the cloning fails due to some reason
            checkout scm                   // Clone the repo on Runner
        }
    }
    stage('Compiling llama.cpp'){
        sh '''#!/bin/bash
        make RISCV=1 RISCV_CROSS_COMPILE=1 # Compiling llama for RISC-V
        '''
    }
    stage('Running llama.cpp'){
        sh '''#!/bin/bash
        module load gnu-bin2/0.1           # loading latest versions of vector qemu and vector gcc
        qemu-riscv64 -L /softwares/gnu-bin2/sysroot -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 ./main -m /home/alitariq/codellama-7b.Q4_K_M.gguf -p "Anything" -n 9 > llama_log.txt # Running llama.cpp on vector qemu-riscv64
        cat llama_log.txt                  # Printing results
        '''
    }
}
@@ -0,0 +1,34 @@
ARG UBUNTU_VERSION=22.04

# This needs to generally match the container host's environment.
ARG CUDA_VERSION=11.7.1

# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} as build

# Unless otherwise specified, we make a fat build.
ARG CUDA_DOCKER_ARCH=all

RUN apt-get update && \
    apt-get install -y build-essential python3 python3-pip git

COPY requirements.txt requirements.txt
COPY requirements requirements

RUN pip install --upgrade pip setuptools wheel \
    && pip install -r requirements.txt

WORKDIR /app

COPY . .

# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable cuBLAS
ENV LLAMA_CUBLAS=1

RUN make

ENTRYPOINT ["/app/.devops/tools.sh"]
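A plausible way to build and run an image from a Dockerfile like this one (the Dockerfile path, image tag, and model path are assumptions for illustration; the host needs an NVIDIA GPU and the NVIDIA Container Toolkit):

```bash
docker build -t local/llama.cpp:full-cuda -f .devops/full-cuda.Dockerfile .
docker run --gpus all -v "$PWD/models:/models" local/llama.cpp:full-cuda \
    --run -m /models/llama-2-7b.Q4_K_M.gguf -p "Hello" -n 64 --n-gpu-layers 99
```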
@@ -0,0 +1,45 @@
ARG UBUNTU_VERSION=22.04

# This needs to generally match the container host's environment.
ARG ROCM_VERSION=5.6

# Target the ROCm build image
ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

FROM ${BASE_ROCM_DEV_CONTAINER} as build

# Unless otherwise specified, we make a fat build.
# List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
# This is mostly tied to rocBLAS supported archs.
ARG ROCM_DOCKER_ARCH=\
    gfx803 \
    gfx900 \
    gfx906 \
    gfx908 \
    gfx90a \
    gfx1010 \
    gfx1030 \
    gfx1100 \
    gfx1101 \
    gfx1102

COPY requirements.txt requirements.txt
COPY requirements requirements

RUN pip install --upgrade pip setuptools wheel \
    && pip install -r requirements.txt

WORKDIR /app

COPY . .

# Set the GPU architectures to build for
ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
# Enable ROCm
ENV LLAMA_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
ENV CXX=/opt/rocm/llvm/bin/clang++

RUN make

ENTRYPOINT ["/app/.devops/tools.sh"]
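An analogous flow for the ROCm image (paths and tag are again assumptions; ROCm containers conventionally need the KFD and DRI devices passed through):

```bash
docker build -t local/llama.cpp:full-rocm -f .devops/full-rocm.Dockerfile .
docker run --device /dev/kfd --device /dev/dri -v "$PWD/models:/models" \
    local/llama.cpp:full-rocm --run -m /models/llama-2-7b.Q4_K_M.gguf -p "Hello" -n 64
```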
@@ -0,0 +1,22 @@
ARG UBUNTU_VERSION=22.04

FROM ubuntu:$UBUNTU_VERSION as build

RUN apt-get update && \
    apt-get install -y build-essential python3 python3-pip git

COPY requirements.txt requirements.txt
COPY requirements requirements

RUN pip install --upgrade pip setuptools wheel \
    && pip install -r requirements.txt

WORKDIR /app

COPY . .

RUN make

ENV LC_ALL=C.utf8

ENTRYPOINT ["/app/.devops/tools.sh"]
@@ -0,0 +1,84 @@
# SRPM for building from source and packaging an RPM for RPM-based distros.
# https://fedoraproject.org/wiki/How_to_create_an_RPM_package
# Built and maintained by John Boero - boeroboy@gmail.com
# In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal

# Notes for llama.cpp:
# 1. Tags are currently based on hash - which will not sort asciibetically.
#    We need to declare standard versioning if people want to sort latest releases.
# 2. Builds for CUDA/OpenCL support are separate, with different dependencies.
# 3. NVidia's developer repo must be enabled with nvcc, cublas, clblas, etc installed.
#    Example: https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
# 4. OpenCL/CLBlast support simply requires the ICD loader and basic OpenCL libraries.
#    It is up to the user to install the correct vendor-specific support.

Name: llama.cpp-clblast
Version: %( date "+%%Y%%m%%d" )
Release: 1%{?dist}
Summary: OpenCL Inference of LLaMA model in C/C++
License: MIT
Source0: https://github.com/ggerganov/llama.cpp/archive/refs/heads/master.tar.gz
BuildRequires: coreutils make gcc-c++ git mesa-libOpenCL-devel clblast-devel
Requires: clblast
URL: https://github.com/ggerganov/llama.cpp

%define debug_package %{nil}
%define source_date_epoch_from_changelog 0

%description
OpenCL (CLBlast) inference for Meta's LLaMA 2 models using default options.

%prep
%setup -n llama.cpp-master

%build
make -j LLAMA_CLBLAST=1

%install
mkdir -p %{buildroot}%{_bindir}/
cp -p main %{buildroot}%{_bindir}/llamaclblast
cp -p server %{buildroot}%{_bindir}/llamaclblastserver
cp -p simple %{buildroot}%{_bindir}/llamaclblastsimple

mkdir -p %{buildroot}/usr/lib/systemd/system
%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamaclblast.service
[Unit]
Description=Llama.cpp server (OpenCL/CLBlast build).
After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target

[Service]
Type=simple
EnvironmentFile=/etc/sysconfig/llama
ExecStart=/usr/bin/llamaclblastserver $LLAMA_ARGS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=no

[Install]
WantedBy=default.target
EOF

mkdir -p %{buildroot}/etc/sysconfig
%{__cat} <<EOF > %{buildroot}/etc/sysconfig/llama
LLAMA_ARGS="-m /opt/llama2/ggml-model-f32.bin"
EOF

%clean
rm -rf %{buildroot}
rm -rf %{_builddir}/*

%files
%{_bindir}/llamaclblast
%{_bindir}/llamaclblastserver
%{_bindir}/llamaclblastsimple
/usr/lib/systemd/system/llamaclblast.service
%config /etc/sysconfig/llama

%pre

%post

%preun
%postun

%changelog
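One plausible way to exercise a spec like this on an RPM-based system (the spec filename is an assumption; `rpmdev-setuptree` and `spectool` come from the rpmdevtools package):

```bash
rpmdev-setuptree                               # create the ~/rpmbuild tree
spectool -g -R llama.cpp-clblast.spec          # download Source0 into ~/rpmbuild/SOURCES
rpmbuild -ba llama.cpp-clblast.spec            # build the binary and source RPMs
sudo dnf install ~/rpmbuild/RPMS/$(uname -m)/llama.cpp-clblast-*.rpm
sudo systemctl daemon-reload && sudo systemctl start llamaclblast.service
```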
@@ -0,0 +1,83 @@
# SRPM for building from source and packaging an RPM for RPM-based distros.
# https://fedoraproject.org/wiki/How_to_create_an_RPM_package
# Built and maintained by John Boero - boeroboy@gmail.com
# In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal

# Notes for llama.cpp:
# 1. Tags are currently based on hash - which will not sort asciibetically.
#    We need to declare standard versioning if people want to sort latest releases.
# 2. Builds for CUDA/OpenCL support are separate, with different dependencies.
# 3. NVidia's developer repo must be enabled with nvcc, cublas, clblas, etc installed.
#    Example: https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
# 4. OpenCL/CLBlast support simply requires the ICD loader and basic OpenCL libraries.
#    It is up to the user to install the correct vendor-specific support.

Name: llama.cpp-cublas
Version: %( date "+%%Y%%m%%d" )
Release: 1%{?dist}
Summary: CUDA Inference of LLaMA model in C/C++ (cuBLAS build)
License: MIT
Source0: https://github.com/ggerganov/llama.cpp/archive/refs/heads/master.tar.gz
BuildRequires: coreutils make gcc-c++ git cuda-toolkit
Requires: cuda-toolkit
URL: https://github.com/ggerganov/llama.cpp

%define debug_package %{nil}
%define source_date_epoch_from_changelog 0

%description
CUDA (cuBLAS) inference for Meta's LLaMA 2 models using default options.

%prep
%setup -n llama.cpp-master

%build
make -j LLAMA_CUBLAS=1

%install
mkdir -p %{buildroot}%{_bindir}/
cp -p main %{buildroot}%{_bindir}/llamacppcublas
cp -p server %{buildroot}%{_bindir}/llamacppcublasserver
cp -p simple %{buildroot}%{_bindir}/llamacppcublassimple

mkdir -p %{buildroot}/usr/lib/systemd/system
%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamacublas.service
[Unit]
Description=Llama.cpp server (CUDA/cuBLAS build).
After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target

[Service]
Type=simple
EnvironmentFile=/etc/sysconfig/llama
ExecStart=/usr/bin/llamacppcublasserver $LLAMA_ARGS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=no

[Install]
WantedBy=default.target
EOF

mkdir -p %{buildroot}/etc/sysconfig
%{__cat} <<EOF > %{buildroot}/etc/sysconfig/llama
LLAMA_ARGS="-m /opt/llama2/ggml-model-f32.bin"
EOF

%clean
rm -rf %{buildroot}
rm -rf %{_builddir}/*

%files
%{_bindir}/llamacppcublas
%{_bindir}/llamacppcublasserver
%{_bindir}/llamacppcublassimple
/usr/lib/systemd/system/llamacublas.service
%config /etc/sysconfig/llama

%pre

%post

%preun
%postun

%changelog