

set up cross toolchains and qemu
AyiStar committed Mar 18, 2024
1 parent 9eee70d commit 1abf661
Showing 470 changed files with 276,979 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
.vscode/
build/
43 changes: 43 additions & 0 deletions README.md
@@ -0,0 +1,43 @@
# LA-llama.cpp

Let's play LLM on LoongArch!


## Overview

The project aims to port and optimize llama.cpp, a C++ LLM inference framework, on LoongArch.
In particular, we want to tackle the following challenges:

* Potential problems when porting the code to the LoongArch platform.
* Inference performance optimization via SIMD, initially targeting the 3A6000 platform.
* LLM evaluation on the LoongArch platform.
* Interesting applications with demos and presentations.

## Plan

Based on the above challenges, the project can be divided into the following 4 stages:

### Porting
- Task: Port llama.cpp to the LoongArch platform.
- Objective: Compile and run llama.cpp on 3A6000 (a tentative cross-build sketch follows).
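
The exact build invocation is still to be determined. As an assumed, untested sketch, one might override the compilers and the Makefile's host-architecture detection so that x86-specific flags are not injected; the variable names below reflect the upstream `b2430` Makefile and may need adjustment:

```bash
# Assumed cross-build sketch, not yet verified on this branch:
# override CC/CXX with the LoongArch cross compilers and force the
# detected machine type so no x86 -march flags are added.
make CC=loongarch64-unknown-linux-gnu-gcc \
     CXX=loongarch64-unknown-linux-gnu-g++ \
     UNAME_M=loongarch64
```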

### Optimization
- Task: Optimize the efficiency of llama.cpp on LoongArch (focusing on the CPU).
- Objective: Apply programming optimization techniques and document the improvements.

### Evaluation
- Task: Benchmark various LLMs of different sizes.
- Objective: Output a technical report.

### Application
- Task: Deploy usable LLM applications on LoongArch platforms.
- Objective: Output well-written deployment documents and visual demos.

## Miscellaneous
- We develop based on release `b2430` of the [original repo](https://github.com/ggerganov/llama.cpp/releases/tag/b2430); one way to fetch that base revision is sketched below.
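
For reference, a minimal way to obtain the base revision (assuming the release tag `b2430` can be checked out directly from the upstream repository):

```bash
# Fetch upstream llama.cpp and check out the base release tag b2430
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git checkout b2430
```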

## Progress and TODOs
- [x] Compile original llama.cpp on x86 CPU.
- [ ] Run LLM on x86 CPU.
- [x] Set up QEMU environment for LoongArch.
- [x] Set up cross compilation tools for LoongArch on x86.
21 changes: 21 additions & 0 deletions docs/cross_tools_setup.md
@@ -0,0 +1,21 @@
# Set up the cross compilation and emulation tools for LoongArch


- Use the officially provided [build tools](https://github.com/loongson/build-tools).
- Download the prebuilt [cross toolchain](https://github.com/loongson/build-tools/releases/download/2023.08.08/CLFS-loongarch64-8.1-x86_64-cross-tools-gcc-glibc.tar.xz) and the [QEMU linux-user binary](https://github.com/loongson/build-tools/releases/download/2023.08.08/qemu-loongarch64).
- For convenience, set the root directories of the cross toolchain and QEMU as `$LA_TOOLCHAIN` and `$LA_QEMU`, respectively, and add `$LA_TOOLCHAIN/bin` and `$LA_QEMU/bin` to `$PATH` (see the sketch below).
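
A minimal sketch of that environment setup (the installation paths under `/opt` are placeholders, not part of this repo):

```bash
# Adjust these paths to wherever the toolchain and QEMU were placed
export LA_TOOLCHAIN=/opt/cross-tools   # root of the CLFS cross toolchain
export LA_QEMU=/opt/la-qemu            # directory holding bin/qemu-loongarch64
export PATH=$LA_TOOLCHAIN/bin:$LA_QEMU/bin:$PATH
```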


## Basic Testing
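
The test sources `hello_loongarch.c` and `hello_loongarch.cpp` are not included in this commit; any minimal hello-world program works, for example (an assumed sketch):

```bash
# Create throwaway test sources (contents are assumptions, not from this repo)
cat > hello_loongarch.c <<'EOF'
#include <stdio.h>
int main(void) { printf("Hello, LoongArch!\n"); return 0; }
EOF

cat > hello_loongarch.cpp <<'EOF'
#include <iostream>
int main() { std::cout << "Hello, LoongArch!" << std::endl; return 0; }
EOF
```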

Test with C
```bash
loongarch64-unknown-linux-gnu-gcc hello_loongarch.c -o hello_loongarch
qemu-loongarch64 -L $LA_TOOLCHAIN/target/ -E LD_LIBRARY_PATH=$LA_TOOLCHAIN/target/lib64/:$LD_LIBRARY_PATH hello_loongarch
```

Test with C++
```bash
loongarch64-unknown-linux-gnu-g++ hello_loongarch.cpp -o hello_loongarch
qemu-loongarch64 -L $LA_TOOLCHAIN/target/ -E LD_LIBRARY_PATH=$LA_TOOLCHAIN/loongarch64-unknown-linux-gnu/lib/:$LD_LIBRARY_PATH hello_loongarch
```
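
As a quick sanity check that the output really is a LoongArch binary (assuming the toolchain ships prefixed binutils; the exact `file` output depends on the host's libmagic version):

```bash
# The ELF header should report "LoongArch" as the machine type
loongarch64-unknown-linux-gnu-readelf -h hello_loongarch | grep Machine
file hello_loongarch
```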
22 changes: 22 additions & 0 deletions la-llama.cpp/.devops/cloud-v-pipeline
@@ -0,0 +1,22 @@
node('x86_runner1'){ // Running on x86 runner containing latest vector qemu, latest vector gcc and all the necessary libraries
stage('Cleanup'){
cleanWs() // Cleaning previous CI build in workspace
}
stage('checkout repo'){
retry(5){ // Retry if the cloning fails due to some reason
checkout scm // Clone the repo on Runner
}
}
stage('Compiling llama.cpp'){
sh'''#!/bin/bash
make RISCV=1 RISCV_CROSS_COMPILE=1 # Compiling llama for RISC-V
'''
}
stage('Running llama.cpp'){
sh'''#!/bin/bash
module load gnu-bin2/0.1 # loading latest versions of vector qemu and vector gcc
qemu-riscv64 -L /softwares/gnu-bin2/sysroot -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 ./main -m /home/alitariq/codellama-7b.Q4_K_M.gguf -p "Anything" -n 9 > llama_log.txt # Running llama.cpp on vector qemu-riscv64
cat llama_log.txt # Printing results
'''
}
}
34 changes: 34 additions & 0 deletions la-llama.cpp/.devops/full-cuda.Dockerfile
@@ -0,0 +1,34 @@
ARG UBUNTU_VERSION=22.04

# This needs to generally match the container host's environment.
ARG CUDA_VERSION=11.7.1

# Target the CUDA build image
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} as build

# Unless otherwise specified, we make a fat build.
ARG CUDA_DOCKER_ARCH=all

RUN apt-get update && \
apt-get install -y build-essential python3 python3-pip git

COPY requirements.txt requirements.txt
COPY requirements requirements

RUN pip install --upgrade pip setuptools wheel \
&& pip install -r requirements.txt

WORKDIR /app

COPY . .

# Set nvcc architecture
ENV CUDA_DOCKER_ARCH=${CUDA_DOCKER_ARCH}
# Enable cuBLAS
ENV LLAMA_CUBLAS=1

RUN make

ENTRYPOINT ["/app/.devops/tools.sh"]
45 changes: 45 additions & 0 deletions la-llama.cpp/.devops/full-rocm.Dockerfile
@@ -0,0 +1,45 @@
ARG UBUNTU_VERSION=22.04

# This needs to generally match the container host's environment.
ARG ROCM_VERSION=5.6

# Target the ROCm build image
ARG BASE_ROCM_DEV_CONTAINER=rocm/dev-ubuntu-${UBUNTU_VERSION}:${ROCM_VERSION}-complete

FROM ${BASE_ROCM_DEV_CONTAINER} as build

# Unless otherwise specified, we make a fat build.
# List from https://github.com/ggerganov/llama.cpp/pull/1087#issuecomment-1682807878
# This is mostly tied to rocBLAS supported archs.
ARG ROCM_DOCKER_ARCH=\
gfx803 \
gfx900 \
gfx906 \
gfx908 \
gfx90a \
gfx1010 \
gfx1030 \
gfx1100 \
gfx1101 \
gfx1102

COPY requirements.txt requirements.txt
COPY requirements requirements

RUN pip install --upgrade pip setuptools wheel \
&& pip install -r requirements.txt

WORKDIR /app

COPY . .

# Set ROCm GPU targets
ENV GPU_TARGETS=${ROCM_DOCKER_ARCH}
# Enable ROCm
ENV LLAMA_HIPBLAS=1
ENV CC=/opt/rocm/llvm/bin/clang
ENV CXX=/opt/rocm/llvm/bin/clang++

RUN make

ENTRYPOINT ["/app/.devops/tools.sh"]
22 changes: 22 additions & 0 deletions la-llama.cpp/.devops/full.Dockerfile
@@ -0,0 +1,22 @@
ARG UBUNTU_VERSION=22.04

FROM ubuntu:$UBUNTU_VERSION as build

RUN apt-get update && \
apt-get install -y build-essential python3 python3-pip git

COPY requirements.txt requirements.txt
COPY requirements requirements

RUN pip install --upgrade pip setuptools wheel \
&& pip install -r requirements.txt

WORKDIR /app

COPY . .

RUN make

ENV LC_ALL=C.utf8

ENTRYPOINT ["/app/.devops/tools.sh"]
84 changes: 84 additions & 0 deletions la-llama.cpp/.devops/llama-cpp-clblast.srpm.spec
@@ -0,0 +1,84 @@
# SRPM for building from source and packaging an RPM for RPM-based distros.
# https://fedoraproject.org/wiki/How_to_create_an_RPM_package
# Built and maintained by John Boero - boeroboy@gmail.com
# In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal

# Notes for llama.cpp:
# 1. Tags are currently based on hash - which will not sort asciibetically.
# We need to declare standard versioning if people want to sort latest releases.
# 2. Builds for CUDA/OpenCL support are separate, with different dependencies.
# 3. NVidia's developer repo must be enabled with nvcc, cublas, clblas, etc. installed.
# Example: https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
# 4. OpenCL/CLBLAST support simply requires the ICD loader and basic opencl libraries.
# It is up to the user to install the correct vendor-specific support.

Name: llama.cpp-clblast
Version: %( date "+%%Y%%m%%d" )
Release: 1%{?dist}
Summary: OpenCL Inference of LLaMA model in C/C++
License: MIT
Source0: https://github.com/ggerganov/llama.cpp/archive/refs/heads/master.tar.gz
BuildRequires: coreutils make gcc-c++ git mesa-libOpenCL-devel clblast-devel
Requires: clblast
URL: https://github.com/ggerganov/llama.cpp

%define debug_package %{nil}
%define source_date_epoch_from_changelog 0

%description
OpenCL (CLBlast) inference for Meta's LLaMA 2 models using default options.

%prep
%setup -n llama.cpp-master

%build
make -j LLAMA_CLBLAST=1

%install
mkdir -p %{buildroot}%{_bindir}/
cp -p main %{buildroot}%{_bindir}/llamaclblast
cp -p server %{buildroot}%{_bindir}/llamaclblastserver
cp -p simple %{buildroot}%{_bindir}/llamaclblastsimple

mkdir -p %{buildroot}/usr/lib/systemd/system
%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamaclblast.service
[Unit]
Description=Llama.cpp server, OpenCL (CLBlast) build.
After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target

[Service]
Type=simple
EnvironmentFile=/etc/sysconfig/llama
ExecStart=/usr/bin/llamaclblastserver $LLAMA_ARGS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=no

[Install]
WantedBy=default.target
EOF

mkdir -p %{buildroot}/etc/sysconfig
%{__cat} <<EOF > %{buildroot}/etc/sysconfig/llama
LLAMA_ARGS="-m /opt/llama2/ggml-model-f32.bin"
EOF

%clean
rm -rf %{buildroot}
rm -rf %{_builddir}/*

%files
%{_bindir}/llamaclblast
%{_bindir}/llamaclblastserver
%{_bindir}/llamaclblastsimple
/usr/lib/systemd/system/llamaclblast.service
%config /etc/sysconfig/llama


%pre

%post

%preun
%postun

%changelog
83 changes: 83 additions & 0 deletions la-llama.cpp/.devops/llama-cpp-cublas.srpm.spec
@@ -0,0 +1,83 @@
# SRPM for building from source and packaging an RPM for RPM-based distros.
# https://fedoraproject.org/wiki/How_to_create_an_RPM_package
# Built and maintained by John Boero - boeroboy@gmail.com
# In honor of Seth Vidal https://www.redhat.com/it/blog/thank-you-seth-vidal

# Notes for llama.cpp:
# 1. Tags are currently based on hash - which will not sort asciibetically.
# We need to declare standard versioning if people want to sort latest releases.
# 2. Builds for CUDA/OpenCL support are separate, with different dependencies.
# 3. NVidia's developer repo must be enabled with nvcc, cublas, clblas, etc. installed.
# Example: https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
# 4. OpenCL/CLBLAST support simply requires the ICD loader and basic opencl libraries.
# It is up to the user to install the correct vendor-specific support.

Name: llama.cpp-cublas
Version: %( date "+%%Y%%m%%d" )
Release: 1%{?dist}
Summary:        CUDA (cuBLAS) Inference of LLaMA model in C/C++
License: MIT
Source0: https://github.com/ggerganov/llama.cpp/archive/refs/heads/master.tar.gz
BuildRequires: coreutils make gcc-c++ git cuda-toolkit
Requires: cuda-toolkit
URL: https://github.com/ggerganov/llama.cpp

%define debug_package %{nil}
%define source_date_epoch_from_changelog 0

%description
CUDA (cuBLAS) inference for Meta's LLaMA 2 models using default options.

%prep
%setup -n llama.cpp-master

%build
make -j LLAMA_CUBLAS=1

%install
mkdir -p %{buildroot}%{_bindir}/
cp -p main %{buildroot}%{_bindir}/llamacppcublas
cp -p server %{buildroot}%{_bindir}/llamacppcublasserver
cp -p simple %{buildroot}%{_bindir}/llamacppcublassimple

mkdir -p %{buildroot}/usr/lib/systemd/system
%{__cat} <<EOF > %{buildroot}/usr/lib/systemd/system/llamacublas.service
[Unit]
Description=Llama.cpp server, CUDA (cuBLAS) build.
After=syslog.target network.target local-fs.target remote-fs.target nss-lookup.target

[Service]
Type=simple
EnvironmentFile=/etc/sysconfig/llama
ExecStart=/usr/bin/llamacppcublasserver $LLAMA_ARGS
ExecReload=/bin/kill -s HUP $MAINPID
Restart=no

[Install]
WantedBy=default.target
EOF

mkdir -p %{buildroot}/etc/sysconfig
%{__cat} <<EOF > %{buildroot}/etc/sysconfig/llama
LLAMA_ARGS="-m /opt/llama2/ggml-model-f32.bin"
EOF

%clean
rm -rf %{buildroot}
rm -rf %{_builddir}/*

%files
%{_bindir}/llamacppcublas
%{_bindir}/llamacppcublasserver
%{_bindir}/llamacppcublassimple
/usr/lib/systemd/system/llamacublas.service
%config /etc/sysconfig/llama

%pre

%post

%preun
%postun

%changelog