AI inference example scripts for supercomputers

Scripts to run DeepSeek-R1 (distilled versions, Qwen-32B or Llama-70B) with vLLM on a full node on Puhti, Mahti or LUMI:

These scripts start the vLLM server in OpenAI-compatible API mode, run a query (which you can replace with something more substantial), and then quit. You can also modify the code to keep the server running for the duration of the job.
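The query step can be replaced with any OpenAI-compatible client call. A minimal sketch using curl is shown below; the port (8000 is vLLM's default) and the model name are assumptions that must match the values used in the batch script:

```shell
# Poll until the vLLM server answers on its port, then send one chat
# completion request. Port and model name are assumptions; adjust them
# to match the batch script you are using.
until curl -s http://localhost:8000/v1/models > /dev/null; do
    sleep 10
done

curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
          "messages": [{"role": "user", "content": "Hello!"}]
        }'
```

To keep the server up for the rest of the job instead of quitting, replace the final query-and-exit step with `wait`, so the batch script blocks on the background server process until the job's time limit is reached.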

There's also a script to run vLLM with ray for two full nodes on LUMI: run-vllm-ray.sh
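For orientation, the general two-node pattern looks roughly like the sketch below. This is not the contents of run-vllm-ray.sh; node selection, the port, and the parallelism sizes are placeholders:

```shell
# Sketch of multi-node vLLM over Ray (placeholders throughout): start a
# Ray head on the first node, join the second node as a worker via srun,
# then launch vLLM with parallelism spanning both nodes' GPUs.
ray start --head --port=6379
srun --nodes=1 --ntasks=1 -w "$WORKER_NODE" \
    ray start --address="$HEAD_NODE:6379" --block &
vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2
```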

Scripts to run the same with Ollama:

Note: all scripts are Slurm batch job scripts and need to be submitted with sbatch, for example:

sbatch run-vllm-lumi8.sh
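Once submitted, the job can be monitored with standard Slurm commands (the job ID below is a placeholder; use the one sbatch prints):

```shell
# List your queued and running jobs, then follow the job's output file
# (Slurm writes to slurm-<jobid>.out in the submission directory by default).
squeue -u "$USER"
tail -f slurm-1234567.out
```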

TODO

  • the Ollama scripts don't seem to use all GPUs; the scripts probably reserve more GPUs than Ollama actually uses
  • run-vllm-ray.sh should use high-speed net and not need NCCL_NET=Socket
