A skill for Claude Code and Codex that steers Python code generation toward the RAPIDS GPU stack — automatically choosing cupy, cudf, cuml, cugraph, cucim, cuvs, and friends over numpy, pandas, scikit-learn, scipy, networkx, scikit-image, and faiss.
This skill activates only when the user includes rapids-first in their request. Once active, every "writing Python code" subtask — rewriting existing CPU code, implementing a new algorithm, scaffolding an ETL / training / inference / batch script, adding tests, or fixing a bug — is handled with a RAPIDS-first mindset:
- Zero-code acceleration first: prefer `cudf.pandas` / `cuml.accel` / the `nx-cugraph` backend / the `dask-cudf` backend so existing CPU code runs on GPU without any import changes.
- Explicit RAPIDS imports when zero-code is unsuitable: `numpy → cupy`, `scipy.ndimage → cupyx.scipy.ndimage`, `sklearn.cluster → cuml.cluster`, `faiss → cuvs.neighbors`, etc.
- Precise API targeting: each RAPIDS package ships a flat-file API index under `apis/<pkg>.txt`, so a single `ripgrep` call recovers the exact signature and one-line docstring, with no guessing from memory (cuML and sklearn frequently differ on parameter order, defaults, and `output_type`).
- GPU data lifecycle discipline: minimize `.to_pandas()` / `.get()` round-trips, avoid `cupy ↔ numpy` mixing, and prefer `cudf.read_*` so I/O lands directly on GPU.
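As a rough illustration of the explicit-import mappings listed above, the lookup can be sketched as a longest-prefix match over module names. This is only a sketch: the table `RAPIDS_EQUIVALENTS` and the helper `suggest_rapids_module` are hypothetical names invented here, not part of the skill, and the table contains only pairs mentioned in this README.

```python
# Hypothetical mapping table; only pairs named in this README are included.
RAPIDS_EQUIVALENTS = {
    "numpy": "cupy",
    "pandas": "cudf",
    "scipy.ndimage": "cupyx.scipy.ndimage",
    "sklearn.cluster": "cuml.cluster",
    "faiss": "cuvs.neighbors",
}


def suggest_rapids_module(dotted_name):
    """Return the RAPIDS module for the longest matching CPU module prefix, or None."""
    parts = dotted_name.split(".")
    # Try the full name first, then progressively shorter prefixes.
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in RAPIDS_EQUIVALENTS:
            return RAPIDS_EQUIVALENTS[prefix]
    return None  # no known GPU counterpart in this small table


print(suggest_rapids_module("sklearn.cluster.KMeans"))  # cuml.cluster
print(suggest_rapids_module("scipy.optimize"))          # None
```

The helper deliberately returns only the target module, not a full attribute path: class and function names do not always carry over one-to-one, so the exact signature should still be checked against the API index.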
See SKILL.md for the full workflow, API search recipes, and "what not to do" guardrails.
Drop the literal token rapids-first into your prompt and the skill takes over the rewrite.
Prompt:
Rewrite this pandas preprocessing script with `rapids-first`.
Before — 0.87 s (mean of 5 runs, after a warmup pass):
import pandas as pd
df = pd.read_parquet("events.parquet")
df["x"] = df["a"] / df["b"]
out = df.groupby("user_id")["x"].mean()

After: 0.05 s (mean of 5 runs, after a warmup pass), ~17× faster:
import cudf
df = cudf.read_parquet("events.parquet")
df["x"] = df["a"] / df["b"]
out = df.groupby("user_id")["x"].mean()

Three things to notice:
- The only change is the import. `cudf` is a drop-in for `pandas` here: the same `read_parquet` / arithmetic / `groupby().mean()` chain runs end-to-end on the GPU.
- No mid-stream device round-trips. Per the skill's GPU-data-lifecycle rule, the result stays in GPU memory; we don't call `.to_pandas()` until the consumer actually needs CPU data.
- Zero-code alternative. For an existing script you'd rather not touch at all, run it as `python -m cudf.pandas script.py`. The skill picks this path automatically when the user wants "make existing code run faster" instead of "show me explicit RAPIDS imports".
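The timings quoted above are described as a mean of 5 runs after a warmup pass. The harness actually used is not shown in this README, so the following is only a minimal sketch of that measurement scheme; `time_mean` and `pipeline` are hypothetical names introduced for illustration.

```python
import time
from statistics import mean


def time_mean(fn, runs=5, warmup=1):
    """Mean wall-clock seconds over `runs` calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time costs such as lazy imports or cache warm-up
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return mean(samples)


def pipeline(lib, path="events.parquet"):
    """Run the Before/After snippet with `lib` being either pandas or cudf."""
    df = lib.read_parquet(path)
    df["x"] = df["a"] / df["b"]
    return df.groupby("user_id")["x"].mean()


# Example usage (requires events.parquet and the respective library):
#   import pandas as pd; print(time_mean(lambda: pipeline(pd)))
#   import cudf;         print(time_mean(lambda: pipeline(cudf)))
```

Passing the library module as an argument keeps the timed body identical for both runs, so the only variable is the backend.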
Reproduce locally (10M rows, ~2M unique user_id, RTX 4060 Ti, cudf 26.02.01, pandas 2.3.3):
# gen_events.py — generate ~130 MB events.parquet in $TMPDIR
import os, numpy as np, pandas as pd
rng = np.random.default_rng(20260514)
n_rows, n_users = 10_000_000, 2_000_000
pd.DataFrame({
    "user_id": rng.integers(0, n_users, size=n_rows, dtype=np.int32),
    "a": rng.normal(10.0, 3.0, n_rows).astype(np.float32),
    "b": rng.uniform(0.5, 5.0, n_rows).astype(np.float32),
}).to_parquet(f"{os.environ['TMPDIR']}/events.parquet", index=False)

Requirements:
- RAPIDS: the GPU data-science stack this skill steers toward (`cupy`, `cudf`, `cuml`, `cugraph`, `cucim`, `cuvs`, `nx-cugraph`, `dask-cudf`, `cuxfilter`, `pylibraft`, `raft-dask`). A CUDA-capable NVIDIA GPU and matching CUDA driver are required for the underlying libraries to import.
- ripgrep (`rg`): used throughout the skill's standard workflow to recover exact API signatures from `apis/<pkg>.txt` in a single grep.
The actual skill payload lives at skills/rapids-first/ inside this repository.
If you have the GitHub CLI with the skill subcommand available, it handles the skills/<name>/ subdirectory automatically:
# Claude Code
gh skill install TioSisai/rapids-first rapids-first --agent claude-code --scope user
# Codex
gh skill install TioSisai/rapids-first rapids-first --agent codex --scope user

Manual install for Claude Code, user-level (available across all projects):
git clone https://github.com/TioSisai/rapids-first.git
mv rapids-first/skills/rapids-first ~/.claude/skills/rapids-first

Manual install for Claude Code, project-level (scoped to the current project):
git clone https://github.com/TioSisai/rapids-first.git
mv rapids-first/skills/rapids-first .claude/skills/rapids-first

Manual install for Codex, user-level:
git clone https://github.com/TioSisai/rapids-first.git
mv rapids-first/skills/rapids-first ~/.agents/skills/rapids-first

Manual install for Codex, project-level:
git clone https://github.com/TioSisai/rapids-first.git
mv rapids-first/skills/rapids-first .agents/skills/rapids-first

After installation, restart the CLI (or start a new session). Trigger the skill by including the literal token `rapids-first` in any prompt, for example:
Rewrite this preprocessing script `rapids-first`.

Implement KMeans on this DataFrame with `rapids-first`.
The pre-built apis/<pkg>.txt files in this repository were generated against the following environment:
| Component | Version |
|---|---|
| Python | 3.12.13 |
| CUDA runtime | 12.9 |
| cupy | 14.0.1 |
| cupyx | (ships with cupy) |
| cudf | 26.02.01 |
| dask_cudf | 26.02.01 |
| cuml | 26.02.000 |
| cugraph | 26.02.00 |
| nx_cugraph | 26.02.000 |
| cuxfilter | 26.02.000 |
| cucim | 26.02.01 |
| pylibraft | 26.02.00 |
| raft_dask | 26.02.00 |
| cuvs | 26.02.000 |
Regenerate the API index whenever:
- You upgrade any RAPIDS package in your local environment;
- A new RAPIDS minor / major release changes public APIs (RAPIDS ships on a `YY.MM` cadence);
- Your local environment runs a CUDA / RAPIDS major version different from the table above and you observe signature drift.
If the table above already matches your environment, the shipped apis/ is usable as-is — no regeneration needed.
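One way to check whether a local environment matches the table is to compare each module's `__version__` against the listed value. The sketch below assumes every package exposes a `__version__` attribute, which is an assumption rather than something this README states; `INDEXED_VERSIONS`, `installed_version`, and `find_version_drift` are hypothetical names, and the exact-string comparison is stricter than the "major version plus observed signature drift" criterion above.

```python
import importlib

# Subset of the version table above, copied verbatim.
INDEXED_VERSIONS = {
    "cupy": "14.0.1",
    "cudf": "26.02.01",
    "cuml": "26.02.000",
    "cuvs": "26.02.000",
}


def installed_version(module_name):
    """Return module.__version__, or None if the import fails or the attribute is absent."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return None  # not installed, or the GPU stack is unavailable
    return getattr(module, "__version__", None)


def find_version_drift(expected=INDEXED_VERSIONS, lookup=installed_version):
    """Map package -> (indexed, installed) for installed packages whose version differs."""
    drift = {}
    for name, want in expected.items():
        have = lookup(name)
        if have is not None and have != want:
            drift[name] = (want, have)
    return drift


if __name__ == "__main__":
    for name, (want, have) in find_version_drift().items():
        print(f"{name}: index built for {want}, installed {have}")
```

Packages that cannot be imported are skipped instead of being reported, since there is nothing to regenerate for them.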
# 1. Activate the Python environment that has RAPIDS installed
conda activate <your-rapids-env> # conda
# source /path/to/venv/bin/activate # venv
# 2. Run the script via its absolute path (Claude Code, user-level install)
python ~/.claude/skills/rapids-first/fetch_rapids_apis.py
# Project-level install: python /path/to/your/project/.claude/skills/rapids-first/fetch_rapids_apis.py
# For Codex: replace `.claude` with `.agents`
# Regenerate specific packages only
python ~/.claude/skills/rapids-first/fetch_rapids_apis.py --packages cudf cuml
# Write to a different output root
python ~/.claude/skills/rapids-first/fetch_rapids_apis.py --output-dir /tmp/rapids-apis

Each run rewrites `apis/<pkg>.txt` deterministically (entries sorted by qualname), so a `git diff` immediately surfaces API-surface changes, giving a quick way to audit RAPIDS upgrades.
If a package fails to import (missing CUDA driver, library skew, etc.), the script logs [skip] <pkg>: import failed -> <reason> to stderr and continues with the rest; partial regeneration is safe.
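The skip-and-continue behaviour and the sorted output described above can be pictured with a small sketch. This is not the code of `fetch_rapids_apis.py`, whose internals are not shown here: `collect_public_names` is a hypothetical helper that only lists public top-level names, whereas the real script records signatures and docstrings.

```python
import importlib
import sys


def collect_public_names(package_names, importer=importlib.import_module, log=sys.stderr):
    """Build {package: sorted qualified names}, skipping packages that fail to import."""
    index = {}
    for pkg in package_names:
        try:
            module = importer(pkg)
        except Exception as exc:
            # Same message shape as the log line quoted above; then move on.
            print(f"[skip] {pkg}: import failed -> {exc}", file=log)
            continue
        public = [name for name in dir(module) if not name.startswith("_")]
        # Sorting makes repeated runs byte-identical, which keeps diffs meaningful.
        index[pkg] = sorted(f"{pkg}.{name}" for name in public)
    return index


if __name__ == "__main__":
    # Standard-library modules stand in for RAPIDS packages so this runs anywhere.
    result = collect_public_names(["json", "definitely_not_installed_pkg_xyz"])
    print(list(result))  # ['json']
```

Because a failed import only removes that package from the result, entries for the remaining packages are unaffected, which is why partial regeneration is safe.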