Add triton to kernel bench #18

PaliC · 2025-02-04T20:41:51Z

Adds triton support to kernel bench (not including CoT or multiturn).

The simple bit is just adding a triton prompt and support for switching from cuda to triton.

The riskier bit is evaluation. Triton kernels usually rely on decorators like @triton.jit, which do not work under exec. So instead of extracting the model from the generated code with exec, we now use a hacky workaround: write the generated code to a temp file and import the model directly from that file. Unfortunately, the temp file has to be deleted manually. Afaict, short of modifying the generated code, there isn't really a cleaner way to run the decorators (other than importing from the on-disk source file for the generated code, which we could do).
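For reference, here is a minimal sketch of the temp-file import approach described above. It is not the code in src/eval.py. The helper name `load_module_from_source`, the module name `generated_kernel`, and the `needs_source` decorator are all hypothetical. `needs_source` stands in for a decorator that reads the function's source text, which is assumed here to be the reason such decorators fail under exec.

```python
import importlib.util
import os
import tempfile


def load_module_from_source(source: str, module_name: str = "generated_kernel"):
    """Write `source` to a temporary .py file, import it, and return (module, path).

    Hypothetical helper for illustration; the caller must delete `path`.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so decorators that look up the source file can still find it.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file, including its decorators
    return module, path


# Stand-in for generated code. `needs_source` is a toy decorator (an assumption,
# not Triton) that needs the decorated function's source to exist in a real file.
GENERATED = '''
import inspect

def needs_source(fn):
    fn.src = inspect.getsource(fn)
    return fn

@needs_source
def add(a, b):
    return a + b
'''

module, path = load_module_from_source(GENERATED)
try:
    result = module.add(2, 3)
finally:
    os.remove(path)  # the temp file has to be cleaned up manually
```

The try/finally keeps the manual cleanup in one place, so the temp file is removed even if the generated code raises while it runs.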

Test Plan:

I ran the following commands and things seemed to work as expected:

python scripts/generate_samples.py run_name="test_hf_level_1" dataset_src="huggingface" level="1" num_workers=50 server_type="deepseek" model_name="deepseek-coder" temperature=0 framework="cuda"

python scripts/eval_from_generations.py level=1 run_name="test_hf_level_1" dataset_src="local" level="1" num_gpu_devices=8 timeout=300

python scripts/generate_samples.py run_name="test_hf_level_1_triton" dataset_src="huggingface" level="1" num_workers=50 server_type="deepseek" model_name="deepseek-coder" temperature=0 framework="triton"

python scripts/eval_from_generations.py level=1 run_name="test_hf_level_1_triton" dataset_src="local" level="1" num_gpu_devices=8 timeout=300

python scripts/generate_and_eval_single_sample.py dataset_src="huggingface" level=2 problem_id=40

python scripts/generate_and_eval_single_sample.py dataset_src="huggingface" level=2 problem_id=40 framework="triton"

scripts/inspect_kernel_pytorch_profiler.py

scripts/generate_and_eval_single_sample.py

src/prompts/model_new_ex_add_triton.py

src/prompt_constructor.py

src/eval.py

README.md

Zacharias030 · 2025-02-05T22:20:00Z

Would it make sense to add the artefacts (i.e., results) produced by running this PR with at least one of the LLMs to the PR description for reference?

For anything that we do with this new "KernelBench-Triton" variant, it might prove helpful to have some expected numbers to compare against in order to check correctness of this and subsequent implementations.

Appending some of the generated prompt->response pairs, covering both success and failure cases, may also help us convince ourselves that everything makes sense, as suggested here. For example, if a particular model scores 0%, I think we should quickly rule out that a trivial issue is causing it.

simonguozirui · 2025-02-21T23:35:08Z

@PaliC once you have the format decided, let's try to repro an end-to-end flow, where we take the KernelBench program -> LM -> PyTorch program with jit Triton (in the format you specified). Let's also have a verification script like run_and_check to verify.

Another thing @msaroufim mentioned is that we might need to determine how many KernelBench problems are purely functional and filter accordingly to satisfy the format.

Zacharias030 · 2025-03-07T05:49:27Z

src/prompt_constructor.py

+def prompt_fix_compile(ref_arch_src, custom_kernel, metadata):
    prompt = PROBLEM_STATEMENT


Thanks again for this PR!
Btw, this PROBLEM_STATEMENT and some other things are easily caught by a linter.

PaliC · 2025-03-18T23:20:42Z

As this PR is very stale and breaks KernelBench upstream quite a bit, please move discussion to #35, which does the same thing on the current iteration of KernelBench.

PaliC and others added 3 commits February 4, 2025 12:40

Add triton to kernel bench

75d28eb

add triton

ec273d7

add triton

4b38254

PaliC marked this pull request as ready for review February 4, 2025 21:33