
Fix check_model: LLM_MODELS is a list of str, not Enum (#317) #321

Merged
idevasena merged 1 commit into mlcommons:main from idevasena:pr_issue317_fix
Apr 9, 2026

Fix check_model: LLM_MODELS is a list of str, not Enum (#317)#321
idevasena merged 1 commit intomlcommons:mainfrom
idevasena:pr_issue317_fix

Conversation

@idevasena
Contributor

Problem

Running mlpstorage checkpointing datasize with any valid model (e.g. --model llama3-8b) fails with:

Error running check check_model: 'str' object has no attribute 'value'

Root Cause

In mlpstorage_py/rules/run_checkers/checkpointing.py, line 34 treats LLM_MODELS entries as Enum members:

valid_models = [m.value for m in self.supported_models]

However, LLM_MODELS (defined in config.py) is a plain list of strings, not an Enum. Calling .value on a str raises an AttributeError.
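The mismatch can be shown in a few lines. This is a minimal sketch with illustrative names (ModelEnum is hypothetical, not the project's actual config): iterating with .value works for Enum members but raises AttributeError for plain strings.

```python
from enum import Enum

class ModelEnum(Enum):          # what check_model implicitly assumed
    LLAMA3_8B = "llama3-8b"

LLM_MODELS = ["llama3-8b", "llama3-70b"]   # what config.py actually holds

print([m.value for m in ModelEnum])   # fine: ['llama3-8b']
try:
    [m.value for m in LLM_MODELS]     # crashes on the first element
except AttributeError as exc:
    print(exc)                        # 'str' object has no attribute 'value'
```

The printed message is exactly the one reported in #317, confirming the type mismatch as the root cause.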

Fix

Replace the list comprehension with a direct reference since the values are already strings:

valid_models = list(self.supported_models)
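An alternative, more defensive variant (not the fix that was merged, just a sketch) would tolerate either element type, so a future change of LLM_MODELS back to an Enum would not reintroduce the crash. The helper name model_names is hypothetical:

```python
from enum import Enum

def model_names(supported_models):
    """Normalize supported models to a list of plain string names,
    accepting either Enum members or plain strings."""
    return [m.value if isinstance(m, Enum) else m for m in supported_models]

print(model_names(["llama3-8b", "llama3-70b"]))  # ['llama3-8b', 'llama3-70b']
```

The merged one-liner is the simpler choice here, since config.py defines the models as strings and there is a single call site.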

Testing

  • Verified mlpstorage checkpointing datasize --model llama3-8b ... no longer crashes.
:~/Storage_Repo_Tests/storage_prfix317$ ./mlpstorage checkpointing datasize --client-host-memory-in-gb 256 --model "llama3-8b" --num-processes 8 --checkpoint-folder ~/checkpoint_temp --file --allow-run-as-root
Hosts is: ['127.0.0.1']
Hosts is: ['127.0.0.1']
⠋ Validating environment... 0:00:00
2026-04-07 21:51:06|INFO: Environment validation passed
2026-04-07 21:51:06|STATUS: Benchmark results directory: /tmp/mlperf_storage_results/checkpointing/llama3-8b/20260407_215106
2026-04-07 21:51:06|INFO: Created benchmark run: checkpointing_datasize_llama3-8b_20260407_215106
2026-04-07 21:51:06|STATUS: Verifying benchmark run for checkpointing_datasize_llama3-8b_20260407_215106
2026-04-07 21:51:06|STATUS: Benchmark run qualifies for CLOSED category ([RunID(program='checkpointing', command='datasize', model='llama3-8b', run_datetime='20260407_215106')])
2026-04-07 21:51:06|WARNING: Running the benchmark without verification for open or closed configurations. These results are not valid for submission. Use --open or --closed to specify a configuration.
2026-04-07 21:51:06|STATUS: Instantiated the Checkpointing Benchmark...
⠦ Collecting cluster info... ━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━ 2/4 0:00:00
2026-04-07 21:51:07|RESULT: Total GiB required per rank:
	Rank 0: 13.12GiB
	Rank 1: 13.12GiB
	Rank 2: 13.12GiB
	Rank 3: 13.12GiB
	Rank 4: 13.12GiB
	Rank 5: 13.12GiB
	Rank 6: 13.12GiB
	Rank 7: 13.12GiB
2026-04-07 21:51:07|RESULT: Total GiB required for all ranks: 105.00GiB
2026-04-07 21:51:08|STATUS: Writing metadata for benchmark to: /tmp/mlperf_storage_results/checkpointing/llama3-8b/20260407_215106/checkpointing_20260407_215106_metadata.json
  • Verified that passing an invalid model name still produces the expected validation error.
:~/Storage_Repo_Tests/storage_prfix317$ ./mlpstorage checkpointing datasize \
     --client-host-memory-in-gb 256 \
     --model "bogus-model" \
     --num-processes 8 \
     --checkpoint-folder ~/checkpoint_temp \
     --file --allow-run-as-root
Invalid LLM model. Supported models are: llama3-70b, llama3-405b, llama3-1t, llama3-8b

@idevasena idevasena requested a review from a team April 7, 2026 21:58
@github-actions

github-actions bot commented Apr 7, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Contributor

@russfellows russfellows left a comment


Looks like this has been tested, great.

@idevasena idevasena merged commit 62826ef into mlcommons:main Apr 9, 2026
2 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Apr 9, 2026
@idevasena idevasena deleted the pr_issue317_fix branch April 9, 2026 17:23