FileNotFoundError(path): R0 checkpoint_and_raise - error=wiki_data

Hi,
I encountered an issue when I tried to train locally by running this command:
`CUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nnodes=1 --nproc-per-node=2 \
    -m lcm.train launcher=standalone \
    +pretrain=mse \
    ++trainer.data_loading_config.max_tokens=1000 \
    ++trainer.output_dir="checkpoints/mse_lcm" \
    +trainer.use_submitit=false`

This is the error:
<img width="1145" alt="Image" src="https://github.com/user-attachments/assets/1b6c6b4c-b44a-4915-a1da-29c0bdb96914" />

Could you please help me find the cause?
Do I need to add a local `parquet_path` to the datacard? If so, how? (I do not have a SLURM cluster.)
