FileNotFoundError(path): R0 checkpoint_and_raise - error=wiki_data #23

@SarahGaga0822

Hi,
I encountered an issue when I tried to train locally by running this command:
CUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nnodes=1 --nproc-per-node=2 \
    -m lcm.train launcher=standalone \
    +pretrain=mse \
    ++trainer.data_loading_config.max_tokens=1000 \
    ++trainer.output_dir="checkpoints/mse_lcm" \
    +trainer.use_submitit=false

This is the error:
[Image: screenshot of the error]

Could you please help me find out the reason?
Do I need to add a local parquet_path to the datacard, and if so, how? (I do not have a SLURM cluster.)
