Commit

Update rp effective seq len sampling
mzio committed Sep 19, 2024
1 parent 9df4a12 commit fd7be7f
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions src/dataloaders/redpajama_sample_contig.py
@@ -115,8 +115,8 @@ def load_data(name: str, dataset_config: dict, pretrained_model_config: dict,
_data_attr = '-d='.join(_data_attr).replace('/', '_').replace('.json', '')
_data_attr = _data_attr.replace('[','_').replace(']','')

-fname = f'd={_data_attr}-nts={num_train_samples}-mts={max_train_samples}-dcs={chunk_size}-max={max_length}-min={min_length}-s={seed}'
-# fname = f'd={_data_attr}-mts={max_train_samples}-dcs={chunk_size}-max={max_length}-min={min_length}-s={seed}'
+# fname = f'd={_data_attr}-nts={num_train_samples}-mts={max_train_samples}-dcs={chunk_size}-max={max_length}-min={min_length}-s={seed}'
+fname = f'd={_data_attr}-mts={max_train_samples}-dcs={chunk_size}-max={max_length}-min={min_length}-s={seed}'
fname = join(dataset_config['dataloaders_dir'], fname)
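
As a rough illustration of what this change does to the cache key, the sketch below rebuilds both filename templates from the diff using made-up values. The helper `cache_fname`, the `./data` directory, and every parameter value are hypothetical and not taken from the repository; only the two f-string templates come from the diff. Under these assumptions, once `nts` is dropped from the name, runs that differ only in `num_train_samples` resolve to the same cached file.

```python
from os.path import join


def cache_fname(data_attr, max_train_samples, chunk_size, max_length,
                min_length, seed, num_train_samples=None,
                dataloaders_dir="./data"):
    """Hypothetical helper mirroring the two fname templates in the diff."""
    if num_train_samples is not None:
        # Old template: num_train_samples (nts) is part of the cache key.
        fname = (f'd={data_attr}-nts={num_train_samples}-mts={max_train_samples}'
                 f'-dcs={chunk_size}-max={max_length}-min={min_length}-s={seed}')
    else:
        # New template: nts is omitted from the cache key.
        fname = (f'd={data_attr}-mts={max_train_samples}'
                 f'-dcs={chunk_size}-max={max_length}-min={min_length}-s={seed}')
    return join(dataloaders_dir, fname)


# Illustrative values only; none of these come from the repository.
common = dict(data_attr="sample", max_train_samples=1000, chunk_size=1024,
              max_length=32768, min_length=1024, seed=42)

old_a = cache_fname(num_train_samples=100, **common)
old_b = cache_fname(num_train_samples=200, **common)
new_a = cache_fname(**common)

print(old_a)
print(old_b)
print(new_a)
```

If this reading is right, any cache written under the old `nts=...` name would no longer be found after the change and would presumably be regenerated under the shorter name.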


