only prepare data on rank 0 #409

Merged: 2 commits merged into master from data_rank_0 on Apr 24, 2023
Conversation

@conglongli (Contributor) commented Apr 24, 2023

Also changed the filename hashing from hash() to hashlib.sha256(), because of https://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions

Tested with step 1 training_scripts/multi_node/run_66b.sh to confirm that the PR works fine.
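
A minimal sketch of the hashing change described above (the cache-key string below is a made-up example, not the repo's actual key):

    import hashlib

    fname = "Dahoas/rm-static_split2,4,4_seed1234"              # hypothetical cache key
    unstable = str(hash(fname))                                 # salted per interpreter session since Python 3.3
    stable = hashlib.sha256(fname.encode("utf-8")).hexdigest()  # identical across sessions and machines
    print(f"traindata_{stable}.pt")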

conglongli merged commit 76c16f1 into master on Apr 24, 2023
conglongli deleted the data_rank_0 branch on April 24, 2023 at 08:19
@puyuanOT commented:

I got an OOM issue after checking out this PR. It looks like this data preparation approach is consuming much more CPU memory than the previous implementation.

@conglongli (Contributor, Author) commented Apr 25, 2023

> I got an OOM issue after checking out this PR. It looks like this data preparation approach is consuming much more CPU memory than the previous implementation.

@puyuanOT It's very unlikely that this PR leads to more CPU memory consumption: if you look at the code change, the only thing this PR does is reduce the data preparation to a single rank on each node. If running the same script with and without this PR really leads to different results (OOM with this PR), please share your error log and script so that we can try to reproduce and debug. Thanks.
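
For readers following along, a minimal sketch of the rank-0-per-node preparation pattern described above (function and variable names are placeholders, not the actual repo code), assuming torch.distributed has already been initialized by the launcher:

    import os
    import torch
    import torch.distributed as dist

    def prepare_and_load(train_fname, eval_fname, build_cache):
        # Placeholder sketch: rank 0 on each node builds the cached .pt files,
        # every other rank waits at the barrier, then all ranks load the same cache.
        local_rank = int(os.getenv("LOCAL_RANK", "0"))
        if local_rank == 0 and not (os.path.isfile(train_fname)
                                    and os.path.isfile(eval_fname)):
            build_cache(train_fname, eval_fname)  # tokenize/split, then torch.save()
        if dist.is_initialized():
            dist.barrier()  # wait until rank 0 has written the cache
        return torch.load(train_fname), torch.load(eval_fname)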

@conglongli (Contributor, Author) commented:

@puyuanOT Actually, there is one thing that might be related to a memory consumption increase. Could you try changing the "return torch.load(train_fname), torch.load(eval_fname)" line to "train_dataset, eval_dataset = torch.load(train_fname), torch.load(eval_fname)" followed by "return train_dataset, eval_dataset"? For users with only 1 GPU, this difference might lead to more memory consumption.
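
Spelled out, the suggested change looks roughly like this (a sketch of the suggestion, not the repo's actual function):

    import torch

    def load_cached_datasets(train_fname, eval_fname):
        # Bind the two loads to names before returning, instead of returning
        # the torch.load() calls directly inside the return expression.
        train_dataset, eval_dataset = torch.load(train_fname), torch.load(eval_fname)
        return train_dataset, eval_dataset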

@puyuanOT commented:

@conglongli, thank you for your response! I'm fairly certain that this PR has resulted in significantly higher memory consumption, as I manually tested the script with and without the PR. To provide more details: without the PR, it consumed approximately 120 GB with my four A10 GPUs; with the PR, peak consumption exceeded 190 GB.

I will give the fix you suggested a try.

@puyuanOT commented May 2, 2023

I figured this out.

In the previous implementation:

    fname = str(hash(fname))  # hash the file name to avoid too long file name
    train_fname = f"{output_path}/traindata_{fname}.pt"
    eval_fname = f"{output_path}/evaldata_{fname}.pt"

    cache_found = os.path.isfile(train_fname) and os.path.isfile(eval_fname)
    buf_create_cache = torch.ByteTensor([not cache_found]).cuda()
    torch.distributed.all_reduce(buf_create_cache)

    if buf_create_cache.item() == 0:
        return torch.load(train_fname), torch.load(eval_fname)

The code re-preprocesses the data every time the script is called because str(hash(fname)) is not stable: Python's hash() is salted per interpreter session, so the cache filename changes on every run, the cached files are never found, and the line torch.load(train_fname), torch.load(eval_fname) is never executed. Interestingly, re-preprocessing the data consumes significantly less memory than loading the precomputed torch tensors, so preparing the data on every run actually uses less memory than loading the pre-computed tensors.
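
A quick, self-contained way to see the instability (not part of the repo code): the same string hashes to a different value in each fresh interpreter, so a filename derived from hash() never matches the one written by a previous run.

    import subprocess, sys

    cmd = [sys.executable, "-c", "print(hash('traindata'))"]
    # With hash randomization (the default since Python 3.3), two separate
    # interpreter sessions typically print different values for the same string.
    print(subprocess.run(cmd, capture_output=True, text=True).stdout.strip())
    print(subprocess.run(cmd, capture_output=True, text=True).stdout.strip())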

Empirically, when setting --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets openai/webgpt_comparisons stanfordnlp/SHP and max_length=1024, the script consumes less than 200 GB of memory with 4 GPUs (processes) under the old implementation, but more than 350 GB under the new implementation.

@puyuanOT commented May 2, 2023

To clarify, the memory consumption figures mentioned above are peak values.

@puyuanOT commented May 2, 2023

@conglongli I'm not sure if there is a way to avoid both the re-processing and the high memory consumption.

@conglongli (Contributor, Author) commented:

@puyuanOT Thanks for the investigation. We are currently working on switching to memmap-based data management, which will hopefully reduce CPU memory consumption substantially (#450). However, it's a complex transition and will take some time, so you might want to roll back to the old implementation for now.
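
The details of #450 aren't shown in this thread, but the general memmap idea is roughly the following (a hypothetical illustration with made-up filenames, not the actual change): token data is written once to a flat binary file, and each process maps it read-only, so ranks share the OS page cache instead of each holding a full in-memory copy of the dataset.

    import numpy as np

    tokens = np.arange(10_000, dtype=np.int32)       # stand-in for tokenized data
    tokens.tofile("train_tokens.bin")                # hypothetical on-disk cache

    mapped = np.memmap("train_tokens.bin", dtype=np.int32, mode="r")
    print(mapped[:8])                                # pages are read lazily from disk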

leocnj pushed a commit to leocnj/DeepSpeedExamples that referenced this pull request May 27, 2023
* only prepare data on rank 0

* fix hash
Labels: deespeed chat DeepSpeed Chat