
[Layer] Enable pipeline parallel feature. #221


Merged
changqi1 merged 19 commits into intel:main on Feb 19, 2024

Conversation

Contributor

@changqi1 changqi1 commented Feb 7, 2024

Usage:

  1. Build with cmake .. -DWITH_PIPELINE_PARALLEL=ON to add MPI support.
  2. Set the XFT_PIPELINE_STAGE macro to define the number of pipeline parallel stages.
Pipeline parallel and tensor parallel introduction:
  1) MPI_Instances = 16, XFT_PIPELINE_STAGE = 4  =>  ctx->ppSize = 4, ctx->tpSize = 4
  2) TP is synced by oneCCL (row_comm) or shared_memory
  3) PP is synced by MPI over MPI_COMM_WORLD
  World Rank:      => Row Rank:       => Rank:  tp0 tp1 tp2 tp3
  [ 0,  1,  2,  3,    [ 0, 1, 2, 3];      pp0 [  0,  1,  2,  3];
    4,  5,  6,  7,    [ 0, 1, 2, 3];      pp1 [  0,  1,  2,  3];
    8,  9, 10, 11,    [ 0, 1, 2, 3];      pp2 [  0,  1,  2,  3];
   12, 13, 14, 15];   [ 0, 1, 2, 3];      pp3 [  0,  1,  2,  3];
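
For reference, a minimal sketch (not the PR's exact code) of how a world rank could be decomposed into ppRank/tpRank and how the TP row communicator could be created; ppSize is assumed to come from XFT_PIPELINE_STAGE:

#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int worldRank, worldSize;
    MPI_Comm_rank(MPI_COMM_WORLD, &worldRank);
    MPI_Comm_size(MPI_COMM_WORLD, &worldSize);

    int ppSize = 4;                   // e.g. from XFT_PIPELINE_STAGE
    int tpSize = worldSize / ppSize;  // 16 / 4 = 4 in the example above
    int ppRank = worldRank / tpSize;  // which pipeline stage this rank belongs to
    int tpRank = worldRank % tpSize;  // row rank inside the stage

    // Ranks with the same color (= ppRank) land in the same row communicator;
    // TP AllReduce runs inside row_comm, PP send/recv stays on MPI_COMM_WORLD.
    MPI_Comm row_comm;
    MPI_Comm_split(MPI_COMM_WORLD, /*color=*/ppRank, /*key=*/tpRank, &row_comm);

    MPI_Comm_free(&row_comm);
    MPI_Finalize();
    return 0;
}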

                                      Prompts
                                         │
            ┌──────────────────┬─────────┴────────┬──────────────────┐
            │                  │                  │                  │
            ▼                  ▼                  ▼                  ▼
       Embedding(PP0)     Embedding(PP0)     Embedding(PP0)     Embedding(PP0)
            │                  │                  │                  │
  PP0       │                  │                  │                  │
  ┌─────────┼──────────────────┼──────────────────┼──────────────────┼──────────────┐
  │ TP0     │          TP1     │          TP2     │          TP3     │    layer0-7  │
  │ ┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐     │
  │ │ OMP            │ │ OMP            │ │ OMP            │ │ OMP            │     │
  │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │     │
  │ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│     │
  │ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘     │
  │ ┌───────┼──────────────────┼─────AllReduce────┼──────────────────┼────────┐     │
  │ └───────┼──────────────────┼──────────────────┼──────────────────┼────────┘     │
  └─────────┼──────────────────┼──────────────────┼──────────────────┼──────────────┘
  PP1       │ MPI Send/Recv    │                  │                  │
  ┌─────────┼──────────────────┼──────────────────┼──────────────────┼──────────────┐
  │ TP0     │          TP1     │           TP2    │            TP3   │   layer8-15  │
  │ ┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐     │
  │ │ OMP            │ │ OMP            │ │ OMP            │ │ OMP            │     │
  │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │     │
  │ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│     │
  │ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘     │
  │ ┌───────┼──────────────────┼─────AllReduce────┼──────────────────┼────────┐     │
  │ └───────┼──────────────────┼──────────────────┼──────────────────┼────────┘     │
  └─────────┼──────────────────┼──────────────────┼──────────────────┼──────────────┘
  PP2       │ MPI Send/Recv    │                  │                  │
  ┌─────────┼──────────────────┼──────────────────┼──────────────────┼──────────────┐
  │ TP0     │          TP1     │           TP2    │            TP3   │  layer16-23  │
  │ ┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐     │
  │ │ OMP            │ │ OMP            │ │ OMP            │ │ OMP            │     │
  │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │     │
  │ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│     │
  │ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘     │
  │ ┌───────┼──────────────────┼─────AllReduce────┼──────────────────┼────────┐     │
  │ └───────┼──────────────────┼──────────────────┼──────────────────┼────────┘     │
  └─────────┼──────────────────┼──────────────────┼──────────────────┼──────────────┘
  PP3       │ MPI Send/Recv    │                  │                  │
  ┌─────────┼──────────────────┼──────────────────┼──────────────────┼──────────────┐
  │ TP0     │          TP1     │           TP2    │            TP3   │  layer24-31  │
  │ ┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐     │
  │ │ OMP            │ │ OMP            │ │ OMP            │ │ OMP            │     │
  │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │ │ │ │ │ │ │ │    │     │
  │ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│ │ ▼ ▼ ▼ ▼ ▼ ▼ ...│     │
  │ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘ └───────┬────────┘     │
  │ ┌───────┼──────────────────┼─────AllReduce────┼──────────────────┼────────┐     │
  │ └───────┼──────────────────┼──────────────────┼──────────────────┼────────┘     │
  └─────────┼──────────────────┼──────────────────┼──────────────────┼──────────────┘
            │                  │                  │                  │
            ▼                  ▼                  ▼                  ▼
       Predictor(PP3)     Predictor(PP3)     Predictor(PP3)     Predictor(PP3)
            │ MPI Send/Recv    │                  │                  │
            ▼                  ▼                  ▼                  ▼
       Searchers(PP0)     Searchers(PP0)     Searchers(PP0)     Searchers(PP0)
            │
            ▼
         Output
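
To make the data flow above concrete, here is a minimal sketch of one decoding step from the point of view of a single rank; stageStep, hidden, and nextTokens are placeholder names and the per-stage layer forward is elided, but the send/recv partners and tags follow the same convention as the snippets discussed below (the receive tag is the receiver's world rank):

#include <mpi.h>
#include <cstdint>
#include <vector>

// Sketch only: one pipeline step seen by a single rank.
void stageStep(std::vector<float> &hidden, std::vector<int32_t> &nextTokens,
               int ppRank, int ppSize, int tpRank, int tpSize, int worldRank) {
    int prevRank = (ppRank - 1) * tpSize + tpRank; // same TP column, previous stage
    int nextRank = (ppRank + 1) * tpSize + tpRank; // same TP column, next stage

    if (ppRank > 0) // receive activations produced by the previous stage
        MPI_Recv(hidden.data(), static_cast<int>(hidden.size()), MPI_FLOAT, prevRank,
                 /*tag=*/worldRank, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // ... run this stage's layers here; TP ranks AllReduce inside their row communicator ...

    if (ppRank < ppSize - 1) { // hand the activations to the next stage
        MPI_Send(hidden.data(), static_cast<int>(hidden.size()), MPI_FLOAT, nextRank,
                 /*tag=*/nextRank, MPI_COMM_WORLD);
    } else { // last stage: the predictor produced nextTokens, return them to stage 0
        int embeddingRank = 0 * tpSize + tpRank;
        MPI_Send(nextTokens.data(), static_cast<int>(nextTokens.size()), MPI_INT32_T,
                 embeddingRank, /*tag=*/worldRank, MPI_COMM_WORLD);
    }
}
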
# pp=1, tp=2
$ XFT_PIPELINE_STAGE=1 OMP_NUM_THREADS=12 mpirun  \
    -n 1 numactl --all -C 48-59 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 60-71 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16

# pp=2, tp=1
$ XFT_PIPELINE_STAGE=2 OMP_NUM_THREADS=12 mpirun  \
    -n 1 numactl --all -C 48-59 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 60-71 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16
# pp=1, tp=4
$ XFT_PIPELINE_STAGE=1 OMP_NUM_THREADS=12 mpirun  \
    -n 1 numactl --all -C 48-59 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 60-71 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 72-83 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 84-95 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16

# pp=2, tp=2
$ XFT_PIPELINE_STAGE=2 OMP_NUM_THREADS=12 mpirun  \
    -n 1 numactl --all -C 48-59 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 60-71 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 72-83 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 84-95 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16

# pp=4, tp=1
$ XFT_PIPELINE_STAGE=4 OMP_NUM_THREADS=12 mpirun  \
    -n 1 numactl --all -C 48-59 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 60-71 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 72-83 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 84-95 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16
# pp=1, tp=8
$ XFT_PIPELINE_STAGE=1 OMP_NUM_THREADS=12 mpirun  \
    -n 1 numactl --all -C 48-59 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 60-71 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 72-83 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 84-95 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C  0-11 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 12-23 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 24-35 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 36-47 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16

# pp=2, tp=4
$ XFT_PIPELINE_STAGE=2 OMP_NUM_THREADS=12 mpirun  \
    -n 1 numactl --all -C 48-59 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 60-71 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 72-83 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 84-95 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C  0-11 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 12-23 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 24-35 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 36-47 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16

# pp=4, tp=2
$ XFT_PIPELINE_STAGE=4 OMP_NUM_THREADS=12 mpirun  \
    -n 1 numactl --all -C 48-59 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 60-71 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 72-83 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 84-95 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C  0-11 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 12-23 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 24-35 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 36-47 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16

# pp=8, tp=1
$ XFT_PIPELINE_STAGE=8 OMP_NUM_THREADS=12 mpirun  \
    -n 1 numactl --all -C 48-59 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 60-71 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 72-83 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 84-95 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C  0-11 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 12-23 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 24-35 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16 :  \
    -n 1 numactl --all -C 36-47 -m 1 ./example --model /data/qwen-1.8b-chat-xft/ --token /data/qwen-1.8b-chat-hf/tokenizer_config.json --dtype fp16 --loop 1 --input_len 16 --output_len 16

@changqi1 changqi1 marked this pull request as draft February 7, 2024 03:33
@changqi1 changqi1 marked this pull request as ready for review February 7, 2024 14:13
@changqi1 changqi1 requested a review from pujiang2018 February 7, 2024 14:15
@intelyoungway

Amazing work!

@@ -14,7 +14,12 @@
# ============================================================================
cmake_minimum_required(VERSION 3.15.1)

find_package(MPI REQUIRED)
Contributor

What if oneCCL is not present in the user's environment?

Contributor Author

No, oneCCL is already present in my environment, but the model src reports that the MPI library is missing.

Contributor

It should be decoupled now, right? The code in src/models does not depend on oneCCL or MPI.


int layers_per_pp_stage = layers / ctx->ppSize;
int start_layer = ctx->ppRank * layers_per_pp_stage;
for (int i = start_layer; i < start_layer + layers_per_pp_stage; ++i) {
Contributor
@Duyi-Wang Duyi-Wang Feb 8, 2024

When layers is not divisible by ppSize, does it mean that a few layers (layers % ppSize) will not be processed? There is a warning but no termination if layers is not divisible by ppSize.

Contributor Author

The code above has already reported an error, so this code will not be executed.

Contributor Author

This is set by the user; normally it is evenly divisible.

Contributor

std::cerr only prints a message; it does not terminate the flow. It seems the non-divisible case could also be supported? The subsequent code does not seem to use layers_per_pp_stage to limit how many layers each ppRank computes?

Contributor Author

Done
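
For completeness, one way the non-divisible case could be covered if it were supported: give the first (layers % ppSize) stages one extra layer each so nothing is dropped. This is only an illustrative sketch (assignLayers is a hypothetical helper), not the PR's final behavior:

#include <algorithm>

void assignLayers(int layers, int ppSize, int ppRank) {
    int base = layers / ppSize;
    int extra = layers % ppSize;                       // remainder layers
    int layersHere = base + (ppRank < extra ? 1 : 0);  // first 'extra' stages take one more
    int startLayer = ppRank * base + std::min(ppRank, extra);
    for (int i = startLayer; i < startLayer + layersHere; ++i) {
        // build / forward layer i on this pipeline stage
    }
}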

MPI_Recv(embBuf, batchSize * inputSeqLen * ctx->hiddenSize, MPI_FLOAT, prev_world_rank, curr_world_rank,
MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

Contributor

This will reintroduce the MPI dependency into xft.so. It should be included in comm_helper.so and referenced through the Messenger.

Contributor Author

Error: different scope when dynamically loading the .so file.


public:
static void initPipeline() {
char *xft_pipeline_value = getenv("XFT_PIPELINE_STAGES");
Contributor

Check if MPI_rank is divisible by ppStages?

Contributor Author

The check is done in common_decoder.h.
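
For reference, a minimal sketch of that kind of check, assuming the environment variable name used in the usage examples above (the snippet here reads XFT_PIPELINE_STAGES); readPipelineStages is a hypothetical helper, not the PR's exact code:

#include <cstdio>
#include <cstdlib>

// Read the pipeline stage count and verify the MPI world size is a multiple of it.
static int readPipelineStages(int worldSize) {
    int ppSize = 1;
    if (const char *env = std::getenv("XFT_PIPELINE_STAGE")) ppSize = std::atoi(env);
    if (ppSize < 1 || worldSize % ppSize != 0) {
        std::fprintf(stderr, "Invalid pipeline stage number %d for %d ranks, falling back to 1.\n",
                     ppSize, worldSize);
        ppSize = 1;
    }
    return ppSize; // tpSize = worldSize / ppSize
}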

int embedding_world_rank = 0 * ctx->tpSize + ctx->tpRank;
int predictor_world_rank = (ctx->ppSize - 1) * ctx->tpSize + ctx->tpRank;
MPI_Send(this->nextTokens.data(), batchSize, MPI_INT32_T, embedding_world_rank, predictor_world_rank,
MPI_COMM_WORLD);
Contributor

Use the Messenger and comm_helper.so to decouple the MPI dependency.
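
One possible shape for that decoupling, sketched with hypothetical method names (worldSendFP32/worldRecvFP32): only comm_helper.so would be compiled against MPI, and the model code would go through the dynamically loaded helper instead of calling MPI directly:

#include <mpi.h>
#include <cstdint>

// Hypothetical wrappers living in comm_helper.so; xft.so would call them through
// the Messenger instead of using MPI_Send/MPI_Recv itself.
class MessengerSketch {
public:
    void worldSendFP32(const float *buf, int count, int dest, int tag) {
        MPI_Send(buf, count, MPI_FLOAT, dest, tag, MPI_COMM_WORLD);
    }
    void worldRecvFP32(float *buf, int count, int src, int tag) {
        MPI_Recv(buf, count, MPI_FLOAT, src, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    void worldSendINT32(const int32_t *buf, int count, int dest, int tag) {
        MPI_Send(buf, count, MPI_INT32_T, dest, tag, MPI_COMM_WORLD);
    }
};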

@intel intel deleted a comment from Duyi-Wang Feb 8, 2024
@changqi1 changqi1 marked this pull request as draft February 8, 2024 07:29
@changqi1 changqi1 marked this pull request as ready for review February 18, 2024 05:11
@@ -176,16 +201,21 @@ class Messenger {
private:
int size;
int rank;
int color;
Contributor

I don't know if 'color' is a common concept. Is it easy for others to understand? Maybe add a comment?


@pujiang2018
Contributor

Most of the code is clear, but MPI decoupling is needed.

@changqi1
Contributor Author

changqi1 commented Feb 18, 2024

Most of the code is clear, but MPI decoupling is needed.

I have used the compile macro PIPELINE_PARALLEL to decouple MPI.
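
For illustration, such a guard might wrap the MPI receive shown earlier roughly like this (a sketch only; the exact guarded sites are in the PR diff):

#ifdef PIPELINE_PARALLEL
    if (ctx->ppRank > 0) {
        MPI_Recv(embBuf, batchSize * inputSeqLen * ctx->hiddenSize, MPI_FLOAT, prev_world_rank,
                 curr_world_rank, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
#endif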

@changqi1 changqi1 merged commit eea16a5 into intel:main Feb 19, 2024