[Layer] Enable pipeline parallel feature. #221
Conversation
Amazing work!
src/models/CMakeLists.txt
Outdated
@@ -14,7 +14,12 @@
# ============================================================================
cmake_minimum_required(VERSION 3.15.1)

find_package(MPI REQUIRED)
What if oneCCL is not present in the user's environment?
No, oneCCL is already present in my environment, but the models src still reports that the MPI library is missing.
It should be decoupled now: the code under src/models no longer depends on oneCCL or MPI.
int layers_per_pp_stage = layers / ctx->ppSize;
int start_layer = ctx->ppRank * layers_per_pp_stage;
for (int i = start_layer; i < start_layer + layers_per_pp_stage; ++i) {
When layers is not divisible by ppSize, does that mean the remaining layers (layers % ppSize) will not be processed? There is a warning but no termination in that case.
The code above has already reported an error, so this code will not be executed.
This is set by the user, and it is normally chosen so that it divides evenly.
std::cerr only prints a message; it does not terminate the process. It also seems that the non-divisible case could be supported? The subsequent code does not appear to use layers_per_pp_stage, or any value that limits how many layers a given ppRank computes?
Done
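For reference, a minimal sketch (hypothetical, not taken from this PR) of how uneven layer counts could still be fully assigned: spread the remainder over the first (layers % ppSize) stages. It reuses ctx->ppSize and ctx->ppRank from the snippet above and assumes <algorithm> is available for std::min.

```cpp
// Hypothetical sketch: distribute layers when layers is not divisible by ppSize.
// Stages 0..(remainder - 1) each take one extra layer, so nothing is dropped.
int base = layers / ctx->ppSize;        // minimum layers per stage
int remainder = layers % ctx->ppSize;   // extra layers to distribute
int my_layers = base + (ctx->ppRank < remainder ? 1 : 0);
int start_layer = ctx->ppRank * base + std::min(ctx->ppRank, remainder);
for (int i = start_layer; i < start_layer + my_layers; ++i) {
    // process layer i on this pipeline stage
}
```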
MPI_Recv(embBuf, batchSize * inputSeqLen * ctx->hiddenSize, MPI_FLOAT, prev_world_rank, curr_world_rank,
        MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
This will reintroduce the MPI dependency into xft.so. It should be placed in comm_helper.so and accessed through the messenger.
Error: the symbols end up in a different scope when the .so file is loaded dynamically.
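As context for this exchange, a hedged sketch of the kind of indirection being discussed: loading a send/recv pair from comm_helper.so via dlopen/dlsym so that xft.so has no link-time MPI dependency. The function names and signatures below are hypothetical; the real comm_helper API in this repo may differ.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical function pointer types assumed to be exported by comm_helper.so.
typedef void (*helperSendFn)(const void *buf, int count, int dest, int tag);
typedef void (*helperRecvFn)(void *buf, int count, int src, int tag);

static helperSendFn helperSend = nullptr;
static helperRecvFn helperRecv = nullptr;

bool loadCommHelper() {
    // RTLD_GLOBAL vs RTLD_LOCAL matters here: with a local scope the MPI/oneCCL
    // symbols stay isolated, which is the "different scope" issue mentioned above.
    void *handle = dlopen("libcomm_helper.so", RTLD_NOW | RTLD_GLOBAL);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return false;
    }
    helperSend = (helperSendFn)dlsym(handle, "helperSend");
    helperRecv = (helperRecvFn)dlsym(handle, "helperRecv");
    return helperSend && helperRecv;
}
```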
public:
    static void initPipeline() {
        char *xft_pipeline_value = getenv("XFT_PIPELINE_STAGES");
Check if MPI_rank is divisible by ppStages?
This is checked in common_decoder.h.
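A hedged sketch of what such a check might look like (variable and function names are assumptions; per the reply, the actual check lives in common_decoder.h):

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical sketch: read the requested pipeline stages from the environment
// and verify the total rank count is divisible by it before splitting into TP groups.
static int readPipelineStages(int worldSize) {
    int ppSize = 1;
    if (const char *env = std::getenv("XFT_PIPELINE_STAGES")) {
        ppSize = std::atoi(env);
        if (ppSize <= 0) ppSize = 1;
    }
    if (worldSize % ppSize != 0) {
        std::fprintf(stderr, "world size %d is not divisible by pipeline stages %d\n", worldSize, ppSize);
        std::exit(EXIT_FAILURE); // terminate instead of only printing a warning
    }
    return ppSize;
}
```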
int embedding_world_rank = 0 * ctx->tpSize + ctx->tpRank;
int predictor_world_rank = (ctx->ppSize - 1) * ctx->tpSize + ctx->tpRank;
MPI_Send(this->nextTokens.data(), batchSize, MPI_INT32_T, embedding_world_rank, predictor_world_rank,
        MPI_COMM_WORLD);
Use the messenger and comm_helper.so to decouple the MPI dependency.
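For clarity, the 2-D rank layout implied by the snippet above (ppRank major, tpRank minor), written as a small helper; the helper name is hypothetical:

```cpp
// With ranks laid out as world_rank = ppRank * tpSize + tpRank, the embedding
// stage lives at ppRank 0 and the predictor at ppRank ppSize - 1, each paired
// with the same tpRank.
inline int worldRankOf(int ppRank, int tpRank, int tpSize) {
    return ppRank * tpSize + tpRank;
}

// Mapping the values from the snippet above:
// int embedding_world_rank = worldRankOf(0, ctx->tpRank, ctx->tpSize);
// int predictor_world_rank = worldRankOf(ctx->ppSize - 1, ctx->tpRank, ctx->tpSize);
```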
src/utils/messenger.h
Outdated
@@ -176,16 +201,21 @@ class Messenger {
private:
    int size;
    int rank;
    int color;
I am not sure color is a common concept. Is it easy for others to understand? Maybe add a comment?
Yes, color is a common concept from MPI_Comm_split: https://www.mpich.org/static/docs/v3.1.3/www3/MPI_Comm_split.html.
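A brief hedged illustration of the color concept: ranks passing the same color to MPI_Comm_split end up in the same sub-communicator, which is how tensor-parallel groups within a pipeline stage could be formed. The function name below is an assumption for illustration.

```cpp
#include <mpi.h>

// Hypothetical sketch: split MPI_COMM_WORLD so that ranks in the same pipeline
// stage (same color) share a tensor-parallel communicator.
void splitByStage(int worldRank, int tpSize) {
    int color = worldRank / tpSize;   // pipeline stage index acts as the color
    int key = worldRank % tpSize;     // rank order within the new communicator
    MPI_Comm tpComm;
    MPI_Comm_split(MPI_COMM_WORLD, color, key, &tpComm);
    // tpComm now contains only the ranks belonging to this stage.
}
```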
Most of the code is clear, but the MPI decoupling is needed.
Compile macros have been used.
Usage:
Build with
cmake .. -DWITH_PIPELINE_PARALLEL=ON
to add MPI support.
Use the XFT_PIPELINE_STAGE macro to define the number of pipeline parallel stages.
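As an illustration only (the exact compile-time macro name defined by -DWITH_PIPELINE_PARALLEL=ON is not shown in this conversation and is assumed here), guarding the MPI code paths behind such a macro typically looks like:

```cpp
// Hypothetical sketch: pipeline-parallel code compiled in only when the CMake
// option defines the corresponding macro (name assumed for illustration).
#ifdef WITH_PIPELINE_PARALLEL
    // MPI-based send/recv between pipeline stages goes here.
#else
    // Single-stage path: no MPI dependency is pulled into the build.
#endif
```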