Cpu all reduce #5849
Conflicts: oneflow/core/ccl/ccl.cpp oneflow/core/ccl/ccl.h
oneflow/core/ccl/ccl.cpp
transport_token,
[&](void** buffer, std::size_t* size, std::function<void()>* Cb) -> Maybe<void> {
  *buffer = const_cast<T*>(send_ptr);
  *size = send_size;
The unit here looks wrong.
Right. It looks like it needs to be multiplied by GetSizeOfDataType.
BalancedSplitter bs(size, thread_num);
MultiThreadLoop(thread_num, [&](size_t thread_idx) {
  size_t end = bs.At(thread_idx).end();
  for (size_t i = bs.At(thread_idx).begin(); i < end; ++i) { out[i] = in0[i] + in1[i]; }
MultiThreadLoop(size, [&](size_t i) {
out[i] = in0[i] + in1[i];
});
Could this be written directly like this?
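That should work. At the time I just wanted better locality: written your way, MultiThreadLoop would keep invoking a std::function inside its internal for loop, which is not efficient.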
I see, got it.
  send_ptr = &in[bs.At(send_part_id).begin()];
} else {
  send_ptr = &out[bs.At(send_part_id).begin()];
}
const T* send_ptr = &(i == 0 ? in : out)[bs.At(send_part_id).begin()];
Written this way, when i != 0 it always yields the start address of out, no matter what value bs.At(send_part_id).begin() has. I don't understand why.
  JUST(TransportUtil::ReceiveFromPrevRankInRing(rank_group, transport_token, &ctx));
}
JUST(TransportUtil::WaitUntilDoneOrTimeout(ctx, TransportUtil::TimeoutSeconds()));
const T* cur_in = &in[bs.At(recv_part_id).begin()];
This never reads data from out.
  return JUST(one::functional::ConsistentAllReduce(tensor));
}

COMMAND(RegisterBoxingFunction("cpu-p-to-b", CheckCpuP2B, &CpuP2B));
"cpu-p-to-b" could be renamed to "ccl-p-to-b" to match "nccl-p-to-b".
Identifiers like CheckCpuP2B should be renamed too. What do you think?
Done.
…, placement Conflicts: oneflow/core/boxing/ccl_boxing_function.cpp oneflow/user/kernels/eager_nccl_kernels.cpp
@@ -95,7 +95,8 @@ Maybe<BoxingExprIf> RawMainBoxingExpr() {
 | JUST(BoxingExpr(JUST(InPlacementAndBroadcast()), JUST(BoxingExpr("nccl-s-to-b")),
 JUST(BoxingExpr("naive-b-to-p"))))
 | JUST(BoxingExpr("asymmetric-x-to-b")) | JUST(OneToNBoxingExpr()) | JUST(NToOneBoxingExpr())
-| JUST(BoxingExpr("naive-1-to-1")) | JUST(GenericBoxingExpr());
+| JUST(BoxingExpr("naive-1-to-1")) | JUST(GenericBoxingExpr())
+| JUST(BoxingExpr("ccl-p-to-b"));
Put it right after the nccl BoxingExpr.
Changed.
@@ -13,6 +13,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
#include <atomic>
Please revert this file.
Reverted.
Speed stats:
CI failed, removing label automerge
Port unavailable error, so I re-ran it manually.
CPU version of all_reduce, built on transport.