feat: support deepseek-v3.2-Exp for npu. #562
base: main
Conversation
std::memcpy(cum_vec.data(),
            cum_tensor.data_ptr<int>(),
            cum_tensor.numel() * sizeof(int));
;
What are you doing in lines 661-673?
Why not compute std::vector<int32_t> q_cu_seq_lens from std::vector<int32_t> q_seq_lens directly?
Deleted.
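For reference, a minimal sketch of the reviewer's suggestion, assuming the per-request lengths are already available as a host-side std::vector<int32_t> (the helper name is illustrative, not from the PR):

#include <cstdint>
#include <numeric>
#include <vector>

// Build the cumulative query lengths directly on the host from the
// per-request lengths, avoiding the device-tensor round trip and the
// std::memcpy above. (Helper name is illustrative.)
std::vector<int32_t> build_q_cu_seq_lens(const std::vector<int32_t>& q_seq_lens) {
  std::vector<int32_t> q_cu_seq_lens(q_seq_lens.size());
  // Inclusive prefix sum: q_cu_seq_lens[i] = q_seq_lens[0] + ... + q_seq_lens[i].
  std::partial_sum(q_seq_lens.begin(), q_seq_lens.end(), q_cu_seq_lens.begin());
  return q_cu_seq_lens;
}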
SET_ARG(stop_token_ids, std::unordered_set<int32_t>({1}));
});
} // namespace xllm
Why was this line changed?
It was not changed.
#include <torch/torch.h>

#include <boost/algorithm/string.hpp>
#include <string>
Remove:
#include <gflags/gflags.h>
#include <boost/algorithm/string.hpp>
Deleted.
}

} // namespace layer
} // namespace xllm
Why was this line changed?
Restored to the state before modification.
void initialize_quantization_parameters(
    atb_speed::deepseekV2::DecoderLayerParam& param);

void initialize_kimi_k2_parameters(
Why do we need to initialize kimi_k2 parameters?
Deleted.
std::memcpy(cum_vec.data(),
            cum_tensor.data_ptr<int>(),
            cum_tensor.numel() * sizeof(int));
;
The trailing ; is redundant.
Deleted.
raw_forward_input.q_max_seq_len = state_.q_max_seq_len;
raw_forward_input.seq_lens = std::move(state_.seq_lens);
raw_forward_input.q_seq_lens = std::move(state_.q_seq_lens);
torch::Tensor q_seq_len_tensor =
This torch op may not be faster than std::partial_sum.
This has been changed to use std::partial_sum.
params.kv_seq_lens = safe_to(kv_seq_lens, device, true);
params.q_seq_lens = safe_to(q_seq_lens, device, true);
params.q_cu_seq_lens = safe_to(q_cu_seq_lens, device, true);
Why do we need both the cumulative q_cu_seq_lens and the non-cumulative q_seq_lens params?
DeepSeek V3.2 requires the use of the sparse flash attention operator, where both q_seq_lens and q_cu_seq_lens are inputs to this operator.
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
You must add #pragma once.
Added.
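For context, a minimal sketch of where #pragma once sits in the header; the include and declaration below are placeholders, not the actual file contents:

/* ... Apache-2.0 license header ...
==============================================================================*/
#pragma once  // prevents the header from being included more than once per TU

#include <torch/torch.h>  // placeholder include

// placeholder declaration, not the actual contents of the header
void example_function();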
pb_forward_input->q_seq_lens().end());
// aprint<int32_t>(q_seq_lens, "q_seq_lens", global_rank_);
std::vector<int32_t> q_cu_seq_lens(q_seq_lens.size());
std::partial_sum(q_seq_lens.begin(), q_seq_lens.end(), q_cu_seq_lens.begin());
Nit: please confirm the size of q_cu_seq_lens. On mlu and gpu, the first item is 0, so q_cu_seq_lens.size() == batch_size + 1. Here on npu, q_cu_seq_lens.size() == batch_size.
The documentation for the sparse flash attention operator requires that the shapes of q_seq_lens and q_cu_seq_lens be equal.
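To make the two size conventions concrete, a small sketch using an example batch q_seq_lens = {2, 3, 1}; the helper names are illustrative:

#include <cstdint>
#include <numeric>
#include <vector>

// npu convention in this PR: inclusive prefix sum, same size as q_seq_lens.
// {2, 3, 1}  ->  {2, 5, 6}      (size == batch_size)
std::vector<int32_t> inclusive_cu_seq_lens(const std::vector<int32_t>& lens) {
  std::vector<int32_t> cu(lens.size());
  std::partial_sum(lens.begin(), lens.end(), cu.begin());
  return cu;
}

// mlu/gpu convention mentioned by the reviewer: exclusive prefix sum with a
// leading 0. {2, 3, 1}  ->  {0, 2, 5, 6}  (size == batch_size + 1)
std::vector<int32_t> exclusive_cu_seq_lens(const std::vector<int32_t>& lens) {
  std::vector<int32_t> cu(lens.size() + 1, 0);
  std::partial_sum(lens.begin(), lens.end(), cu.begin() + 1);
  return cu;
}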
yq33victor left a comment:
Merge the code and speed up the deepseek32 testing process.
No description provided.