add dry sampler #513
Conversation
This already looks so much better than #504, just from how much more similar it is to the reference implementation. Testing that one was taking time because it looked like it had a lot of edge cases that would lead to issues, or at least some incorrect behavior.
    add_executable(${TARGET} rpc-server.cpp)
    target_link_libraries(${TARGET} PRIVATE ggml)
    target_compile_features(${TARGET} PRIVATE cxx_std_17)
Why do we need this?
It's in the mainline file.
    SERVER_VERBOSE=$<BOOL:${LLAMA_SERVER_VERBOSE}>
    )

    if (MSVC)
Why is this needed?
It's for the stack size code: the add_tensor function in ggml-rpc.cpp uses recursion to serialize the graph, and Windows has a very small stack size by default, so it is easy to hit a stack overflow if the graph is too complex. This is not needed for the DRY sampler; it is a bug fix for RPC.
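For reference, a minimal sketch of what such an MSVC guard can look like in CMake; the target_link_options call and the 8 MB value are assumptions for illustration, not necessarily what this PR uses:

    # Sketch: raise the ~1 MB default MSVC stack reservation so the recursive
    # graph serialization in ggml-rpc.cpp does not overflow on complex graphs.
    if (MSVC)
        target_link_options(${TARGET} PRIVATE /STACK:8388608)  # assumed 8 MB
    endif()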
src/llama.cpp (outdated)
    }

    struct llama_sampler_dry * llama_sampler_init_dry(const struct llama_model * model, float dry_multiplier, float dry_base, int32_t dry_allowed_length, int32_t dry_penalty_last_n, const char ** seq_breakers, size_t num_breakers) {
        return llama_sampler_init_dry_impl(model->vocab, llama_n_ctx_train(model), dry_multiplier, dry_base, dry_allowed_length, dry_penalty_last_n, seq_breakers, num_breakers);
The DRY sampler only depends on the vocabulary, not the entire model. Wouldn't it have been better to define the interface that way, taking a pointer to the vocabulary instead of the model?
I can change it.
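For reference, a sketch of what the vocabulary-based interface suggested above could look like; the parameter names are carried over from the diff, and n_ctx_train stands in for the value previously read via llama_n_ctx_train(model). The signature actually merged may differ:

    struct llama_sampler_dry * llama_sampler_init_dry(
            const struct llama_vocab * vocab,        // only the vocab is needed
            int32_t                    n_ctx_train,  // was llama_n_ctx_train(model)
            float                      dry_multiplier,
            float                      dry_base,
            int32_t                    dry_allowed_length,
            int32_t                    dry_penalty_last_n,
            const char              ** seq_breakers,
            size_t                     num_breakers);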
@saood06 Any other comments?
Tried to build this to test and got this:

    /ik_llama.cpp/src/../include/llama.h:1240:54: error: unknown type name ‘llama_sampler_dry’
     1240 | void llama_sample_dry(struct llama_context* ctx, llama_sampler_dry* smpl, llama_token_data_array* candidates_p);
          |                                                   ^~~~~~~~~~~~~~~~~
Can you clean the build folder and try again? It compiles fine for me.
This was with a clean build folder.
Maybe it is because you set …
Should be good this time.
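For reference, one plausible cause of the earlier "unknown type name" error is that llama.h used the type before declaring it (the header is also consumed as C, where the struct keyword is required). A minimal sketch of that kind of fix; the actual change in the PR may differ:

    // Forward-declare the opaque sampler type before any prototype uses it:
    struct llama_sampler_dry;

    void llama_sample_dry(
            struct llama_context     * ctx,
            struct llama_sampler_dry * smpl,
            llama_token_data_array   * candidates_p);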
This reverts commit 3f111ad.
I tested this using the example in vllm-project/vllm#11368 and it looks OK.