-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose Llava as a shared library for downstream projects #3613
Conversation
This PR will be very helpful for implementing |
base64 support is working.
output: |
examples/llava/base64.hpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably this should go to common
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
examples/llava/clip.cpp
Outdated
|
||
bool clip_image_load_from_bytes(const unsigned char * bytes, int bytes_length, clip_image_u8 * img) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool clip_image_load_from_bytes(const unsigned char * bytes, int bytes_length, clip_image_u8 * img) { | |
bool clip_image_load_from_bytes(const unsigned char * bytes, size_t size, clip_image_u8 * img) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
examples/llava/llava.cpp
Outdated
|
||
const char * clip_path = params.mmproj.c_str(); | ||
const char * img_path = params.image.c_str(); | ||
static void find_image_tag_in_prompt(const std::string& prompt, size_t& begin_out, size_t& end_out) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to have such functions in llava-utils.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 done
examples/llava/llava.cpp
Outdated
auto base64_bytes_start = img_base64_str_start + strlen(IMG_BASE64_TAG_BEGIN); | ||
auto base64_bytes_count = img_base64_str_end - base64_bytes_start; | ||
auto base64_str = prompt.substr(base64_bytes_start, base64_bytes_count ); | ||
printf("base64_str: '%s'\n", base64_str.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not print this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was debugging/wip - done.
examples/llava/llava.cpp
Outdated
|
||
if (!clip_image_preprocess(ctx_clip, &img, &img_res, /*pad2square =*/ true)) { | ||
fprintf(stderr, "%s: unable to preprocess %s\n", __func__, img_path); | ||
struct llava_context * llava_init(gpt_params * params) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Image loading and inference parts should be stripped off this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
examples/llava/llava.cpp
Outdated
free(ctx_llava->image_embd); | ||
} | ||
|
||
void llava_process_prompt(struct llava_context * ctx_llava, gpt_params * params, const char * prompt) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid such god-like functions for now --it kills hackability in early stages of development --it brings no benefit over functions in llava-utils.h
. Better to have single-responsibility functions as much as possible to enhance the development speed and flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 sure
Server now support LlaVA #3589 |
@monatis thanks for the review, but do please note this PR is still in "draft" state. i addressed all your comments, and actually many of them anticipated what i had planned to do anyway. so. i'm not sure if the following is correct/desired or in fitting with the way this project is organised, but i went ahead and did it anyway. essentially, i have refactored the existing llava demo to a llava library and a llava cli/MVP demo:
as mentioned in the top post my primary goal here is to enable lmql support, which means enabling llava code paths with llama-cpp-python bindings. am i headed in the right direction to do this, or is there something more appropriate i could be doing? i note there's a |
actually now that i look over llama-cpp-python this might all be overengineered. all that's really needed i guess is
|
d694dac
to
d64891b
Compare
this is in a much better state now. the public api is basically: /** load mmproj model */
LLAMA_API struct clip_ctx * clip_model_load(const char * fname, const int verbosity);
/** free mmproj model */
LLAMA_API void clip_free(struct clip_ctx * ctx);
/** sanity check for clip <-> llava embed size match */
LLAMA_API bool llava_validate_embed_size(const llama_context * ctx_llama, const clip_ctx * ctx_clip);
/** build an image embed from image file bytes */
LLAMA_API struct llava_image_embed * llava_image_embed_make_with_bytes(struct clip_ctx * ctx_clip, int n_threads, const unsigned char * image_bytes, int image_bytes_length);
/** build an image embed from a path to an image filename */
LLAMA_API struct llava_image_embed * llava_image_embed_make_with_filename(struct clip_ctx * ctx_clip, int n_threads, const char * image_path);
/** free an embedding made with llava_image_embed_make_* */
LLAMA_API void llava_image_embed_free(struct llava_image_embed * embed);
/** write the image represented by embed into the llama context with batch size n_batch,
* starting at context pos n_past. on completion, n_past points to the next position in the context after the image embed. */
LLAMA_API bool llava_eval_image_embed(struct llama_context * ctx_llama, const struct llava_image_embed * embed, int n_batch, int * n_past); |
Some minor suggestions about the API:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The llava
API is not a good idea.
The only thing a 3rd party project needs in order to implement LLaVA is:
- CLIP API
- LLaMA API
Everything else are helper functions that do not belong neither to llama.cpp
nor clip.cpp
. If we want to reuse LLaVA helpers across llama.cpp
's examples (e.g. llava
, server
, etc.) these should go into common/llava.h/.cpp
Some first-impression suggestions from my side:
I'll make a thorough review tomorrow. |
See my implementation of llava in server example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows+CUDA still fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very close to what I had in mind, though some things on the API level can be improved. I've added a few comments regarding that.
examples/llava/CMakeLists.txt
Outdated
target_link_libraries(${TARGET} PRIVATE common ggml ${CMAKE_THREAD_LIBS_INIT}) | ||
set(TARGET llava) | ||
|
||
add_library(${TARGET} STATIC llava.cpp llava.h clip.cpp clip.h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
STATIC
should be dropped here to allow building llava
as a shared library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will update it to build a shared library as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, from my tests that looks like the only change I need to integrate this with llama-cpp-python
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, now it also builds as a shared lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works, thank you!
Ommiting the STATIC
/ OBJECT
from target_link_libraries
also seemed to work with just the llava
target without the need for the llava_shared
target but if having different targets is preffered I don't mind it.
It seems to me that a good way to check if this PR is achieving its goals is to implement the Llava API in the server example. If it proves to be a better option than how Llava currently functions on the server, it's easy to deduce that we're heading in the right direction since we can eliminate a lot of duplicate code from the server. This is just my opinion to help; I don't intend to come across as a know-it-all. |
glad this made it in, thanks folks! |
…#3613) * wip llava python bindings compatibility * add external llava API * add base64 in-prompt image support * wip refactor image loading * refactor image load out of llava init * cleanup * further cleanup; move llava-cli into its own file and rename * move base64.hpp into common/ * collapse clip and llava libraries * move llava into its own subdir * wip * fix bug where base64 string was not removed from the prompt * get libllava to output in the right place * expose llava methods in libllama.dylib * cleanup memory usage around clip_image_* * cleanup and refactor *again* * update headerdoc * build with cmake, not tested (WIP) * Editorconfig * Editorconfig * Build with make * Build with make * Fix cyclical depts on Windows * attempt to fix build on Windows * attempt to fix build on Windows * Upd TODOs * attempt to fix build on Windows+CUDA * Revert changes in cmake * Fix according to review comments * Support building as a shared library * address review comments --------- Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com> Co-authored-by: Jared Van Bortel <jared@nomic.ai>
llava_context
instance can serve multiple prompt requests