Fix memory allocation for base-patch16 #18

Merged
merged 1 commit into main on Jun 25, 2023

Conversation

monatis (Owner) commented on Jun 25, 2023:

Closes #17

Green-Sky (Collaborator) commented:

Confirmed working:

clip_model_load: loading model from '../models/openai_clip-vit-base-patch16/openai_clip-vit-base-patch16.ggmlv0.f16.bin' - please wait...
clip_model_load: text model hparams
n_vocab            49408
num_positions      77
t_hidden_size      512
t_n_intermediate   2048
t_projection_dim   512
t_n_head           8
t_n_layer          12

clip_model_load: vision model hparams
image_size         224
patch_size         16
v_hidden_size      768
v_n_intermediate   3072
v_projection_dim   512
v_n_head           12
v_n_layer          12

use_gelu           0
ftype              1

clip_model_load: ggml ctx size = 287.12 MB
.................................................clip_model_load: model size =   285.77 MB / num tensors = 397
clip_model_load: 16 MB of compute buffer allocated
clip_model_load: model loadded

apple = 0.1522
red = 0.1502
human = 0.1427
dog = 0.1404
photo = 0.1388
blue = 0.1381
drawing = 0.1376

Review thread on this diff hunk (truncated at the commented line):

     {
         size_t mb = 1024 * 1024;
         switch (n_tensors)
         {
         case 397: // base
-            return 8 * mb;
+            if (n_image_positions == 50)
Collaborator:
Why use == here and <= further down? Also, why the different param name?

monatis (Owner, Author) replied:

We call this function only once in clip_model_load and allocate that memory based on the number of positions in the vision model. The scratch buffer, however, is allocated separately in clip_image_encode and clip_text_encode. n_image_positions is fixed for a given model, but n_positions might differ in the case of the text model. So if I used == there, the condition would not return true for shorter texts.
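
To make the distinction concrete, here is a minimal C sketch of the two sizing paths, with hypothetical function names and illustrative byte counts (only the case 397, == 50, and 8 * mb values come from the hunk above): the model-load buffer is keyed on the vision model's fixed n_image_positions, so an exact match works, while the per-call scratch buffer sees a text n_positions that varies with input length, so it needs a range check.

    #include <stddef.h>

    // One-time buffer, sized once in clip_model_load. n_image_positions is
    // fixed per checkpoint (50 for patch32, 197 for patch16 at 224x224),
    // so an exact == match is safe here.
    static size_t get_mem_req_by_size(int n_tensors, int n_image_positions) {
        const size_t mb = 1024 * 1024;
        switch (n_tensors) {
            case 397: // base, per the hunk above
                if (n_image_positions == 50) {
                    return 8 * mb;  // base-patch32 value from the hunk above
                }
                return 16 * mb;     // assumed for base-patch16; consistent with
                                    // the "16 MB of compute buffer" log line
            default:
                return 32 * mb;     // assumed fallback
        }
    }

    // Per-call scratch buffer, sized in clip_text_encode / clip_image_encode.
    // For text, n_positions is the tokenized length of the current input,
    // which can be anything up to num_positions (77 here), so a <= range
    // check is required: an == test would fail for shorter texts.
    static size_t get_scratch_buffer_size(int n_tensors, int n_positions) {
        const size_t mb = 1024 * 1024;
        if (n_tensors == 397 && n_positions <= 197) {
            return 16 * mb; // assumed
        }
        return 32 * mb;     // assumed fallback
    }

In short: a value that is a property of the checkpoint (n_image_positions) can be matched exactly, while a value that is a property of the input (n_positions) has to be bounded.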

monatis merged commit 96bfe62 into main on Jun 25, 2023
Successfully merging this pull request may close these issues:

Bug: openai clip-vit-base-patch16 fails with memory error