Batch inference #325
Conversation
I'm surprised by
Could be round-off errors. Are you accumulating the sum into a
Yes, it's just for debugging, so I accumulate the sum into a double. The actual issue is that the output vectors for identical images batched together are different; the sum is just a proxy to understand where they start to diverge. The first place where the sum of the two-sample batch is not twice the sum of the single-sample batch is just after the first
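A minimal sketch of the kind of check described above, assuming nothing about the actual clip.cpp code: sum each activation buffer into a double accumulator and compare a batch containing the same image twice against twice the single-image sum. The helper names here are illustrative only.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>

// Accumulate a float buffer into a double to keep round-off out of the comparison.
static double sum_f32(const float * data, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

// If the batch contains the same image twice, its summed activations should be
// roughly 2x the single-image sum; a large relative error points at the op where
// the batched path starts to diverge. The tolerance is an arbitrary choice.
static bool batch_sum_matches(const float * single, size_t n_single,
                              const float * batch,  size_t n_batch) {
    const double s1 = sum_f32(single, n_single);
    const double s2 = sum_f32(batch,  n_batch);
    const double rel_err = std::fabs(s2 - 2.0 * s1) / std::fabs(2.0 * s1);
    std::printf("single sum = %f, batch sum = %f, rel err = %g\n", s1, s2, rel_err);
    return rel_err < 1e-4;
}
```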
Yes, it turned out that I had calculated a wrong value for the offset to be added to the
Now I can confirm that batch inference works correctly. I'll resolve the conflicts and then mark this ready for review.
This is ready for review now. Working usage is in monatis/clip.cpp#30. As future work, I'll implement batch inference for text encoding in clip.cpp, which will require some extra work on attention masking to handle inputs of varying lengths, and probably generalization of
CC @ggerganov
Great! Will come back to reviewing this soon. Need to merge some refactoring PRs first and sync things with
Fixed the conflicts after syncing with llama.cpp yesterday.
@monatis do you have any benchmarks on performance with and without batch inference, by any chance?
I'll probably take a look tomorrow and merge if everything is good.
@okpatil4u I updated the benchmark utility to make use of the proposed batch inference in monatis/clip.cpp#30 and got a ~1.4x speedup in per-image inference time compared to the main branch (190 ms down to 136 ms per image over 10k images). I can see numerous other places where we can further improve it soon. This is exactly what this PR is for: we will have a playground where we can test and benchmark ideas for batch inference and hopefully improve performance further.
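For reference, a sketch of how such a per-image timing could be collected; the encode call is a placeholder, not the clip.cpp API. The quoted speedup is simply 190 ms / 136 ms ≈ 1.4.

```cpp
#include <chrono>

// Time a whole run and report milliseconds per image. `run` is any callable
// that encodes `n_images` images (a stand-in for the actual encode call).
template <typename F>
static double ms_per_image(F && run, int n_images) {
    const auto t0 = std::chrono::steady_clock::now();
    run();
    const auto t1 = std::chrono::steady_clock::now();
    const double total_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    return total_ms / n_images;
}
```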
Sounds great. We have conflicts again; I'll fix them tomorrow.
@monatis thanks, this is promising. Was the batch size chosen automatically, or does the user have to define it?
@okpatil4u currently the signature of the batch encoding function is:
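The signature itself did not survive this export. Purely as an illustration (hypothetical names, not the actual clip.cpp API), a batch-encode entry point in this style usually takes an explicit array of preprocessed images, so the batch size is simply however many images the caller passes:

```cpp
// Hypothetical sketch only; not the function from the PR.
struct clip_ctx;        // opaque model context (assumed)
struct clip_image_f32;  // preprocessed image (assumed)

// The caller chooses the batch size via n_imgs; the output buffer must hold
// n_imgs * embedding_dim floats.
bool clip_image_batch_encode_sketch(
        clip_ctx             * ctx,
        int                    n_threads,
        const clip_image_f32 * imgs,
        int                    n_imgs,
        float                * out_vecs);
```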
@ggerganov conflicts fixed and tests passed. Working usage is in the
I am implementing batch inference support in monatis/clip.cpp#30. This is still WIP but almost there. I removed the `ne[3] == 1` asserts and generalized Conv2D to an N-batched `src1`. I also included #224 in this PR for batched matrix multiplication. I believe the only thing left to make batchwise is normalization now; I hope to solve it very soon. Hope it might be a playground to test batch inference.
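To make the `ne[3] == 1` point concrete, here is a hedged sketch (not the PR's actual diff) of what generalizing an op to an N-batched `src1` typically means in ggml terms: treat `ne[3]` as the batch size and step through it with the `nb[3]` byte stride. `process_sample` is a hypothetical placeholder for the per-sample work.

```cpp
#include "ggml.h"

// Hypothetical per-sample worker, standing in for whatever the op does on one sample.
void process_sample(const char * sample_data, const struct ggml_tensor * src1);

// Before batching, ops like this asserted src1->ne[3] == 1 and touched only
// sample 0. Generalizing means looping over ne[3] with the nb[3] byte stride.
void process_batched(const struct ggml_tensor * src1) {
    const int64_t n_batch = src1->ne[3];
    for (int64_t b = 0; b < n_batch; ++b) {
        const char * sample = (const char *) src1->data + b * src1->nb[3];
        process_sample(sample, src1);
    }
}
```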