SetDllImportResolver based loading #603
Conversation
I do not see much difference when using this code. It just uses TryLoadLibrary, but the DLL is registered with path=null, which probably means 'legacy behavior' in SetDllImportResolver.
With our normal loading system we load the DLL and then just hope that the one we loaded is the one that's used. However, since Rinne discovered there were two DLLs loaded, that's clearly not working! The idea here is to replace the built-in loading system with our own, so that it can't possibly load two. It's still using TryLoadLibrary because that does seem to load the correct library, so it's simply applying the same logic in a different context. I'm not sure what you mean about the path stuff; it didn't seem relevant to our use case. Could you explain more?
I was just wondering whether you left the path parameter null deliberately or forgot it. If you are sure that this is what you want, then it is OK. I cannot see any difference on my system between this code and the former one.
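For context, here is a minimal sketch of the mechanism being discussed. It is not LLamaSharp's actual implementation; the class name and the hard-coded path are placeholders. A resolver registered with `NativeLibrary.SetDllImportResolver` takes over loading for the assembly's `[DllImport]` declarations, uses `NativeLibrary.TryLoad` to load a specific file, and returns `IntPtr.Zero` to fall back to the runtime's default ("legacy") behaviour for anything it does not handle.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

internal static class LlamaImportResolver
{
    private static IntPtr _handle = IntPtr.Zero;

    // Call once at startup, before any llama P/Invoke is hit.
    public static void Register()
    {
        NativeLibrary.SetDllImportResolver(typeof(LlamaImportResolver).Assembly, Resolve);
    }

    private static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        // Only intercept the llama library; returning IntPtr.Zero hands everything
        // else back to the runtime's default resolution behaviour.
        if (libraryName != "llama")
            return IntPtr.Zero;

        // Load the library once and reuse the handle, so two different copies of
        // llama.dll can never be bound to these imports.
        if (_handle == IntPtr.Zero)
        {
            // Hypothetical path; the real logic selects CUDA/AVX/etc. builds.
            if (!NativeLibrary.TryLoad("runtimes/win-x64/native/llama.dll", out _handle))
                _handle = NativeLibrary.Load(libraryName, assembly, searchPath);
        }

        return _handle;
    }
}
```

Because the resolver hands back the same handle every time, the runtime can never end up with two different copies of the library bound to these imports.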
About something else: there was recently an important update to llama.cpp. The embeddings generation was wrong and it was corrected 2 weeks ago, just 2 days after our binary update. This is one of the reasons why KernelMemory does not work in LLamaSharp when you have more than one document: the system searches for relevant documents based on the embeddings, but they are wrong... Would it be possible to push an early update of the binaries? I know that this is a big ask because some things need to be changed (I have tried the new C++ code, but it does not work with the current LLamaSharp version).
@martindevans I've tested it and it works well!
@AsakusaRinne thanks for testing! So just to check: with this running, can you delete all but one of the DLLs and CUDA still works as expected? If so, I'll do another PR with a proper implementation of this and get it merged ASAP.
Yes, it also works well in this case! It's a better way than #594.
…s replaces the built-in loading system and ensures there can't be two libraries loaded at once.
- llava and llama are loaded separately, as needed.
- All the previous loading logic is still used, within the `SetDllImportResolver`.
- Split out CUDA, AVX and MacOS paths to separate helper methods.
- `Description` now specifies if it is for `llama` or `llava`.
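A rough sketch of how the per-library split described in this commit message might look, assuming a resolver like this is registered via `SetDllImportResolver` as in the earlier sketch. The helper names (`TryLoadCuda`, `TryLoadAvx`, `TryLoadMacOs`), the library names, and the probe paths are illustrative assumptions, not LLamaSharp's actual API.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

internal static class SplitResolver
{
    // llama and llava are resolved independently, on demand: the llava library is
    // only selected when a llava P/Invoke is actually invoked.
    public static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        return libraryName switch
        {
            "llama"        => LoadWithFallbacks("llama"),
            "llava_shared" => LoadWithFallbacks("llava_shared"),
            _              => IntPtr.Zero, // default runtime behaviour for anything else
        };
    }

    private static IntPtr LoadWithFallbacks(string baseName)
    {
        // The same platform probing is applied to each library in turn.
        if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX) && TryLoadMacOs(baseName, out var handle))
            return handle;
        if (TryLoadCuda(baseName, out handle) || TryLoadAvx(baseName, out handle))
            return handle;
        return NativeLibrary.Load(baseName);
    }

    // Placeholder helpers standing in for the real CUDA / AVX / MacOS path selection.
    private static bool TryLoadCuda(string name, out IntPtr handle)  => NativeLibrary.TryLoad($"cuda12/{name}.dll", out handle);
    private static bool TryLoadAvx(string name, out IntPtr handle)   => NativeLibrary.TryLoad($"avx2/{name}.dll", out handle);
    private static bool TryLoadMacOs(string name, out IntPtr handle) => NativeLibrary.TryLoad($"osx/lib{name}.dylib", out handle);
}
```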
b79b298 to f699081
@AsakusaRinne @SignalRT this has been totally rewritten and is now ready for review. I'd appreciate it if you could both test it. The main change from the prototype is to use … Another change is that I've completely split apart llava and llama loading, so they can each be done separately, on demand. All of the resolution logic now runs twice: once to select the llama library and once to select the llava library.
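A sketch of that "resolution runs twice" idea, with the same caveats as the earlier snippets (names and paths are assumptions, not the real implementation): each library's selection logic executes when that library is first needed, and every later P/Invoke for it reuses the cached handle.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;
using System.Runtime.InteropServices;

internal static class CachedResolver
{
    // One cached handle per native library, keyed by the name used in [DllImport(...)].
    private static readonly ConcurrentDictionary<string, IntPtr> Cache = new();

    public static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        if (libraryName is not ("llama" or "llava_shared"))
            return IntPtr.Zero; // default runtime behaviour for anything else

        // The (potentially expensive) selection logic runs when a library is first
        // requested; later P/Invokes for the same library reuse the cached handle.
        return Cache.GetOrAdd(libraryName, name =>
        {
            // Hypothetical preference order: try a CUDA build, then fall back to
            // the runtime's normal probing for the plain library name.
            if (NativeLibrary.TryLoad($"cuda12/{name}", out var handle))
                return handle;
            return NativeLibrary.Load(name, assembly, searchPath);
        });
    }
}
```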
Everything looks good to me and it works as expected on Windows with CUDA! I have one concern, but it shouldn't block the merging of this PR:
- Since the OpenCL (cutlass) backend has been added, shall we implement the logic for it in this PR as well?
I've fixed the MacOS loading.
I'd rather get that done in another PR, if that's OK? This one is already getting a bit larger than I wanted.
Sure, that shouldn't be a problem. :)
@martindevans it's also working for me. |
Using `SetDllImportResolver` to load llama DLLs as a prototype. This is just a prototype to check if using this fixes the issue with some DLLs seemingly being loaded twice.
@AsakusaRinne @zsogitbe I'd appreciate it if you would test this (particularly with CUDA) and tell me if it seems to help.