
SetDllImportResolver based loading #603

Merged

Conversation

martindevans
Member

Using SetDllImportResolver to load the llama DLLs, as a prototype.

This is just a prototype, to check whether using SetDllImportResolver fixes the issue of some DLLs seemingly being loaded twice.

@AsakusaRinne @zsogitbe I'd appreciate it if you would test this (particularly with CUDA) and tell me whether it seems to help.
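For reference, the mechanism under discussion looks roughly like this. This is a minimal sketch, not LLamaSharp's actual implementation: the class name, the hard-coded path, and the bare `llama` library name are illustrative assumptions, and the real code selects a path via its existing detection logic.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

static class NativeLibraryLoader
{
    // Register the resolver once, before any P/Invoke into llama runs.
    // From then on, every [DllImport("llama")] in this assembly goes
    // through Resolve instead of the default OS library search.
    public static void Register()
    {
        NativeLibrary.SetDllImportResolver(typeof(NativeLibraryLoader).Assembly, Resolve);
    }

    private static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        if (libraryName == "llama")
        {
            // Hypothetical path; real CUDA/AVX/OS detection would choose it.
            if (NativeLibrary.TryLoad("runtimes/win-x64/native/llama.dll", out var handle))
                return handle;
        }

        // Returning IntPtr.Zero falls back to the default loading behaviour.
        return IntPtr.Zero;
    }
}
```

Because the resolver is the single entry point for every P/Invoke, there is no second, independent OS search that could pull in a different copy of the DLL.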

@zsogitbe
Contributor

I do not see much difference when using this code. It just uses TryLoadLibrary, but the DLL is registered with searchPath=null, which probably means 'legacy behaviour' in SetDllImportResolver.
You know that the parameter you used as 'path' is actually not a path, but an enum describing where to search for the library?
Maybe set it to searchPath = DllImportSearchPath.UseDllDirectoryForDependencies before exiting the delegate!?
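The parameter in question is the third argument of the resolver delegate: a nullable `DllImportSearchPath` flags enum, not a file-system path. A sketch of forwarding it (the class name is illustrative; whether LLamaSharp should forward it at all is exactly what is being debated here):

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

static class ResolverExample
{
    public static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        // 'searchPath' describes WHERE the OS should look (e.g.
        // DllImportSearchPath.UseDllDirectoryForDependencies); null means
        // "legacy behaviour". It is not a path to a file.
        var effective = searchPath ?? DllImportSearchPath.UseDllDirectoryForDependencies;

        // Forward it via the overload that accepts a search-path hint.
        if (NativeLibrary.TryLoad(libraryName, assembly, effective, out var handle))
            return handle;

        return IntPtr.Zero; // fall back to default resolution
    }
}
```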

@martindevans
Member Author

martindevans commented Mar 15, 2024

With our normal loading system we load the DLL and then just hope that the one we loaded is the one that's used. However, since Rinne discovered there were two DLLs loaded, that's clearly not working!

The idea here is to replace the built-in loading system with our own. That way it can't possibly load two! It's still using TryLoadLibrary, because that does seem to load the correct library, so it's simply using the same logic in a different context.

I'm not sure what you mean about the path stuff; it didn't seem relevant to our use case. Could you explain more?

@zsogitbe
Contributor

I was just wondering if you left the path parameter deliberately null or forgot it. If you are sure that this is what you want, then it is OK. I cannot see any difference on my system between this code and the former one.

@zsogitbe
Contributor

About something else: there was recently an important update to llama.cpp. The embeddings generation was wrong and it was corrected 2 weeks ago, just 2 days after our binary update. This is one of the reasons why KernelMemory does not work in LLamaSharp when you have more than one document. The system searches for relevant documents based on the embeddings, but they are wrong... Would it be possible to push an early update of the binaries? I know that this is a big ask because some things need to be changed (I have tried the new C++ code, but it does not work with the current LLamaSharp version).

@AsakusaRinne
Collaborator

@martindevans I've tested it and it works well!

@martindevans
Member Author

@AsakusaRinne thanks for testing! So just to check: with this running, can you delete all but one of the DLLs and CUDA still works as expected?

If so, I'll do another PR with a proper implementation of this and get it merged ASAP.

@AsakusaRinne
Collaborator

> @AsakusaRinne thanks for testing! So just to check with this running can you delete all but one of the DLLs, and CUDA works as expected?
>
> If so I'll do another PR with a proper implementation of this and get that merged ASAP.

Yes, it also works well in this case! It's a better way than #594

…s replaces the built-in loading system and ensures there can't be two libraries loaded at once.

 - llava and llama are loaded separately, as needed.
 - All the previous loading logic is still used, within the `SetDllImportResolver`.
 - Split out the CUDA, AVX and MacOS paths into separate helper methods.
 - `Description` now specifies whether it is for `llama` or `llava`.
@martindevans martindevans force-pushed the experimental_SetDllImportResolver branch from b79b298 to f699081 on March 17, 2024 02:12
@martindevans martindevans changed the title Experimental SetDllImportResolver SetDllImportResolver based loading Mar 17, 2024
@martindevans martindevans marked this pull request as ready for review March 17, 2024 02:12
@martindevans
Member Author

@AsakusaRinne @SignalRT this has been totally rewritten and is now ready for review. I'd appreciate it if you could both test it.

The main change from the prototype, using SetDllImportResolver, is of course still in there.

Another change is that I've completely split apart llava and llama loading, so they can both be done separately, on demand. All of the resolution logic now runs twice: once to select the llama.dll to use, and again (if necessary) to select the llava.dll to use.
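A single resolver can serve both libraries by dispatching on the name in the `[DllImport]` being resolved, which is one way to get the "runs twice, on demand" behaviour described above. A minimal sketch, assuming the llava library is exported as `llava_shared` (the names and caching details here are illustrative, not LLamaSharp's exact code):

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

static class SplitLoader
{
    // Cache handles so each library's resolution logic runs at most once.
    private static IntPtr _llamaHandle;
    private static IntPtr _llavaHandle;

    public static IntPtr Resolve(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        switch (libraryName)
        {
            case "llama":
                if (_llamaHandle == IntPtr.Zero)
                    NativeLibrary.TryLoad("llama", assembly, searchPath, out _llamaHandle);
                return _llamaHandle;

            case "llava_shared": // assumed name; the real export may differ
                if (_llavaHandle == IntPtr.Zero)
                    NativeLibrary.TryLoad("llava_shared", assembly, searchPath, out _llavaHandle);
                return _llavaHandle;

            default:
                return IntPtr.Zero; // unknown library: use default resolution
        }
    }
}
```

Because resolution is lazy (driven by the first P/Invoke into each library), llava is never loaded unless something actually calls into it.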

@AsakusaRinne
Collaborator

Everything looks good to me and it works as expected on Windows with CUDA! I have one concern, but it shouldn't block the merging of this PR:

  • Since the OpenCL (CLBlast) backend has been added, shall we implement the logic for it in this PR together?

@martindevans
Member Author

I've fixed the MacOS loading.

> Since the OpenCL (CLBlast) backend has been added, shall we implement the logic for it in this PR together?

I'd rather get that done in another PR, if that's OK? This one is already getting a bit larger than I wanted.

@AsakusaRinne AsakusaRinne added enhancement New feature or request backend labels Mar 17, 2024
@AsakusaRinne
Collaborator

> I'd rather get that done in another PR if that's ok? This one is already getting a bit larger than I wanted.

Sure, that shouldn't be a problem. :)

@AsakusaRinne AsakusaRinne added this to the v0.11.0 milestone Mar 17, 2024
@SignalRT
Collaborator

@martindevans it's also working for me.

@martindevans martindevans merged commit 0247872 into SciSharp:master Mar 17, 2024
3 checks passed
@martindevans martindevans deleted the experimental_SetDllImportResolver branch March 17, 2024 19:54