Add link to Dart binding for llama.cpp #4882

Merged: 1 commit merged on Jan 20, 2024

netdur
Contributor

@netdur netdur commented Jan 11, 2024

  • This is a Flutter/Dart binding for llama.h
  • Two high-level classes for easier usage
  • Pre-built binaries for iOS/macOS
  • WIP C binding for common.h

@crasm crasm requested review from crasm and removed request for crasm January 12, 2024 06:34
Contributor

@crasm crasm left a comment

Link for convenience: https://github.com/netdur/llama_cpp_dart

Downloading and signing a pre-built binary (for Xcode) from a random CI server is a security problem, considering that everything is open source and there's no reason to do that. I think Flutter supports FFI on all platforms except web. I looked at your llama_common_c repo, and I don't see a typical FFI plugin setup with llama.cpp as a submodule in src.

MAID uses this approach for Android. I think it would be possible to use that approach to write a Flutter FFI plugin for llama.cpp that works cross-platform.

However, as far as I can gather, you can't combine server-side Dart FFI plugins with Flutter FFI plugins. I've been working on Dart bindings for llama.cpp, and that requires a dev build of Dart and the experimental native_assets system.

On another note, I think that having the package be officially released on https://pub.dev in a decent state with basic documentation should be a requirement before we link it in the README.

(There are already llama_flutter and llama_dart packages on pub.dev for llama.cpp, but I couldn't tell whether they're functional. The lack of documentation and of updates in 10 months makes me think they're not worth trying.)

@netdur
Contributor Author

netdur commented Jan 12, 2024

@crasm, thanks for the insightful review. To clarify, llama_common_c mainly deals with llama.h and provides C bindings for common.h. Its purpose is to simplify the usage of llama.h and common.h, and it's not specifically designed for Flutter or Dart. This repo also offers binaries for macOS, iOS, and the iOS simulator. Plans for Android, Linux, and Windows builds are in the works. Regarding the iOS binary signing, it's indeed done on CI, but additional signing is required as outlined here.

I understand your concern regarding the use of pre-built binaries. It's true that the iOS binary is signed on the CI server, but I also offer an alternative for those who prefer building from source. The llama_common_c repository includes a script that lets users build the binaries themselves from their local copy of llama.cpp. This way, users can choose between the pre-built binary for convenience and compiling it themselves for greater transparency and security.

The llama_cpp_dart project is structured as a Dart/Flutter package, not a Flutter plugin. It contains three primary files: llama_cpp.dart, generated by ffigen; llm.dart, with bindings for the C API I created in the llama_common_c project; and llama_processor.dart, which is specific to Flutter and uses isolates and streams. This setup allows compatibility with both Dart and Flutter.

Thank you for pointing me to MAID's approach; that's a great suggestion. I might consider contributing to MAID instead of managing a separate project. Thanks again for your guidance!
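The pattern described for llama_processor.dart (running generation off the main thread and streaming tokens back to the UI) can be sketched roughly as below. This is a Python analogy under stated assumptions, not the package's code: a worker thread stands in for a Dart isolate, a queue-backed generator stands in for a Dart Stream, and `fake_generate` is a hypothetical placeholder for the model. Unlike isolates, Python threads share memory, so this only illustrates the message-passing shape.

```python
import queue
import threading
from typing import Callable, Iterator

_DONE = object()  # sentinel marking the end of the stream


def stream_from_worker(generate: Callable[[str], Iterator[str]], prompt: str) -> Iterator[str]:
    """Run `generate` on a background thread and yield its tokens as they arrive.

    The thread plays the role of an isolate; the queue plays the role of the
    port/stream that carries results back to the caller.
    """
    q: "queue.Queue[object]" = queue.Queue()

    def worker() -> None:
        try:
            for token in generate(prompt):
                q.put(token)  # hand each token over as soon as it is produced
        except BaseException as exc:  # forward errors to the consumer side
            q.put(exc)
        finally:
            q.put(_DONE)  # always signal completion

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()  # blocks until the worker produces something
        if item is _DONE:
            return
        if isinstance(item, BaseException):
            raise item
        yield item


def fake_generate(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a model: echoes the prompt word by word."""
    for word in prompt.split():
        yield word
```

Usage: `for tok in stream_from_worker(fake_generate, "hello world"): print(tok)` prints each word as the worker emits it, while the caller's thread stays free between tokens.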

(Resolved review thread on README.md)
@crasm
Contributor

crasm commented Jan 12, 2024

I'm thinking we can wait until you have the other platforms working, or add a caveat that it's only for macOS/iOS for now.

With how you're wrapping common.h, that should make it easier to keep your package up to date with new llama.cpp features. (In my dart bindings, I haven't implemented CFG and had to reimplement much of the logic around sampling.)

I would still prefer to have the build integrated with native_assets_cli instead of needing a separate system. I found sherpa, which uses native_assets in combination with Flutter and may provide some inspiration (though I believe it's not as complete as yours). That said, just providing a dynamic library may be easier for most devs to integrate.

Also pinging @cebtenzzre for approval.

@cebtenzzre
Collaborator

cebtenzzre commented Jan 12, 2024

> • WIP C binding for common.h

Er... this isn't supposed to be a public API. It's meant to be used for internal examples and tests only.

@netdur
Contributor Author

netdur commented Jan 13, 2024

> • WIP C binding for common.h
>
> Er... this isn't supposed to be a public API. It's meant to be used for internal examples and tests only.

@cebtenzzre, my use of sampling.h, which is part of common, was to facilitate handling long prompts in Dart, a requirement not met by the existing setup due to the presence of several C++ types in llama_sampling_params. These types aren't directly compatible with Dart, hence the need for a binding. If there's an alternate method to enable long prompts without accessing the internal API, I'd be eager to explore that. Any guidance or suggestions would be greatly appreciated.

@crasm Absolutely. I hope you'll continue exploring native_assets_cli to increase the chances that we get something usable for Dart.

@cebtenzzre
Collaborator

> @cebtenzzre, my use of sampling.h, which is part of common, was to facilitate handling long prompts in Dart, a requirement not met by the existing setup due to the presence of several C++ types in llama_sampling_params. These types aren't directly compatible with Dart, hence the need for a binding. If there's an alternate method to enable long prompts without accessing the internal API, I'd be eager to explore that. Any guidance or suggestions would be greatly appreciated.

llama_sampling_params has C++ types because it's part of the examples, which are written in C++. Context length and RoPE scaling are set via llama_context_params, and you can decode as many batches of prompt as you want until you hit that limit.

This API isn't internal to llama.cpp, it's internal to the examples and tests. There's nothing it does that isn't available to downstream users of the public llama.cpp API.
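The point about long prompts can be illustrated with a small sketch of the bookkeeping involved: the prompt is fed to the model in chunks of at most `n_batch` tokens, each starting at the next free position, until the context length is reached. This is a Python sketch under assumptions, not the llama.cpp API: `decode` is a hypothetical callback standing in for the real decode call, and `n_ctx`/`n_batch` merely mirror the context-parameter names.

```python
from typing import Callable, Sequence


def decode_prompt_in_batches(
    tokens: Sequence[int],
    n_ctx: int,
    n_batch: int,
    decode: Callable[[Sequence[int], int], None],
) -> int:
    """Feed `tokens` to `decode` in chunks of at most `n_batch` tokens.

    `decode(chunk, start_pos)` is a hypothetical stand-in for the real decode
    call; `start_pos` is the position of the chunk's first token.
    Returns the next free position after the prompt.
    """
    if n_batch <= 0:
        raise ValueError("n_batch must be positive")
    if len(tokens) > n_ctx:
        # The prompt alone would overflow the context window.
        raise ValueError(f"prompt has {len(tokens)} tokens but n_ctx is {n_ctx}")
    pos = 0
    while pos < len(tokens):
        chunk = tokens[pos : pos + n_batch]
        decode(chunk, pos)  # positions continue where the previous chunk ended
        pos += len(chunk)
    return pos
```

For a 10-token prompt with `n_batch=4`, `decode` is called three times, at positions 0, 4, and 8. Note that generated tokens also occupy context positions, so a real caller would check the remaining space during generation too.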

@netdur
Contributor Author

netdur commented Jan 16, 2024

I've replaced the C binding with a pure Dart implementation, now solely referencing llama.h. Additionally, I've documented the classes and published the package (note: it's a package, not a plugin) on pub.dev.

@crasm
Contributor

crasm commented Jan 17, 2024

@netdur looks good! I'll approve after I give it a try locally

@crasm
Contributor

crasm commented Jan 17, 2024

@netdur I was unable to use your package successfully in an example project:

  • Your internal files should be in lib/src
  • You should export the public API and declare library; in lib/llama_cpp_dart.dart etc. (like how I did it here). This way users can import with the conventional syntax package:llama_cpp_dart/llama_cpp_dart.dart.

@netdur
Contributor Author

netdur commented Jan 18, 2024

@crasm thanks for the testing! I appreciate it. I'll make sure to update the code as you suggest and test the package locally.

@netdur
Contributor Author

netdur commented Jan 19, 2024

@crasm I've updated the package to include your suggestion. Although I've tested loading the package from pub.dev, further testing is still needed. Feel free to check it out now.

@netdur
Contributor Author

netdur commented Jan 19, 2024

@crasm It's good to go; please review.

Contributor

@crasm crasm left a comment

Looks good and works on my end.

@crasm crasm merged commit 48e2b13 into ggerganov:master Jan 20, 2024
13 checks passed
@Solido

Solido commented Jan 23, 2024

I've been following all the implementations you've done (thanks!), and in the first one I was able to configure P, K, and temp. The new implementation is missing those. Can we expect a mirror of main.h params? @netdur
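For readers unfamiliar with these knobs, here is a minimal Python sketch of what top-k, top-p, min-p, temperature, and a seed typically do when picking the next token from raw logits. It is a generic illustration, not llama.cpp's or llama_cpp_dart's implementation: the order used here (top-k, then top-p, then min-p on the softmax of the raw logits, with temperature applied last) is an assumption, and other samplers may order these steps differently.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(
    logits: Sequence[float],
    temperature: float = 0.8,
    top_k: int = 40,
    top_p: float = 0.95,
    min_p: float = 0.05,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick a token id from raw logits (illustrative filter order, see lead-in)."""
    rng = rng or random.Random()
    # Candidate ids sorted by logit, best first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    # top-k: keep only the k best candidates (k <= 0 disables the filter).
    if top_k > 0:
        order = order[:top_k]
    if temperature <= 0:
        return order[0]  # greedy decoding
    probs = softmax([logits[i] for i in order])
    # top-p (nucleus): smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i, p in zip(order, probs):
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # min-p: drop candidates below min_p times the best candidate's probability.
    threshold = min_p * kept[0][1]
    ids = [i for i, p in kept if p >= threshold]
    # Temperature last: rescale the surviving logits and draw one token.
    weights = softmax([logits[i] / temperature for i in ids])
    return rng.choices(ids, weights=weights, k=1)[0]
```

Passing a `random.Random(seed)` instance makes the draws reproducible, which is the role a seed parameter plays; `temperature=0` or `top_k=1` reduces sampling to always picking the highest-scoring token.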

@netdur
Copy link
Contributor Author

netdur commented Jan 23, 2024

@Solido Yes, that's because I've shifted from C to a full Dart implementation. I plan to integrate all the suggested features. Could you please open a feature request issue detailing the parameters you need? That way you can keep track of the implementation.

crasm pushed a commit that referenced this pull request Jan 23, 2024
@Solido

Solido commented Jan 24, 2024

I would rather follow your roadmap, because not all parameters may be easy to integrate.
Seed, top-k, top-p, min-p, and temperature should come first since they're so common; then grammar, repeat/penalty, logits parameters, stats, RoPE (even if you've done some), speculative decoding, cache, and LoRA last. Truly, it's your call; I follow along and use every one of your releases.
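As a rough picture of the "repeat/penalty" item, here is a Python sketch of one common repetition-penalty formulation: every distinct token seen in the last `last_n` generated ids has its logit divided by `penalty` if positive, or multiplied by it if not, making it less likely to be chosen again. The rule and the parameter names are illustrative assumptions, not a description of llama_cpp_dart's planned API.

```python
from typing import List, Sequence


def apply_repeat_penalty(
    logits: Sequence[float],
    recent_tokens: Sequence[int],
    penalty: float = 1.1,
    last_n: int = 64,
) -> List[float]:
    """Return a copy of `logits` with recently seen tokens penalized.

    Positive logits are divided by `penalty` and non-positive ones multiplied
    by it, so a penalty > 1 always pushes a repeated token's score down.
    """
    out = list(logits)
    window = recent_tokens[-last_n:] if last_n > 0 else []
    for tok in set(window):  # each distinct token is penalized once
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out
```

In a generation loop, the penalty is applied to the raw logits first, and the adjusted logits are then handed to the sampler (top-k, top-p, temperature, and so on).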
