Jlama is a Java library that provides a simple way to integrate LLMs into Java applications.
Jlama is built with Java 20+ and utilizes the new Vector API for faster inference.
Jlama uses Hugging Face models in safetensors format.
Models must be specified using the owner/model-name format. For example, meta-llama/Llama-2-7b-chat-hf.
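As an illustration of the owner/model-name format, the following is a minimal sketch of a helper that checks a model identifier before it is handed to any loader. The class `ModelId` and its methods are hypothetical names introduced here and are not part of Jlama. The allowed character set in the pattern is an assumption, not a rule taken from Jlama or Hugging Face.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jlama: validates and splits an "owner/model-name" identifier.
final class ModelId {
    // Assumed character set: letters, digits, '.', '_' and '-' in each part, exactly one '/'.
    private static final Pattern FORMAT =
            Pattern.compile("^([A-Za-z0-9][A-Za-z0-9._-]*)/([A-Za-z0-9][A-Za-z0-9._-]*)$");

    private final String owner;
    private final String name;

    private ModelId(String owner, String name) {
        this.owner = owner;
        this.name = name;
    }

    // Parses an identifier such as "meta-llama/Llama-2-7b-chat-hf".
    // Throws IllegalArgumentException if the owner or the model name is missing.
    static ModelId parse(String id) {
        if (id == null) {
            throw new IllegalArgumentException("Model id must not be null");
        }
        Matcher m = FORMAT.matcher(id.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException(
                    "Expected owner/model-name, got: " + id);
        }
        return new ModelId(m.group(1), m.group(2));
    }

    String owner() {
        return owner;
    }

    String name() {
        return name;
    }

    // Builds the model page URL on the Hugging Face Hub.
    String huggingFaceUrl() {
        return "https://huggingface.co/" + owner + "/" + name;
    }
}
```

Calling `ModelId.parse("meta-llama/Llama-2-7b-chat-hf")` yields the owner `meta-llama` and the name `Llama-2-7b-chat-hf`, while a bare name such as `Llama-2-7b-chat-hf` is rejected because the owner part is missing.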
Pre-quantized models are maintained under https://huggingface.co/tjake