Add Jlama support (langchain4j#1379)
## Issue
Implements langchain4j#1350

## Change
Adds support for Jlama engine

## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [X] There are no breaking changes
- [X] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [X] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [X] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)

## Checklist for adding new model integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [X] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
tjake authored Jul 2, 2024
1 parent f601aad commit ed21f61
Showing 32 changed files with 1,405 additions and 7 deletions.
111 changes: 111 additions & 0 deletions docs/docs/integrations/embedding-models/jlama.md
@@ -0,0 +1,111 @@
---
sidebar_position: 8
---

# Jlama
[Jlama Project](https://github.com/tjake/Jlama)

### Project setup

To install LangChain4j in your project, add the following dependencies.

For a Maven project, in `pom.xml`:

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>{your-version}</version>
</dependency>

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-jlama</artifactId>
    <version>{your-version}</version>
</dependency>

<dependency>
    <groupId>com.github.tjake</groupId>
    <artifactId>jlama-native</artifactId>
    <!-- For faster inference. Supports linux-x86_64, macos-x86_64/aarch_64, windows-x86_64.
         Use https://github.com/trustin/os-maven-plugin to detect os and arch. -->
    <classifier>${os.detected.name}-${os.detected.arch}</classifier>
    <version>${jlama.version}</version> <!-- Version from the langchain4j-jlama pom -->
</dependency>
```
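
The `${os.detected.name}` and `${os.detected.arch}` properties come from the os-maven-plugin, which is registered as a Maven build extension. A minimal sketch (the version shown is an assumption; check the plugin page for the latest release):

```xml
<build>
    <extensions>
        <extension>
            <groupId>kr.motd.maven</groupId>
            <artifactId>os-maven-plugin</artifactId>
            <!-- assumed version; use the latest release -->
            <version>1.7.1</version>
        </extension>
    </extensions>
</build>
```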

For a Gradle project, in `build.gradle`:

```groovy
implementation 'dev.langchain4j:langchain4j:0.31.0'
implementation 'dev.langchain4j:langchain4j-jlama:0.31.0'
```

## Embedding
The Jlama embedding model allows you to embed sentences, and using it in your application is simple.
We provide a simple example to get you started with the Jlama embedding model integration.
You can use any BERT-based model from [HuggingFace](https://huggingface.co/models?library=safetensors&sort=trending), specified using the `owner/model-name` format.

Create a class and add the following code.

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.jlama.JlamaEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.List;

public class HelloWorld {
    public static void main(String[] args) {
        EmbeddingModel embeddingModel = JlamaEmbeddingModel.builder()
                .modelName("intfloat/e5-small-v2")
                .build();

        // For simplicity, this example uses an in-memory store, but you can choose
        // any compatible external store for production environments.
        EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

        TextSegment segment1 = TextSegment.from("I like football.");
        Embedding embedding1 = embeddingModel.embed(segment1).content();
        embeddingStore.add(embedding1, segment1);

        TextSegment segment2 = TextSegment.from("The weather is good today.");
        Embedding embedding2 = embeddingModel.embed(segment2).content();
        embeddingStore.add(embedding2, segment2);

        String userQuery = "What is your favourite sport?";
        Embedding queryEmbedding = embeddingModel.embed(userQuery).content();
        int maxResults = 1;
        List<EmbeddingMatch<TextSegment>> relevant = embeddingStore.findRelevant(queryEmbedding, maxResults);
        EmbeddingMatch<TextSegment> embeddingMatch = relevant.get(0);

        System.out.println("Question: " + userQuery); // What is your favourite sport?
        System.out.println("Response: " + embeddingMatch.embedded().text()); // I like football.
    }
}
```
In this example we add two text segments, but LangChain4j offers built-in support for loading documents from various sources:
file system, URL, Amazon S3, Azure Blob Storage, GitHub, and Tencent COS.
Additionally, LangChain4j supports parsing multiple document types:
text, PDF, DOC, XLS, and PPT.
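
For instance, a document can be loaded from the file system with LangChain4j's standard loader; a minimal sketch (the file path is a placeholder):

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.TextDocumentParser;

import java.nio.file.Paths;

// Load a single text document from disk (path is a placeholder)
Document document = FileSystemDocumentLoader.loadDocument(
        Paths.get("/path/to/your/file.txt"), new TextDocumentParser());
```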

The output will be similar to this:

```plaintext
Question: What is your favourite sport?
Response: I like football.
```

Of course, you can combine Jlama embeddings with RAG (Retrieval-Augmented Generation) techniques.

In [RAG](/tutorials/rag) you will learn how to use RAG techniques for ingestion, retrieval, and advanced retrieval with LangChain4j.
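
As a sketch, the embedding model and store from the example above can back LangChain4j's `EmbeddingStoreContentRetriever`, the usual entry point for RAG retrieval:

```java
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;

// Reuses `embeddingModel` and `embeddingStore` from the example above
ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .maxResults(2) // return the two most similar segments
        .build();
```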

Many parameters are set behind the scenes, such as the timeout, model type, and model parameters.
In [Set Model Parameters](/tutorials/model-parameters) you will learn how to set these parameters explicitly.

### More examples
If you want to check more examples, you can find them in the [langchain4j-examples](https://github.com/langchain4j/langchain4j-examples) project.
1 change: 1 addition & 0 deletions docs/docs/integrations/index.mdx
@@ -31,6 +31,7 @@ of course some LLM providers offer large multimodal models (accepting text or images)
| [Google Vertex AI Gemini](/integrations/language-models/google-gemini) | … |
| [Google Vertex AI](/integrations/language-models/google-palm) | … |
| [Mistral AI](/integrations/language-models/mistral-ai) | … |
| [Jlama](/integrations/language-models/jlama) | … |
| [DashScope](/integrations/language-models/dashscope) | … |
| [LocalAI](/integrations/language-models/local-ai) | … |
| [Ollama](/integrations/language-models/ollama) | … |
1 change: 1 addition & 0 deletions docs/docs/integrations/language-models/index.md
@@ -14,6 +14,7 @@ sidebar_position: 0
| [Google Vertex AI Gemini](/integrations/language-models/google-gemini) | … |
| [Google Vertex AI PaLM 2](/integrations/language-models/google-palm) | … |
| [Hugging Face](/integrations/language-models/hugging-face) | … |
| [Jlama](/integrations/language-models/jlama) | … |
| [LocalAI](/integrations/language-models/local-ai) | … |
| [Mistral AI](/integrations/language-models/mistral-ai) | … |
| [Ollama](/integrations/language-models/ollama) | … |
150 changes: 150 additions & 0 deletions docs/docs/integrations/language-models/jlama.md
@@ -0,0 +1,150 @@
---
sidebar_position: 9
---

# Jlama
[Jlama Project](https://github.com/tjake/Jlama)

## Project setup

To install LangChain4j in your project, add the following dependencies.

For a Maven project, in `pom.xml`:

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>{your-version}</version> <!-- Specify your version here -->
</dependency>

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-jlama</artifactId>
    <version>{your-version}</version>
</dependency>

<dependency>
    <groupId>com.github.tjake</groupId>
    <artifactId>jlama-native</artifactId>
    <!-- For faster inference. Supports linux-x86_64, macos-x86_64/aarch_64, windows-x86_64.
         Use https://github.com/trustin/os-maven-plugin to detect os and arch. -->
    <classifier>${os.detected.name}-${os.detected.arch}</classifier>
    <version>${jlama.version}</version> <!-- Version from the langchain4j-jlama pom -->
</dependency>
```

For a Gradle project, in `build.gradle`:

```groovy
implementation 'dev.langchain4j:langchain4j:{your-version}'
implementation 'dev.langchain4j:langchain4j-jlama:{your-version}'
```
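
On Gradle, the optional `jlama-native` dependency also needs a platform classifier. A minimal sketch using the com.google.osdetector plugin (the plugin version and the `jlamaVersion` property are assumptions):

```groovy
plugins {
    id 'com.google.osdetector' version '1.7.3' // assumed version
}

dependencies {
    // jlama-native needs a platform classifier such as linux-x86_64;
    // osdetector.classifier resolves it for the current OS and architecture
    implementation "com.github.tjake:jlama-native:${jlamaVersion}:${osdetector.classifier}"
}
```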

### Model Selection
You can use most safetensors models from [HuggingFace](https://huggingface.co/models?library=safetensors&sort=trending), specified using the `owner/model-name` format.
Jlama maintains a list of popular pre-quantized models at http://huggingface.co/tjake.

Models that use one of the following architectures are supported:
- Gemma Models
- Llama Models
- Mistral Models
- Mixtral Models
- GPT-2 Models
- BERT Models

## Chat Completion
The chat models allow you to generate human-like responses with a model fine-tuned on conversational data.

### Synchronous
Create a class and add the following code.

```java
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.jlama.JlamaChatModel;

public class HelloWorld {
    public static void main(String[] args) {
        ChatLanguageModel model = JlamaChatModel.builder()
                .modelName("tjake/TinyLlama-1.1B-Chat-v1.0-Jlama-Q4")
                .build();

        String response = model.generate("Say 'Hello World'");
        System.out.println(response);
    }
}
```
Running the program will produce a variant of the following output:

```plaintext
Hello World! How can I assist you today?
```

### Streaming
Create a class and add the following code.

```java
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.jlama.JlamaStreamingChatModel;
import dev.langchain4j.model.output.Response;

import java.util.concurrent.CompletableFuture;

public class HelloWorld {
    public static void main(String[] args) {
        StreamingChatLanguageModel model = JlamaStreamingChatModel.builder()
                .modelName("tjake/TinyLlama-1.1B-Chat-v1.0-Jlama-Q4")
                .build();

        CompletableFuture<Response<AiMessage>> futureResponse = new CompletableFuture<>();
        model.generate("Tell me a joke about Java", new StreamingResponseHandler<AiMessage>() {
            @Override
            public void onNext(String token) {
                System.out.print(token);
            }

            @Override
            public void onComplete(Response<AiMessage> response) {
                futureResponse.complete(response);
            }

            @Override
            public void onError(Throwable error) {
                futureResponse.completeExceptionally(error);
            }
        });

        futureResponse.join();
    }
}
```
You will receive each chunk of text (token) in the `onNext` method as the LLM generates it.

You can see that the output below is streamed in real time:

```plaintext
"Why do Java developers wear glasses? Because they can't C#"
```

Of course, you can combine Jlama chat completion with other features like [Set Model Parameters](/tutorials/model-parameters) and [Chat Memory](/tutorials/chat-memory) to get more accurate responses.

In [Chat Memory](/tutorials/chat-memory) you will learn how to pass along your chat history, so the LLM knows what has been said before. If you don't pass the chat history, like in this simple example, the LLM will not know what has been said before, so it won't be able to correctly answer the second question ('What did I just ask?').
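
As a sketch of how that can look with Jlama, using LangChain4j's standard `AiServices` and `MessageWindowChatMemory` (see the [Chat Memory](/tutorials/chat-memory) tutorial for details):

```java
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.jlama.JlamaChatModel;
import dev.langchain4j.service.AiServices;

public class MemoryExample {

    interface Assistant {
        String chat(String message);
    }

    public static void main(String[] args) {
        ChatLanguageModel model = JlamaChatModel.builder()
                .modelName("tjake/TinyLlama-1.1B-Chat-v1.0-Jlama-Q4")
                .build();

        // The message-window memory keeps the last 10 messages and replays them on each call
        Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(model)
                .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
                .build();

        System.out.println(assistant.chat("My name is Klaus."));
        System.out.println(assistant.chat("What did I just ask?")); // memory supplies the context
    }
}
```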

Many parameters are set behind the scenes, such as the timeout, model type, and model parameters.
In [Set Model Parameters](/tutorials/model-parameters) you will learn how to set these parameters explicitly.


Jlama has some special model parameters that you can set; a builder sketch follows this list:

- `modelCachePath`: a path to the directory where models are cached once downloaded. Default is `~/.jlama`.
- `workingDirectory`: keeps a persistent ChatMemory on disk for a given model instance. This is faster than using Chat Memory.
- `quantizeModelAtRuntime`: quantizes the model at runtime. Runtime quantization is currently always Q4. You can also pre-quantize the model using the Jlama project's tools (see the [Jlama Project](https://github.com/tjake/jlama) for more information).
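
A minimal sketch, assuming the parameters above map directly onto builder methods (the `Path` argument types are an assumption):

```java
import dev.langchain4j.model.jlama.JlamaChatModel;

import java.nio.file.Path;

JlamaChatModel model = JlamaChatModel.builder()
        .modelName("tjake/TinyLlama-1.1B-Chat-v1.0-Jlama-Q4")
        .modelCachePath(Path.of("/opt/jlama/models"))  // where downloaded models are cached
        .workingDirectory(Path.of("/opt/jlama/state")) // persistent chat memory on disk
        .quantizeModelAtRuntime(true)                  // quantize to Q4 at load time
        .build();
```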

### Function Calling
Jlama does not support function calling (yet).

### JSON mode
Jlama does not support JSON mode (yet). But you can always ask the model nicely to return JSON.
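For example, a plain prompt can request a JSON shape (a sketch; the model is not guaranteed to follow it):

```java
String json = model.generate("""
        List three programming languages that run on the JVM.
        Respond with JSON only, in the form: {"languages": ["...", "...", "..."]}
        """);
System.out.println(json);
```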
2 changes: 1 addition & 1 deletion docs/docs/integrations/language-models/local-ai.md
@@ -1,5 +1,5 @@
---
sidebar_position: 9
sidebar_position: 10
---

# LocalAI
2 changes: 1 addition & 1 deletion docs/docs/integrations/language-models/mistral-ai.md
@@ -1,5 +1,5 @@
---
sidebar_position: 10
sidebar_position: 11
---

# MistralAI
2 changes: 1 addition & 1 deletion docs/docs/integrations/language-models/ollama.md
@@ -1,5 +1,5 @@
---
sidebar_position: 11
sidebar_position: 12
---

# Ollama
2 changes: 1 addition & 1 deletion docs/docs/integrations/language-models/open-ai.md
@@ -1,5 +1,5 @@
---
sidebar_position: 12
sidebar_position: 13
---

# OpenAI
2 changes: 1 addition & 1 deletion docs/docs/integrations/language-models/qianfan.md
@@ -1,5 +1,5 @@
---
sidebar_position: 13
sidebar_position: 14
---

# Qianfan
2 changes: 1 addition & 1 deletion docs/docs/integrations/language-models/workers-ai.md
@@ -1,5 +1,5 @@
---
sidebar_position: 14
sidebar_position: 15
---

# Cloudflare Workers AI
2 changes: 1 addition & 1 deletion docs/docs/integrations/language-models/zhipu-ai.md
@@ -1,5 +1,5 @@
---
sidebar_position: 15
sidebar_position: 16
---

# ZhipuAI
6 changes: 6 additions & 0 deletions langchain4j-bom/pom.xml
@@ -81,6 +81,12 @@
        <version>${project.version}</version>
    </dependency>

    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-jlama</artifactId>
        <version>${project.version}</version>
    </dependency>

    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-jina</artifactId>
12 changes: 12 additions & 0 deletions langchain4j-jlama/README.md
@@ -0,0 +1,12 @@
### Jlama integration for langchain4j

[Jlama](https://github.com/tjake/Jlama) is a Java library that provides a simple way to integrate LLMs into Java
applications.

Jlama is built with Java 21 and utilizes the new [Vector API](https://openjdk.org/jeps/448) for faster inference.

Jlama uses HuggingFace models in safetensors format.
Models must be specified using the `owner/model-name` format. For example, `meta-llama/Llama-2-7b-chat-hf`.

Pre-quantized models are maintained under https://huggingface.co/tjake.
