My perspective is that Java and Spring ecosystem developers might be feeling left out of the whole world of AI. This project is an attempt to use Spring AI to set up an end-to-end working RAG pipeline on a local system, meaning we will not use any paid APIs or tools. Response times will be slower, but it still gives us a good understanding of the different components and how they fit together.
- Spring AI has built-in libraries that do the embedding and generate vectors when a document is added. Those interested can take a look at `public void doAdd(List<Document> documents)` in the `QdrantVectorStore` class (a sketch follows this list).
- Ollama can be run natively or via Docker. If you are using Docker on a Mac, you might not be able to use the GPU, and I have seen a stark difference in response time when running Ollama natively with GPU enabled. To enable GPU mode on a Mac, run `OLLAMA_METAL=1 ollama run model_name`.
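To make the embedding step concrete, here is a minimal sketch of adding documents through Spring AI's `VectorStore` abstraction. The class name and sample text are illustrative only; `add(...)` is the public method that delegates to `doAdd` and triggers embedding via the configured embedding model:

```java
import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;

class EmbeddingExample {

    // add(...) embeds each document with the configured embedding model and
    // upserts the resulting vectors into Qdrant (via doAdd internally).
    void embed(VectorStore vectorStore) {
        List<Document> docs = List.of(
                new Document("Spring AI supports fully local RAG with Ollama and Qdrant."));
        vectorStore.add(docs);
    }
}
```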
There is a docker-compose file at the following path in the project: RAG_LLM_JAVA/infra/vector_db_setup. From that directory, simply run `docker compose up -d`.
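Once the container is up, you can verify Qdrant is reachable with `curl http://localhost:6333/healthz` (this assumes the compose file exposes Qdrant's default REST port, 6333).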
Please do not change the contents of the docker-compose file or the config YAML unless you know what you are doing.
Run the script setup_ollama.sh in the following location: RAG_LLM_JAVA/infra/ollama_setup.
The script pulls the Ollama image, waits until it is ready, logs into the Ollama container, and pulls the mxbai-embed-large embedding model. It waits up to 5 minutes for Ollama to become ready; after that the script exits. If that happens, please investigate locally.
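For reference, the readiness check in such a script typically looks something like the sketch below; it assumes Ollama's default port 11434 and a container named `ollama`, and the actual script in the repo may differ:

```sh
# Poll Ollama's API until it responds, up to 5 minutes (60 attempts x 5s).
for i in $(seq 1 60); do
  curl -sf http://localhost:11434/api/tags > /dev/null && break
  sleep 5
done
# Pull the embedding model inside the container.
docker exec ollama ollama pull mxbai-embed-large
```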
**As mentioned above, this is for running Ollama in Docker without GPU support.**
The model that runs with the script is codellama:13b. Feel free to modify the model as per your machine's capabilities and requirements.
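If the project configures models through Spring AI's standard Ollama starter properties, switching models would look roughly like this (a sketch; the property names are Spring AI defaults, but this project may wire things differently):

```properties
spring.ai.ollama.base-url=http://localhost:11434
# Chat model used for generation:
spring.ai.ollama.chat.options.model=codellama:13b
# Embedding model used when adding documents to the vector store:
spring.ai.ollama.embedding.options.model=mxbai-embed-large
```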
Start the Spring Boot application as usual from your IDE, the Spring Boot CLI, or Maven (e.g. `mvn spring-boot:run`).
Place the files you want ingested under the resources/RAG_data folder. On startup, the application reads all the files and loads the data into the vector store. Currently only text and PDF file parsing is supported.
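For a sense of what that startup ingestion can look like, here is a minimal sketch using Spring AI's document readers. The bean, the `*.txt` glob, and the use of `TokenTextSplitter` are assumptions for illustration, not necessarily how this project implements it:

```java
import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.TextReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.ApplicationRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;

@Configuration
class RagIngestionSketch {

    // On startup: read each file under resources/RAG_data, split it into
    // token-sized chunks, and add the chunks to the vector store (embedding
    // happens inside vectorStore.add via the configured embedding model).
    @Bean
    ApplicationRunner ingestOnStartup(VectorStore vectorStore) {
        return args -> {
            Resource[] files = new PathMatchingResourcePatternResolver()
                    .getResources("classpath:RAG_data/*.txt");
            TokenTextSplitter splitter = new TokenTextSplitter();
            for (Resource file : files) {
                List<Document> docs = new TextReader(file).get();
                vectorStore.add(splitter.apply(docs));
            }
        };
    }
}
```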