---
title: Running locally
description: Use local tools so that your code never leaves your machine
---

import InstallSage from '/snippets/install-sage.mdx';

## Install Ollama

To run any open-source LLM directly on your machine, use Ollama:

1. Go to [ollama.com](https://ollama.com) and download the appropriate binary for your machine.
2. Open a new terminal window.
3. Pull the desired model, for instance: `ollama pull llama3.1` (see the quick check below to confirm it works).
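
If you want to confirm the model is available locally before moving on, a couple of quick checks (assuming you pulled `llama3.1` as above) look like this:

```bash
# List the models currently available to Ollama
ollama list

# Send a one-off prompt to make sure the model responds
ollama run llama3.1 "Reply with one short sentence."
```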

## Optional: Index the codebase with Marqo

If your codebase is small (fewer than 100 files), you can skip this section.

For larger codebases, the default LLM-based retrieval becomes too slow and expensive. Instead, we use vector-based retrieval, which requires chunking, embedding, and storing the codebase in a vector store. To do this locally, we use Marqo.

### Install Marqo

First, make sure you have Docker installed. Then run:

```bash
docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest
```

This starts a persistent Marqo container with its console attached to your terminal. On a fresh install, it should take 2-3 minutes for Marqo to be ready.
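
If you want to verify that the container came up correctly, you can check from a second terminal window. This is just a sanity check using standard Docker commands, not part of the official setup:

```bash
# Confirm the marqo container is running
docker ps --filter name=marqo

# Follow the startup logs until Marqo reports it is ready
docker logs -f marqo
```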

### Index the codebase

To chunk, embed, and store your codebase in the Marqo vector store, run the following command (replacing `huggingface/transformers` with your desired GitHub repository):

```bash
sage-index huggingface/transformers --mode=local
```
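
Indexing a large repository can take a while. Once the command finishes, one way to confirm that something was actually written to the vector store is to query Marqo's index-listing endpoint. The exact name sage gives the index is an implementation detail, so treat this only as a rough check:

```bash
# A new index should appear after sage-index completes
curl http://localhost:8882/indexes
```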

## Chat with any codebase locally

You are now ready to chat with your codebase. Run the following command, making sure to replace `huggingface/transformers` with your desired GitHub repository:

```bash
sage-chat huggingface/transformers --mode=local
```

You should now have a Gradio app running on localhost. Happy (local) chatting!

## Customizations

You can customize your chat experience by passing any of the following flags to the `sage-chat` command:

- **LLM**: You can select any LLM from the [Ollama library](https://ollama.com/library). By default, we use `llama3.1`.
- **Embedding model**: The embedding model is responsible for converting text to dense vectors. You can select any embedding model from Hugging Face (the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard) might be helpful). By default, we use `hf/e5-base-v2`.
- **Reranker**: After retrieving the top 25 relevant files from the vector database, we use a reranker to narrow them down to the top 5. You can choose any reranker from Hugging Face (check out the "Reranking" tab on the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard)). By default, we use [cross-encoder/ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2).
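
For illustration, a fully customized invocation might look like the sketch below. The flag names used here (`--llm-model`, `--embedding-model`, `--reranker-model`) are assumptions rather than documented flags; consult `sage-chat --help` for the exact names supported by your installed version.

```bash
# Flag names below are hypothetical -- check `sage-chat --help` for the real ones
sage-chat huggingface/transformers --mode=local \
    --llm-model=llama3.1 \
    --embedding-model=hf/e5-base-v2 \
    --reranker-model=cross-encoder/ms-marco-MiniLM-L-6-v2
```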