
Azure SQL DB, Langchain, LangGraph and Chainlit

This project contains a sample Chainlit (Python) application that uses the Retrieval-Augmented Generation (RAG) pattern against data stored in Azure SQL DB. The application uses LangChain and Chainlit as demonstrated in the #RAGHack conference. Full details and video recording available here: RAG on Azure SQL Server.

The sample is built using either plain LangChain (app.py) or LangGraph (app-langgraph.py) to define the RAG process.

Architecture


Solution

The solution works locally and in Azure. It is composed of three main Azure components:

  • Azure SQL Database: The Azure SQL database that stores application data.
  • Azure OpenAI: Hosts the language models for generating embeddings and completions.
  • Azure Functions: The serverless function to automate the process of generating the embeddings (this is optional for this sample)

Azure OpenAI

Before getting started, make sure to have two models deployed, one for generating embeddings (text-embedding-ada-002 model recommended) and one for handling the chat (gpt-4o recommended). You can use the Azure OpenAI service to deploy the models. Make sure to have the endpoint and the API key ready. The two models are assumed to be deployed with the following names:

  • Embedding model: text-embedding-ada-002
  • Chat model: gpt-4o
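
If it helps to see how these deployment names end up being used: with LangChain's Azure OpenAI integrations, the two deployments are typically wired up roughly like this (a minimal sketch, not the exact code in app.py; the environment variable names shown are the standard ones read by langchain-openai and may differ from the ones in this sample's .env.example):

import os

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Embedding deployment, used to turn text into vectors for similarity search.
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # any supported Azure OpenAI API version
    azure_deployment="text-embedding-ada-002",
)

# Chat deployment, used to generate the final answer.
chat = AzureChatOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_deployment="gpt-4o",
)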

TODO: Add links to pages where regions supporting the above models can be identified.

TODO: Add short section on using Azure AI Foundry to create/verify model deployments.

Database

Note

Vector Functions are in Public Preview. Learn the details about vectors in Azure SQL here: https://aka.ms/azure-sql-vector-public-preview

To deploy the database, you can either use the provided .NET 8 Core console application or do it manually.

To use the .NET 8 Core console application, change directories into the /database folder and create a .env file there, starting from the .env.example file, with the following values (an example .env is shown after this list):

  • MSSQL: the connection string to the Azure SQL database where you want to deploy the database objects and sample data
  • OPENAI_URL: specify the URL of your Azure OpenAI endpoint, eg: 'https://my-open-ai.openai.azure.com/'
  • OPENAI_KEY: specify the API key of your Azure OpenAI endpoint
  • OPENAI_MODEL: specify the deployment name of your Azure OpenAI embedding model, eg: 'text-embedding-3-small'
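
For example, the /database/.env file might look like this (placeholder values; adjust the connection string to match your server, database and authentication method):

MSSQL='Server=tcp:<your-server>.database.windows.net,1433;Initial Catalog=<your-database>;Authentication=Active Directory Default;'
OPENAI_URL='https://my-open-ai.openai.azure.com/'
OPENAI_KEY='<your-azure-openai-api-key>'
OPENAI_MODEL='text-embedding-3-small'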

To run the .NET 8 Core console application:

  1. Open a new terminal window in VS Code.

  2. Log in to Azure:

    az login
  3. At the terminal prompt, change directories to the /database folder:

    cd database
  4. Build the database project:

    dotnet build
  5. Run the database project:

    dotnet run

If you prefer to deploy the database manually, make sure to execute the scripts in the /database/sql folder in the order specified by the number in the file name. Some files (020-security.sql and 060-get_embedding.sql) have placeholders that you must replace with your own values:

  • $OPENAI_URL$: replace with the URL of your Azure OpenAI endpoint, eg: 'https://my-open-ai.openai.azure.com/'
  • $OPENAI_KEY$: replace with the API key of your Azure OpenAI endpoint
  • $OPENAI_MODEL$: replace with the deployment name of your Azure OpenAI embedding model, eg: 'text-embedding-ada-002'

Chainlit

TODO: Update instructions for anyone using codespaces, as the venv will have already been created and the requirements.txt file will have been installed in the venv. (will still need to activate the .venv before running chainlit command...)

The Chainlit solution is in the /chainlit folder. Move into the folder, create a virtual environment, and install the requirements:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

or, on Windows:

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

Then create a .env file in the /chainlit folder, starting from the .env.example file, and fill it with the values for your environment.

TODO: Need to talk about how to put together the correct connection string when not using SQL auth. Using Entra ID, they will not need the Uid or Password parameters...

AZURE_SQL_CONNECTION_STRING='Driver={ODBC Driver 18 for SQL Server};Server=tcp:[YOUR_SQL_SERVER_NAME].database.windows.net,1433;Database=[YOUR_DATABASE_NAME];Encrypt=yes;Connection Timeout=30;'
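
A quick way to verify the connection string works before launching the app (a standalone check using pyodbc, not part of the sample code, which may build its connection differently):

import os

import pyodbc
from dotenv import load_dotenv

load_dotenv()

# With SQL authentication, append Uid=...;Pwd=...; to the string in .env.
# With Entra ID, you can instead append Authentication=ActiveDirectoryInteractive;
# (or ActiveDirectoryDefault) and leave Uid/Pwd out entirely.
with pyodbc.connect(os.environ["AZURE_SQL_CONNECTION_STRING"]) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT DB_NAME()")
    print("Connected to:", cursor.fetchone()[0])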

Then run the Chainlit solution:

chainlit run app.py

or

chainlit run app-langgraph.py

if you want to use the LangGraph solution.

Once the application is running, you'll be able to ask questions about your data and get answers from the Azure OpenAI model. For example, you can ask questions about the data you have in the database:

Is there any session on Retrieval Augmented Generation?

You'll see that LangChain calls the function get_similar_sessions, which behind the scenes connects to the database and executes the stored procedure web.find_sessions to perform a vector search on the data.

The RAG process is defined using the LangChain Expression Language (LCEL), which can be easily extended to include more complex logic, even complex agent actions with the aid of LangGraph, where the function that calls the stored procedure becomes a tool available to the agent.
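
As a rough illustration of what that LCEL pipeline looks like (a simplified sketch, not the exact code in app.py; the stored procedure parameter name is assumed here, so check the scripts in /database/sql for the real signature):

import os

import pyodbc
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import AzureChatOpenAI


def get_similar_sessions(question: str) -> str:
    """Run the web.find_sessions stored procedure and return the matching sessions as text."""
    with pyodbc.connect(os.environ["AZURE_SQL_CONNECTION_STRING"]) as conn:
        cursor = conn.cursor()
        # The parameter name is illustrative; see the stored procedure
        # definition in /database/sql for the actual one.
        cursor.execute("EXEC web.find_sessions @text = ?", question)
        rows = cursor.fetchall()
    return "\n".join(str(row) for row in rows)


prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the following sessions:\n\n{context}"),
    ("human", "{question}"),
])

# Endpoint, key and API version are read from the environment by langchain-openai.
llm = AzureChatOpenAI(azure_deployment="gpt-4o")

chain = (
    {"context": get_similar_sessions, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("Is there any session on Retrieval Augmented Generation?"))

In the LangGraph variant, the same retrieval function is exposed to the agent as a tool instead of being hard-wired into the chain.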

Azure Functions (optional)

In order to automate the process of generating the embeddings, you can use Azure Functions. Thanks to the Azure SQL trigger binding, tables can be monitored for changes, and the function can react to those changes by executing its code. As a result, the process of generating the embeddings and storing them in the database can be fully automated.

The Azure Functions in this sample are written in C#, but you can easily create the same solution using Python, Node.js, or any other supported language.
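
Conceptually, the trigger fires when a row changes, and the function then calls the embedding deployment and writes the resulting vector back to the database. A rough Python sketch of that core logic follows (the sample's actual function is C# and relies on the SQL trigger binding; the stored procedure name and environment variable names below are placeholders):

import json
import os

import pyodbc
from openai import AzureOpenAI

# Azure OpenAI client; environment variable names mirror the ones used by the
# /database .env file earlier in this README.
client = AzureOpenAI(
    azure_endpoint=os.environ["OPENAI_URL"],
    api_key=os.environ["OPENAI_KEY"],
    api_version="2024-02-01",
)


def embed_and_store(session_id: int, text: str) -> None:
    """Generate an embedding for the changed row and persist it in the database."""
    response = client.embeddings.create(
        model=os.environ["OPENAI_MODEL"],  # embedding deployment name
        input=text,
    )
    vector = response.data[0].embedding

    with pyodbc.connect(os.environ["AZURE_SQL_CONNECTION_STRING"]) as conn:
        cursor = conn.cursor()
        # Hypothetical stored procedure: the real schema and procedures are
        # created by the scripts in /database/sql.
        cursor.execute(
            "EXEC web.upsert_session_embedding @id = ?, @embedding = ?",
            session_id,
            json.dumps(vector),
        )
        conn.commit()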

The Azure Functions solution is in the /azure-functions folder. Move into the folder, then create a local.settings.json file starting from the provided local.settings.json.example file and fill it with your own values. Then run the Azure Functions host locally (make sure to have the Azure Functions Core Tools installed):

func start

The Azure Function will monitor the configured tables for changes and automatically call the Azure OpenAI endpoint to generate the embeddings for the new or updated data.
