Development Guide

This document is for developers interested in contributing to GraphRAG.

Quickstart

Development is best done in a Unix environment (Linux, macOS, or Windows WSL).

  1. Clone the GraphRAG repository.

  2. Follow all directions in the deployment guide to install the required tools and deploy an instance of the GraphRAG service in Azure. Alternatively, use the devcontainer provided in this repo, which has all tools preinstalled.

  3. Create a .env file in the root of the repository (GraphRAG/.env). A detailed description of the environment variables used to configure GraphRAG can be found here. Add the following environment variables to the .env file:

    | Environment Variable | Description |
    |---|---|
    | COSMOS_URI_ENDPOINT | Azure Cosmos DB endpoint URI from the GraphRAG deployment |
    | STORAGE_ACCOUNT_BLOB_URL | Azure Storage blob URL from the GraphRAG deployment |
    | AI_SEARCH_URL | AI Search endpoint from the GraphRAG deployment (in the form https://<name>.search.windows.net) |
    | GRAPHRAG_API_BASE | The Azure OpenAI (AOAI) API base URL |
    | GRAPHRAG_API_VERSION | The AOAI API version (e.g. 2023-03-15-preview) |
    | GRAPHRAG_LLM_MODEL | The AOAI LLM model name (e.g. gpt-4) |
    | GRAPHRAG_LLM_DEPLOYMENT_NAME | The AOAI LLM model deployment name (e.g. gpt-4-turbo) |
    | GRAPHRAG_EMBEDDING_MODEL | The AOAI embedding model name (e.g. text-embedding-ada-002) |
    | GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME | The AOAI embedding model deployment name (e.g. my-text-embedding-ada-002) |
    | REPORTERS | A comma-delimited list of logging reporters to enable. Possible values: blob, console, file |

  4. Developing inside the devcontainer

    1. Requirements

    2. Open VS Code in the directory containing your project.

      • Use the Command Palette (Ctrl+Shift+P on Windows/Linux, Cmd+Shift+P on macOS) and type "Remote-Containers: Open Folder in Container..."
      • Select your project folder and VS Code will start building the Docker container based on the Dockerfile and devcontainer.json in your project. This process may take a few minutes, especially on the first run.
      • When the VS Code prompt first appears, setup may not be finished yet. Wait for the following prompt to appear to confirm the installation is complete: vscode@<hostname>:/graphrag$
    3. Adding Python packages to the dev container.

      • Poetry is the Python package manager in the dev container. Python packages can be added with poetry add <package-name>.
      • Every time a package is added, Poetry updates poetry.lock and pyproject.toml, the two files that track all package management. Changes to these files should be committed to the repo; that is how we keep the devcontainer consistent across users.
      • It's possible to end up in a situation where a package has been added but your local poetry.lock does not contain the proper hash. This is most common after resolving a merge conflict. The easiest way to resolve it is to run poetry install, which checks the status of every package and updates the hash values in poetry.lock.
    4. Adding dependencies to the environment

      • Most dependencies (packages or tools) should be added to the environment through the Dockerfile. This allows us to maintain a consistent development environment. If you need a tool added, make the appropriate changes to the Dockerfile and submit a pull request.
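Before going further, it can help to sanity-check the .env file created in step 3. The following is a minimal sketch using only the Python standard library; it is not part of the repository. It assumes simple KEY=VALUE lines (no multi-line values or variable expansion), and the helper names parse_env, check_env, and check_env_file are hypothetical.

```python
from pathlib import Path

# Variables from the table in step 3 (hypothetical checker, not part of the repo).
REQUIRED_VARS = (
    "COSMOS_URI_ENDPOINT",
    "STORAGE_ACCOUNT_BLOB_URL",
    "AI_SEARCH_URL",
    "GRAPHRAG_API_BASE",
    "GRAPHRAG_API_VERSION",
    "GRAPHRAG_LLM_MODEL",
    "GRAPHRAG_LLM_DEPLOYMENT_NAME",
    "GRAPHRAG_EMBEDDING_MODEL",
    "GRAPHRAG_EMBEDDING_DEPLOYMENT_NAME",
    "REPORTERS",
)
ALLOWED_REPORTERS = {"blob", "console", "file"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the file looks complete."""
    problems = [f"missing {name}" for name in REQUIRED_VARS if not values.get(name)]
    reporters = [r.strip() for r in values.get("REPORTERS", "").split(",") if r.strip()]
    problems += [f"unknown reporter {r!r}" for r in reporters if r not in ALLOWED_REPORTERS]
    return problems


def check_env_file(path: str = ".env") -> list[str]:
    """Read a .env file from disk and check it."""
    return check_env(parse_env(Path(path).read_text()))
```

Called from the repository root, check_env_file() returns an empty list when every variable is set and REPORTERS only names blob, console, or file; otherwise it lists each missing variable or unknown reporter.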

Deploying GraphRAG

The GraphRAG service consists of two components: a backend application and a frontend UI application (coming soon). GraphRAG can be launched in multiple ways, depending on where in the application stack you are developing and debugging.

  • In Azure Kubernetes Service (AKS):

    Navigate to the root directory of the repository. First, build and publish the backend Docker image to an Azure Container Registry:

    > az acr build --registry <my_container_registry> -f docker/Dockerfile-backend --image graphrag:backend .
    

    Update infra/deployment.parameters.json to use your custom GraphRAG Docker images, then re-run the deployment script to update AKS.

    After deployment is complete, use kubectl to log in and view the GraphRAG AKS resources, as well as for other debugging use cases. The following commands give quick access to AKS:

    > RGNAME=<your_resource_group>
    > AKSNAME=`az aks list --resource-group $RGNAME --query "[].name" --output tsv`
    > az aks get-credentials -g $RGNAME -n $AKSNAME --overwrite-existing
    > kubectl config set-context --current --namespace=graphrag
    

    Some example kubectl commands to get started:

    > kubectl get pods                       # view a list of all deployed pods
    > kubectl get nodes                      # view a list of all deployed nodes
    > kubectl get jobs                       # view a list of all AKS jobs
    > kubectl logs <pod_name>                # print out useful logging information (print statements)
    > kubectl exec -it <pod_name> -- bash    # login to a running container
    > kubectl describe pod <pod_name>        # retrieve detailed info about a pod
    > kubectl describe node <node_name>      # retrieve detailed info about a node
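When many pods are deployed, scanning kubectl get pods by eye gets tedious. As a hypothetical helper (not part of this repo), the JSON output of kubectl get pods -o json can be filtered in Python for pods whose phase is neither Running nor Succeeded. This sketch assumes kubectl is already configured with the commands above; the function names fetch_pods_json and unhealthy_pods are illustrative.

```python
import json
import subprocess


def fetch_pods_json(namespace: str = "graphrag") -> str:
    """Return the raw JSON from `kubectl get pods -o json` for one namespace."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--namespace", namespace, "--output", "json"],
        check=True,  # raise CalledProcessError if kubectl fails
        capture_output=True,
        text=True,
    )
    return result.stdout


def unhealthy_pods(pods_json: str) -> list[tuple[str, str]]:
    """Return (name, phase) for every pod whose phase is not Running or Succeeded."""
    pods = json.loads(pods_json)["items"]
    return [
        (pod["metadata"]["name"], pod.get("status", {}).get("phase", "Unknown"))
        for pod in pods
        if pod.get("status", {}).get("phase") not in ("Running", "Succeeded")
    ]
```

For example, print(unhealthy_pods(fetch_pods_json())) lists pods that are Pending, Failed, or Unknown. Pod phase alone does not catch everything: a pod whose container is in CrashLoopBackOff can still report the Running phase, so kubectl describe pod <pod_name> remains the next step for anything suspicious.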
    

Testing

A small collection of pytest tests has been written to test the functionality of the API. To run the tests, add the following environment variables to a .env file in the root of the repository:

APIM_SUBSCRIPTION_KEY
COSMOS_URI_ENDPOINT
DEPLOYMENT_URL
STORAGE_ACCOUNT_BLOB_URL

The tests assume the solution accelerator has already been deployed and that a managed identity has been set up with RBAC access to Cosmos DB and Azure Storage. To run the tests locally:

# cd to root directory of the repo
> pytest backend/src/tests/test_all_index_endpoint.py -s
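If any of the variables above are missing, the tests tend to fail with errors that are hard to trace back to configuration. One option is a conftest.py hook that skips the suite with a clear reason. This is a sketch, not an existing file in the repo; it assumes the variables are already in the process environment when pytest collects tests (how the suite loads .env is not shown here), and missing_vars is a hypothetical helper name.

```python
import os

import pytest

# Variables the test suite expects, taken from the list above.
REQUIRED_TEST_VARS = (
    "APIM_SUBSCRIPTION_KEY",
    "COSMOS_URI_ENDPOINT",
    "DEPLOYMENT_URL",
    "STORAGE_ACCOUNT_BLOB_URL",
)


def missing_vars(environ=os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_TEST_VARS if not environ.get(name)]


def pytest_collection_modifyitems(config, items):
    """Pytest hook: skip every collected test if the configuration is incomplete."""
    missing = missing_vars()
    if missing:
        skip = pytest.mark.skip(reason=f"missing environment variables: {', '.join(missing)}")
        for item in items:
            item.add_marker(skip)
```

Skipping rather than failing keeps the report readable: pytest -rs then shows the list of missing variables once per skipped test instead of a stack trace from deep inside an Azure client call.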

Deployment (CI/CD)

This repository uses GitHub Actions for continuous integration and continuous deployment (CI/CD).

Style Guide

  • We follow PEP 8 standards and naming conventions as closely as possible.

  • ruff is used for linting and code formatting. A pre-commit hook has been set up to apply these settings automatically in this repo. To run ruff on every commit without calling it explicitly, install the pre-commit hook:

    > pre-commit install
    

Versioning

We use SemVer for semantic versioning.
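For reference, SemVer versions take the form MAJOR.MINOR.PATCH: increment MAJOR for incompatible API changes, MINOR for backward-compatible features, and PATCH for backward-compatible bug fixes, resetting the lower components to zero. A minimal Python sketch of that rule follows; it deliberately ignores pre-release and build metadata (e.g. 1.0.0-rc.1), which SemVer also defines, and the function names parse_version and bump are hypothetical.

```python
import re

# MAJOR.MINOR.PATCH only; SemVer forbids leading zeros in these numbers.
SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'X.Y.Z' into an integer tuple, rejecting anything else."""
    match = SEMVER_CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def bump(version: str, part: str) -> str:
    """Increment one component and reset the lower ones to zero."""
    major, minor, patch = parse_version(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

Parsing into an integer tuple rather than comparing strings matters: as strings, "1.10.0" sorts before "1.9.3", while the parsed tuples compare in the correct order.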