SharePoint-GPT-Middleware

A FastAPI-based middleware application that bridges SharePoint content with OpenAI's AI capabilities. It crawls SharePoint sites (document libraries, lists, and site pages), processes the content, and manages OpenAI vector stores for intelligent search and retrieval.

Features

  • SharePoint Crawler: Automated crawling of SharePoint document libraries, lists, and site pages
  • OpenAI Proxy: Full proxy for OpenAI APIs (Files, Vector Stores, Responses)
  • Semantic Search: AI-powered search across SharePoint content using OpenAI's file search
  • Knowledge Domains: Combine documents, lists, and site pages from any SharePoint site or location into unified "domains" based on SharePoint filters
  • Inventory Management: Track and manage vector stores, files, and assistants
  • Flexible Authentication: Support for Azure OpenAI (key, managed identity, service principal) and OpenAI API
  • Persistent Storage: Organized file structure for crawled content and metadata (see PERSISTENT_STORAGE_STRUCTURE.md)

Documentation

Architecture

Application Structure

src/
├── app.py                          # FastAPI application entry point
├── hardcoded_config.py             # Configuration constants
├── common_crawler_functions.py     # SharePoint crawler utilities
├── common_utility_functions.py     # Helper functions
├── routers_static/                 # Static routers (no version prefix)
│   ├── openai_proxy.py             # OpenAI API proxy endpoints
│   └── sharepoint_search.py        # AI-powered search endpoints
└── routers_v1/                     # V1 routers (mounted at /v1)
    ├── crawler.py                  # SharePoint crawler endpoints
    ├── domains.py                  # Domain management endpoints
    ├── inventory.py                # Vector store inventory endpoints
    ├── common_openai_functions_v1.py      # OpenAI client utilities
    ├── common_sharepoint_functions_v1.py  # SharePoint access utilities
    ├── common_ui_functions_v1.py          # Shared UI generation functions
    └── common_job_functions_v1.py         # Shared job management functions

Storage Structure

The application uses a persistent storage system to organize domains, crawled content, and logs. For detailed information about the storage structure, see PERSISTENT_STORAGE_STRUCTURE.md.

PERSISTENT_STORAGE_PATH/
├── domains/          # Domain configurations and metadata
├── crawler/          # Crawled SharePoint content
└── logs/            # Application logs

API Endpoints

OpenAI Proxy (/openai)

Full proxy for OpenAI APIs with support for both Azure OpenAI and OpenAI services:

API Method Path Description
Responses POST /openai/responses Create response
Responses GET /openai/responses List responses
Responses GET /openai/responses/{response_id} Get response
Responses DELETE /openai/responses/{response_id} Delete response
Files POST /openai/files Upload file
Files GET /openai/files List files
Files GET /openai/files/{file_id} Get file
Files DELETE /openai/files/{file_id} Delete file
Files GET /openai/files/{file_id}/content Get file content
Vector Stores POST /openai/vector_stores Create vector store
Vector Stores GET /openai/vector_stores List vector stores
Vector Stores GET /openai/vector_stores/{vector_store_id} Get vector store
Vector Stores POST /openai/vector_stores/{vector_store_id} Update vector store
Vector Stores DELETE /openai/vector_stores/{vector_store_id} Delete vector store
Vector Store Files POST /openai/vector_stores/{vector_store_id}/files Create vector store file
Vector Store Files GET /openai/vector_stores/{vector_store_id}/files List vector store files
Vector Store Files GET /openai/vector_stores/{vector_store_id}/files/{file_id} Get vector store file
Vector Store Files DELETE /openai/vector_stores/{vector_store_id}/files/{file_id} Delete vector store file
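
For example, with the application running locally on port 8000 (see Run Locally below), the proxy can be called directly. These illustrative calls assume the middleware itself requires no additional authentication:

# List vector stores through the proxy
curl "http://localhost:8000/openai/vector_stores"

# List uploaded files through the proxy
curl "http://localhost:8000/openai/files"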

SharePoint Search (/)

AI-powered search across SharePoint content:

Endpoint Method Description
/query POST Execute search query (JSON)
/query2 GET/POST Execute search query (HTML/JSON)
/describe GET Get search configuration
/describe2 GET Get search configuration (HTML/JSON)
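
For example, to inspect the current search configuration on a local instance (port 8000 as in Run Locally below):

curl "http://localhost:8000/describe"

A full /query example is shown under Usage Examples below.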

Domain Management (/v1/domains)

Manage SharePoint domains and their vector stores:

Endpoint Method Description
/v1/domains GET List all domains (HTML/JSON/UI)
/v1/domains/create GET/POST Create new domain
/v1/domains/update GET/PUT Update domain configuration
/v1/domains/delete DELETE Delete domain
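
For example, to list existing domains as JSON on a local instance (the format query parameter mirrors the one used by the crawler endpoints and is an assumption here):

curl "http://localhost:8000/v1/domains?format=json"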

Crawler (/v1/crawler)

SharePoint content crawling and synchronization:

Endpoint Method Description
/v1/crawler GET Crawler UI and documentation
/v1/crawler/localstorage GET Local storage inventory (HTML/JSON/ZIP)
/v1/crawler/list_sharepoint_files GET List files from SharePoint source
/v1/crawler/list_local_files GET List local embedded files
/v1/crawler/list_vectorstore_files GET List files in domain vector store
/v1/crawler/download_files GET Download files from SharePoint
/v1/crawler/update_vector_store GET Update vector store with local files
/v1/crawler/replicate_to_global GET Replicate domain stores to global
/v1/crawler/migrate_from_v2_to_v3 GET Migrate metadata format
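
For example, to preview which files a domain's SharePoint sources contain before downloading anything (illustrative call against a local instance, using the MYSITE domain ID from the Usage Examples below):

curl "http://localhost:8000/v1/crawler/list_sharepoint_files?domain_id=MYSITE&format=json"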

Inventory (/v1/inventory)

Manage and inspect OpenAI resources:

Endpoint Method Description
/v1/inventory GET Inventory documentation
/v1/inventory/vectorstores GET List vector stores (HTML/JSON/UI)
/v1/inventory/vectorstores/delete DELETE Delete vector store
/v1/inventory/vectorstore_files GET List files in vector store
/v1/inventory/vectorstore_files/remove DELETE Remove file from vector store
/v1/inventory/vectorstore_files/delete DELETE Delete file from store and storage
/v1/inventory/files GET List all files (HTML/JSON/UI)
/v1/inventory/files/delete DELETE Delete file from storage
/v1/inventory/assistants GET List assistants (HTML/JSON/UI)
/v1/inventory/assistants/delete DELETE Delete assistant
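
For example, to list all vector stores known to the configured OpenAI service (illustrative call against a local instance; the format parameter is an assumption based on the HTML/JSON/UI note above):

curl "http://localhost:8000/v1/inventory/vectorstores?format=json"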

Health & Status

Endpoint Method Description
/ GET Application home page with links
/alive GET Health check endpoint
/openaiproxyselftest GET Run OpenAI proxy self-test
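
For example, a simple liveness probe against a local instance:

curl "http://localhost:8000/alive"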

Configuration

The application is configured via environment variables. Copy env-file-template.txt to .env and configure:

OpenAI Service

  • OPENAI_SERVICE_TYPE: openai or azure_openai
  • Azure OpenAI:
    • AZURE_OPENAI_ENDPOINT: Azure OpenAI endpoint URL
    • AZURE_OPENAI_API_KEY: API key (if using key authentication)
    • AZURE_OPENAI_API_VERSION: API version (default: 2025-04-01-preview)
    • AZURE_OPENAI_DEFAULT_MODEL_DEPLOYMENT_NAME: Model deployment name
    • Authentication options:
      • Key: AZURE_OPENAI_USE_KEY_AUTHENTICATION=true
      • Service Principal: AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET
      • Managed Identity: AZURE_OPENAI_USE_MANAGED_IDENTITY=true, AZURE_MANAGED_IDENTITY_CLIENT_ID
  • OpenAI API:
    • OPENAI_API_KEY: OpenAI API key
    • OPENAI_ORGANIZATION: Organization ID
    • OPENAI_DEFAULT_MODEL_NAME: Model name (default: gpt-4o-mini)

SharePoint Crawler

  • CRAWLER_CLIENT_ID: Azure AD app registration client ID
  • CRAWLER_CLIENT_CERTIFICATE_PFX_FILE: Certificate file for authentication
  • CRAWLER_CLIENT_CERTIFICATE_PASSWORD: Certificate password
  • CRAWLER_TENANT_ID: Azure AD tenant ID

Search Configuration

  • SEARCH_DEFAULT_GLOBAL_VECTOR_STORE_ID: Default vector store for search
  • SEARCH_DEFAULT_MAX_NUM_RESULTS: Maximum search results (default: 20)
  • SEARCH_DEFAULT_TEMPERATURE: AI temperature (default: 0.0)
  • SEARCH_DEFAULT_INSTRUCTIONS: Default search instructions
  • SEARCH_DEFAULT_SHAREPOINT_ROOT_URL: SharePoint root URL

Storage

  • LOCAL_PERSISTENT_STORAGE_PATH: Local storage path (for local development)
  • LOG_QUERIES_AND_RESPONSES: Enable detailed logging (default: false)
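
As an illustration, a minimal .env for Azure OpenAI with key authentication and a certificate-based crawler might look like the sketch below (placeholder values only; env-file-template.txt is the authoritative reference):

# Illustrative example only – replace all placeholder values
OPENAI_SERVICE_TYPE=azure_openai
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_USE_KEY_AUTHENTICATION=true
AZURE_OPENAI_API_KEY=<your-api-key>
AZURE_OPENAI_API_VERSION=2025-04-01-preview
AZURE_OPENAI_DEFAULT_MODEL_DEPLOYMENT_NAME=<your-deployment-name>

CRAWLER_TENANT_ID=<tenant-id>
CRAWLER_CLIENT_ID=<app-registration-client-id>
CRAWLER_CLIENT_CERTIFICATE_PFX_FILE=<certificate-file>.pfx
CRAWLER_CLIENT_CERTIFICATE_PASSWORD=<certificate-password>

SEARCH_DEFAULT_MAX_NUM_RESULTS=20
LOCAL_PERSISTENT_STORAGE_PATH=<local-storage-path>
LOG_QUERIES_AND_RESPONSES=false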

For complete configuration details, see env-file-template.txt.

Setup and Deployment

Run Locally

Prerequisites:

  • Python 3.12
  • PowerShell 7+ (Windows)
  • Azure AD app registration with SharePoint permissions (for crawler)
  • OpenAI API key or Azure OpenAI service

Steps:

  1. Create and populate .env

    • Copy env-file-template.txt to .env at the repo root and fill in your values.
  2. Install dependencies (choose one)

    Option A: Use helper batch file (recommended on Windows)

    .\InstallAndCompileDependencies.bat

    This creates .venv, installs uv, installs dependencies from src/pyproject.toml (editable), and generates requirements.txt.

    Option B: Manual installation

    python -m venv .venv
    .\.venv\Scripts\Activate.ps1
    pip install -U pip
    pip install -r requirements.txt
  3. Run the API (FastAPI/Uvicorn)

    From repo root:

    python -m uvicorn app:app --app-dir src --host 0.0.0.0 --port 8000 --reload

    Access the application:

    • Home page: http://localhost:8000/
    • API docs (Swagger UI): http://localhost:8000/docs
    • Health check: http://localhost:8000/alive

  4. Run from VS Code (alternative)

    With debugger:

    • Press F5
    • OR: Run and Debug panel → select Python: FastAPI (Uvicorn) → Start
    • Browser opens automatically to /

    Without debugger:

    • Terminal → Run Task → Run API (Uvicorn)
    • Optionally run the Open Browser task

Deploy to Azure Web App

Prerequisites:

  • Azure subscription and resource group
  • Azure CLI and Az PowerShell module (scripts install if missing)
  • .env file configured with Azure settings

Steps:

  1. Configure environment

    • Ensure .env contains Azure deployment settings:
      • AZURE_SUBSCRIPTION_ID
      • AZURE_TENANT_ID
      • AZURE_RESOURCE_GROUP
      • AZURE_APP_SERVICE_NAME
      • AZURE_LOCATION (default: swedencentral)
  2. Provision and deploy

    Option A: Use helper batch files (recommended)

    Create Azure resources (if not already created):

    .\CreateAzureAppService.bat

    Deploy current code to the Web App:

    .\DeployAzureAppService.bat

    (Optional) Delete the Web App and plan:

    .\DeleteAzureAppService.bat

    Option B: Run PowerShell script directly

    .\DeployAzureAppService.ps1

What the deployment script does:

  • Reads .env and sets app settings (excluding deployment-only keys)
  • Sets startup command: python -m uvicorn app:app --host 0.0.0.0 --port 8000
  • Sets WEBSITES_PORT=8000 for proper traffic routing
  • Packages src/ plus root requirements.txt into deploy.zip
  • Zip-deploys to Azure Web App (Oryx builds on server)
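
A rough manual equivalent using the Azure CLI looks like the sketch below (placeholder resource names; the helper scripts may differ in detail):

# Set the startup command and port (placeholder names)
az webapp config set --resource-group <RESOURCE_GROUP> --name <APP_NAME> \
  --startup-file "python -m uvicorn app:app --host 0.0.0.0 --port 8000"
az webapp config appsettings set --resource-group <RESOURCE_GROUP> --name <APP_NAME> \
  --settings WEBSITES_PORT=8000

# Zip-deploy the packaged source (Oryx builds on the server)
az webapp deploy --resource-group <RESOURCE_GROUP> --name <APP_NAME> \
  --src-path deploy.zip --type zip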

Verify deployment:

  • Health check: https://<APP_NAME>.azurewebsites.net/alive
  • Application: https://<APP_NAME>.azurewebsites.net/
  • API docs: https://<APP_NAME>.azurewebsites.net/docs

View logs:

  • Portal: https://<APP_NAME>.scm.azurewebsites.net/api/logs/docker
  • CLI: az webapp log tail --name <APP_NAME> --resource-group <RESOURCE_GROUP>

Troubleshooting:

  • Check Docker logs for Python import errors or missing packages
  • Verify .env values are correct and complete
  • Ensure Azure AD app has proper SharePoint permissions (for crawler)
  • If changing port, update both startup command and WEBSITES_PORT
  • Check persistent storage path is writable (Azure: /home/data)

SharePoint Crawler Setup

To enable SharePoint crawling, you need to:

  1. Create Azure AD App Registration

    • Register an app in Azure AD
    • Add SharePoint API permissions: Sites.Read.All
    • Create a certificate for authentication
  2. Configure Certificate

    • Use CreateSelfSignedCertificate.ps1 to generate a certificate
    • Upload certificate to Azure AD app registration
    • Place .pfx file in repo root
    • Set CRAWLER_CLIENT_CERTIFICATE_PFX_FILE and CRAWLER_CLIENT_CERTIFICATE_PASSWORD in .env
  3. Grant SharePoint Permissions

    • Use AddRemoveCrawlerSharePointSites.ps1 to grant site collection permissions
    • Use AddRemoveCrawlerPermissions.ps1 to manage app permissions
  4. Create Domain Configuration

    • Use the /v1/domains/create endpoint to create domain configurations
    • Configure SharePoint sources (document libraries, lists, site pages)
    • Each domain gets its own vector store

For detailed storage structure, see PERSISTENT_STORAGE_STRUCTURE.md.

Usage Examples

Search SharePoint Content

curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find documents about project planning",
    "max_num_results": 10
  }'

Create a Domain

curl -X POST "http://localhost:8000/v1/domains/create" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "domain_id=MYSITE&name=My SharePoint Site&description=Project documentation site&vector_store_name=SharePoint-MYSITE"

Download Files and Update Vector Store

# Download files from SharePoint
curl "http://localhost:8000/v1/crawler/download_files?domain_id=MYSITE&format=json"

# Update vector store with downloaded files
curl "http://localhost:8000/v1/crawler/update_vector_store?domain_id=MYSITE&format=json"

License

See LICENSE file for details.
