A FastAPI-based middleware application that bridges SharePoint content with OpenAI's AI capabilities. It crawls SharePoint sites (document libraries, lists, and site pages), processes the content, and manages OpenAI vector stores for intelligent search and retrieval.
- SharePoint Crawler: Automated crawling of SharePoint document libraries, lists, and site pages
- OpenAI Proxy: Full proxy for OpenAI APIs (Files, Vector Stores, Responses)
- Semantic Search: AI-powered search across SharePoint content using OpenAI's file search
- Knowledge Domains: Combine documents, lists, and site pages from any SharePoint site or location into unified "domains" defined by SharePoint filters
- Inventory Management: Track and manage vector stores, files, and assistants
- Flexible Authentication: Support for Azure OpenAI (key, managed identity, service principal) and OpenAI API
- Persistent Storage: Organized file structure for crawled content and metadata (see PERSISTENT_STORAGE_STRUCTURE.md)

Further documentation:
- PERSISTENT_STORAGE_STRUCTURE.md - Detailed folder structure and file formats
- DATA_SOURCES.md - Data flow between SharePoint, local files, and vector stores
- SECURE_AZURE_APP_SERVICE.md - Azure App Service security configuration

Project structure:

```
src/
├── app.py                               # FastAPI application entry point
├── hardcoded_config.py                  # Configuration constants
├── common_crawler_functions.py          # SharePoint crawler utilities
├── common_utility_functions.py          # Helper functions
├── routers_static/                      # Static routers (no version prefix)
│   ├── openai_proxy.py                  # OpenAI API proxy endpoints
│   └── sharepoint_search.py             # AI-powered search endpoints
└── routers_v1/                          # V1 routers (mounted at /v1)
    ├── crawler.py                       # SharePoint crawler endpoints
    ├── domains.py                       # Domain management endpoints
    ├── inventory.py                     # Vector store inventory endpoints
    ├── common_openai_functions_v1.py    # OpenAI client utilities
    ├── common_sharepoint_functions_v1.py # SharePoint access utilities
    ├── common_ui_functions_v1.py        # Shared UI generation functions
    └── common_job_functions_v1.py       # Shared job management functions
```

The application uses a persistent storage system to organize domains, crawled content, and logs. For detailed information about the storage structure, see PERSISTENT_STORAGE_STRUCTURE.md.

```
PERSISTENT_STORAGE_PATH/
├── domains/    # Domain configurations and metadata
├── crawler/    # Crawled SharePoint content
└── logs/       # Application logs
```

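A minimal sketch of preparing this layout at startup, assuming `LOCAL_PERSISTENT_STORAGE_PATH` wins when it is set and `/home/data` (the Azure path noted under Troubleshooting) is the fallback otherwise. Both helper names are hypothetical; the application's real path logic may differ.

```python
from pathlib import Path

# Top-level folders documented above and in PERSISTENT_STORAGE_STRUCTURE.md
STORAGE_SUBFOLDERS = ("domains", "crawler", "logs")

# Assumption: fallback used on Azure App Service, where /home is persisted
AZURE_DEFAULT_STORAGE_PATH = "/home/data"


def resolve_storage_root(env: dict[str, str]) -> Path:
    """Hypothetical helper: prefer LOCAL_PERSISTENT_STORAGE_PATH, else the Azure default."""
    local_path = env.get("LOCAL_PERSISTENT_STORAGE_PATH", "").strip()
    return Path(local_path) if local_path else Path(AZURE_DEFAULT_STORAGE_PATH)


def ensure_storage_layout(root: Path) -> dict[str, Path]:
    """Hypothetical helper: create domains/, crawler/ and logs/ under root if missing."""
    folders = {}
    for name in STORAGE_SUBFOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)  # idempotent, safe on every start
        folders[name] = folder
    return folders
```

Usage: `ensure_storage_layout(resolve_storage_root(dict(os.environ)))`. Because `mkdir(exist_ok=True)` is idempotent, calling it on every start is harmless.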
Full proxy for OpenAI APIs with support for both Azure OpenAI and OpenAI services:

| API | Method | Path | Description |
|---|---|---|---|
| Responses | POST | `/openai/responses` | Create response |
| Responses | GET | `/openai/responses` | List responses |
| Responses | GET | `/openai/responses/{response_id}` | Get response |
| Responses | DELETE | `/openai/responses/{response_id}` | Delete response |
| Files | POST | `/openai/files` | Upload file |
| Files | GET | `/openai/files` | List files |
| Files | GET | `/openai/files/{file_id}` | Get file |
| Files | DELETE | `/openai/files/{file_id}` | Delete file |
| Files | GET | `/openai/files/{file_id}/content` | Get file content |
| Vector Stores | POST | `/openai/vector_stores` | Create vector store |
| Vector Stores | GET | `/openai/vector_stores` | List vector stores |
| Vector Stores | GET | `/openai/vector_stores/{vector_store_id}` | Get vector store |
| Vector Stores | POST | `/openai/vector_stores/{vector_store_id}` | Update vector store |
| Vector Stores | DELETE | `/openai/vector_stores/{vector_store_id}` | Delete vector store |
| Vector Store Files | POST | `/openai/vector_stores/{vector_store_id}/files` | Create vector store file |
| Vector Store Files | GET | `/openai/vector_stores/{vector_store_id}/files` | List vector store files |
| Vector Store Files | GET | `/openai/vector_stores/{vector_store_id}/files/{file_id}` | Get vector store file |
| Vector Store Files | DELETE | `/openai/vector_stores/{vector_store_id}/files/{file_id}` | Delete vector store file |

AI-powered search across SharePoint content:

| Endpoint | Method | Description |
|---|---|---|
| `/query` | POST | Execute search query (JSON) |
| `/query2` | GET/POST | Execute search query (HTML/JSON) |
| `/describe` | GET | Get search configuration |
| `/describe2` | GET | Get search configuration (HTML/JSON) |

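A minimal Python sketch of calling `/query` with the standard library. Only the `query` and `max_num_results` fields come from this README's curl example; the default of 20 mirrors `SEARCH_DEFAULT_MAX_NUM_RESULTS`, and the URL and helper names are assumptions. The response schema is not documented here, so the reply is returned as decoded JSON without further parsing.

```python
import json
import urllib.request

SEARCH_URL = "http://localhost:8000/query"  # assumption: local dev server


def build_search_request(query: str, max_num_results: int = 20) -> urllib.request.Request:
    """Hypothetical helper: build a POST /query request with a JSON body."""
    if not query.strip():
        raise ValueError("query must not be empty")
    payload = {"query": query, "max_num_results": max_num_results}
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(query: str, max_num_results: int = 20) -> dict:
    """Send the request and decode the JSON reply (schema not documented in this README)."""
    request = build_search_request(query, max_num_results)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```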
Manage SharePoint domains and their vector stores:

| Endpoint | Method | Description |
|---|---|---|
| `/v1/domains` | GET | List all domains (HTML/JSON/UI) |
| `/v1/domains/create` | GET/POST | Create new domain |
| `/v1/domains/update` | GET/PUT | Update domain configuration |
| `/v1/domains/delete` | DELETE | Delete domain |

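Creating a domain takes form-encoded fields. This sketch builds the same request as the curl example at the end of this README, using `urlencode` so that spaces in values such as the display name are escaped. The four field names come from that example; the URL constant and helper name are assumptions, and any further optional fields the endpoint may accept are not covered.

```python
import urllib.request
from urllib.parse import urlencode

DOMAINS_CREATE_URL = "http://localhost:8000/v1/domains/create"  # assumption: local dev server


def build_create_domain_request(
    domain_id: str, name: str, description: str, vector_store_name: str
) -> urllib.request.Request:
    """Hypothetical helper: form-encode the domain fields and POST them."""
    form = urlencode({
        "domain_id": domain_id,
        "name": name,
        "description": description,
        "vector_store_name": vector_store_name,
    })  # urlencode percent-encodes non-ASCII, so the result is pure ASCII
    return urllib.request.Request(
        DOMAINS_CREATE_URL,
        data=form.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```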
SharePoint content crawling and synchronization:

| Endpoint | Method | Description |
|---|---|---|
| `/v1/crawler` | GET | Crawler UI and documentation |
| `/v1/crawler/localstorage` | GET | Local storage inventory (HTML/JSON/ZIP) |
| `/v1/crawler/list_sharepoint_files` | GET | List files from SharePoint source |
| `/v1/crawler/list_local_files` | GET | List local embedded files |
| `/v1/crawler/list_vectorstore_files` | GET | List files in domain vector store |
| `/v1/crawler/download_files` | GET | Download files from SharePoint |
| `/v1/crawler/update_vector_store` | GET | Update vector store with local files |
| `/v1/crawler/replicate_to_global` | GET | Replicate domain stores to global |
| `/v1/crawler/migrate_from_v2_to_v3` | GET | Migrate metadata format |

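Judging by the endpoint descriptions, a domain sync runs in two steps: download files from SharePoint first, then push the local files to the domain's vector store. A minimal sketch of that sequence, assuming a local instance and the `format=json` query parameter used in the examples at the end of this README. The helper names and the one-hour timeout are hypothetical, and the response bodies are returned raw because their format is not documented here.

```python
import urllib.request
from urllib.parse import urlencode

CRAWLER_BASE_URL = "http://localhost:8000/v1/crawler"  # assumption: local dev server

# Order matters: update_vector_store works on files that download_files stored locally.
SYNC_STEPS = ("download_files", "update_vector_store")


def crawler_step_urls(domain_id: str, steps: tuple[str, ...] = SYNC_STEPS) -> list[str]:
    """Hypothetical helper: GET URLs for each crawler step, requesting JSON output."""
    query = urlencode({"domain_id": domain_id, "format": "json"})
    return [f"{CRAWLER_BASE_URL}/{step}?{query}" for step in steps]


def sync_domain(domain_id: str, timeout: float = 3600) -> list[str]:
    """Run the steps one after another and collect the raw response bodies."""
    bodies = []
    for url in crawler_step_urls(domain_id):
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            bodies.append(resp.read().decode("utf-8"))
    return bodies
```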
Manage and inspect OpenAI resources:

| Endpoint | Method | Description |
|---|---|---|
| `/v1/inventory` | GET | Inventory documentation |
| `/v1/inventory/vectorstores` | GET | List vector stores (HTML/JSON/UI) |
| `/v1/inventory/vectorstores/delete` | DELETE | Delete vector store |
| `/v1/inventory/vectorstore_files` | GET | List files in vector store |
| `/v1/inventory/vectorstore_files/remove` | DELETE | Remove file from vector store |
| `/v1/inventory/vectorstore_files/delete` | DELETE | Remove file from vector store and delete it from storage |
| `/v1/inventory/files` | GET | List all files (HTML/JSON/UI) |
| `/v1/inventory/files/delete` | DELETE | Delete file from storage |
| `/v1/inventory/assistants` | GET | List assistants (HTML/JSON/UI) |
| `/v1/inventory/assistants/delete` | DELETE | Delete assistant |

Other endpoints:

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Application home page with links |
| `/alive` | GET | Health check endpoint |
| `/openaiproxyselftest` | GET | Run OpenAI proxy self-test |

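When scripting against a freshly started instance (locally or after an Azure deployment), it can help to poll `/alive` before sending real requests. A minimal sketch, assuming the endpoint answers HTTP 200 when healthy. The function names and polling intervals are illustrative, and the status fetcher is injectable so the loop can be exercised without a server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def default_fetch_status(url: str) -> int:
    """Return the HTTP status code of a GET request (urlopen raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status


def wait_until_alive(
    base_url: str,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    fetch_status: Callable[[str], int] = default_fetch_status,
) -> bool:
    """Poll {base_url}/alive until it returns HTTP 200 or the timeout elapses."""
    url = base_url.rstrip("/") + "/alive"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch_status(url) == 200:
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server not reachable yet, keep polling
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_alive("https://<APP_NAME>.azurewebsites.net", timeout_s=300)` gives a cold-starting Web App a few minutes to come up.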
The application is configured via environment variables. Copy `env-file-template.txt` to `.env` and configure the following.

OpenAI service:

- `OPENAI_SERVICE_TYPE`: `openai` or `azure_openai`
- Azure OpenAI:
  - `AZURE_OPENAI_ENDPOINT`: Azure OpenAI endpoint URL
  - `AZURE_OPENAI_API_KEY`: API key (if using key authentication)
  - `AZURE_OPENAI_API_VERSION`: API version (default: `2025-04-01-preview`)
  - `AZURE_OPENAI_DEFAULT_MODEL_DEPLOYMENT_NAME`: Model deployment name
  - Authentication options:
    - Key: `AZURE_OPENAI_USE_KEY_AUTHENTICATION=true`
    - Service principal: `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`
    - Managed identity: `AZURE_OPENAI_USE_MANAGED_IDENTITY=true`, `AZURE_MANAGED_IDENTITY_CLIENT_ID`
- OpenAI API:
  - `OPENAI_API_KEY`: OpenAI API key
  - `OPENAI_ORGANIZATION`: Organization ID
  - `OPENAI_DEFAULT_MODEL_NAME`: Model name (default: `gpt-4o-mini`)

SharePoint crawler:

- `CRAWLER_CLIENT_ID`: Azure AD app registration client ID
- `CRAWLER_CLIENT_CERTIFICATE_PFX_FILE`: Certificate file for authentication
- `CRAWLER_CLIENT_CERTIFICATE_PASSWORD`: Certificate password
- `CRAWLER_TENANT_ID`: Azure AD tenant ID

Search:

- `SEARCH_DEFAULT_GLOBAL_VECTOR_STORE_ID`: Default vector store for search
- `SEARCH_DEFAULT_MAX_NUM_RESULTS`: Maximum number of search results (default: `20`)
- `SEARCH_DEFAULT_TEMPERATURE`: Model temperature (default: `0.0`)
- `SEARCH_DEFAULT_INSTRUCTIONS`: Default search instructions
- `SEARCH_DEFAULT_SHAREPOINT_ROOT_URL`: SharePoint root URL

Storage and logging:

- `LOCAL_PERSISTENT_STORAGE_PATH`: Local storage path (for local development)
- `LOG_QUERIES_AND_RESPONSES`: Enable detailed logging of queries and responses (default: `false`)

For complete configuration details, see env-file-template.txt.

Prerequisites (local development):
- Python 3.12
- PowerShell 7+ (Windows)
- Azure AD app registration with SharePoint permissions (for crawler)
- OpenAI API key or Azure OpenAI service

Steps:

1. Create and populate `.env`
   - Copy `env-file-template.txt` to `.env` at the repo root and fill in your values.

2. Install dependencies (choose one)

   Option A: Use the helper batch file (recommended on Windows)

   ```powershell
   .\InstallAndCompileDependencies.bat
   ```

   This creates `.venv`, installs uv, installs the dependencies from `src/pyproject.toml` (editable), and generates `requirements.txt`.

   Option B: Manual installation

   ```powershell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   pip install -U pip
   pip install -r requirements.txt
   ```

3. Run the API (FastAPI/Uvicorn)

   From the repo root:

   ```powershell
   python -m uvicorn app:app --app-dir src --host 0.0.0.0 --port 8000 --reload
   ```

   Access the application:
   - Home: http://localhost:8000/
   - Health check: http://localhost:8000/alive
   - API docs: http://localhost:8000/docs

4. Run from VS Code (alternative)

   With the debugger:
   - Press `F5`, or open the Run and Debug panel, select `Python: FastAPI (Uvicorn)`, and start it.
   - The browser opens automatically to `/`.

   Without the debugger:
   - Terminal → Run Task → `Run API (Uvicorn)`
   - Optionally, run the `Open Browser` task.

Prerequisites (Azure deployment):
- Azure subscription and resource group
- Azure CLI and Az PowerShell module (the scripts install them if missing)
- `.env` file configured with Azure settings

Steps:

1. Configure environment
   - Ensure `.env` contains the Azure deployment settings:
     - `AZURE_SUBSCRIPTION_ID`
     - `AZURE_TENANT_ID`
     - `AZURE_RESOURCE_GROUP`
     - `AZURE_APP_SERVICE_NAME`
     - `AZURE_LOCATION` (default: `swedencentral`)

2. Provision and deploy

   Option A: Use the helper batch files (recommended)

   Create Azure resources (if not already created):

   ```powershell
   .\CreateAzureAppService.bat
   ```

   Deploy the current code to the Web App:

   ```powershell
   .\DeployAzureAppService.bat
   ```

   (Optional) Delete the Web App and plan:

   ```powershell
   .\DeleteAzureAppService.bat
   ```

   Option B: Run the PowerShell script directly

   ```powershell
   .\DeployAzureAppService.ps1
   ```

What the deployment script does:
- Reads `.env` and sets app settings (excluding deployment-only keys)
- Sets the startup command: `python -m uvicorn app:app --host 0.0.0.0 --port 8000`
- Sets `WEBSITES_PORT=8000` for proper traffic routing
- Packages `src/` plus the root `requirements.txt` into `deploy.zip`
- Zip-deploys to the Azure Web App (Oryx builds on the server)

Verify deployment:
- Health check: `https://<APP_NAME>.azurewebsites.net/alive`
- Application: `https://<APP_NAME>.azurewebsites.net/`
- API docs: `https://<APP_NAME>.azurewebsites.net/docs`

View logs:
- Browser (Kudu/SCM site): `https://<APP_NAME>.scm.azurewebsites.net/api/logs/docker`
- CLI: `az webapp log tail --name <APP_NAME> --resource-group <RESOURCE_GROUP>`

Troubleshooting:
- Check the Docker logs for Python import errors or missing packages
- Verify that the `.env` values are correct and complete
- Ensure the Azure AD app has the proper SharePoint permissions (for the crawler)
- If you change the port, update both the startup command and `WEBSITES_PORT`
- Check that the persistent storage path is writable (Azure: `/home/data`)

To enable SharePoint crawling:

1. Create an Azure AD app registration
   - Register an app in Azure AD
   - Add the SharePoint API permission `Sites.Read.All`
   - Create a certificate for authentication

2. Configure the certificate
   - Use `CreateSelfSignedCertificate.ps1` to generate a certificate
   - Upload the certificate to the Azure AD app registration
   - Place the `.pfx` file in the repo root
   - Set `CRAWLER_CLIENT_CERTIFICATE_PFX_FILE` and `CRAWLER_CLIENT_CERTIFICATE_PASSWORD` in `.env`

3. Grant SharePoint permissions
   - Use `AddRemoveCrawlerSharePointSites.ps1` to grant site collection permissions
   - Use `AddRemoveCrawlerPermissions.ps1` to manage app permissions

4. Create domain configurations
   - Use the `/v1/domains/create` endpoint to create domain configurations
   - Configure SharePoint sources (document libraries, lists, site pages)
   - Each domain gets its own vector store

For detailed storage structure, see PERSISTENT_STORAGE_STRUCTURE.md.

Example requests against a local instance.

Search query:

```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find documents about project planning",
    "max_num_results": 10
  }'
```

Create a domain:

```bash
curl -X POST "http://localhost:8000/v1/domains/create" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode "domain_id=MYSITE" \
  --data-urlencode "name=My SharePoint Site" \
  --data-urlencode "description=Project documentation site" \
  --data-urlencode "vector_store_name=SharePoint-MYSITE"
```

Crawl a domain and update its vector store:

```bash
# Download files from SharePoint
curl "http://localhost:8000/v1/crawler/download_files?domain_id=MYSITE&format=json"

# Update vector store with downloaded files
curl "http://localhost:8000/v1/crawler/update_vector_store?domain_id=MYSITE&format=json"
```

License: see the LICENSE file for details.