Connect your SaaS tools to a vector database and keep your data synced
Sidekick is a framework for integrating with SaaS tools like Salesforce, Github, Notion, and Zendesk, and for syncing data between these tools and a vector store. You can use the integrations and chunkers built by the community to get started quickly, or build new integrations and write custom chunkers for different content types based on Sidekick's DataConnector and DataChunker specs.
Get an API key to test out a hosted version by joining our Slack community. Post in the #api-keys channel to request a new key. You can test it out on some pre-ingested developer docs by tagging the Sidekick bot in the #sidekick-demo channel.
- Scrape HTML pages and chunk them
- Load Markdown files from a Github repo and chunk them
- Connect to Weaviate vector store and load chunks
- FastAPI endpoints to query vector store directly, or perform Q&A with OpenAI models
- Slackbot interface to perform Q&A with OpenAI models
- DataConnector and DataChunker abstractions to make it easier to contribute new connectors/chunkers
- Connect to Pinecone, Milvus, and Qdrant vector stores
To run Sidekick locally:
- Install Python 3.10, if not already installed.
- Clone the repository:
    git clone https://github.com/ai-sidekick/sidekick.git
- Navigate to the sidekick-server directory:
    cd /path/to/sidekick/sidekick-server
- Install poetry:
    pip install poetry
- Create a new virtual environment with Python 3.10:
    poetry env use python3.10
- Install poetry-dotenv:
    poetry self add poetry-dotenv
- Activate the virtual environment:
    poetry shell
- Install app dependencies:
    poetry install
- Set the required environment variables in a .env file in sidekick-server:
    DATASTORE=weaviate
    # Can be any string when running locally, e.g. 22c443d6-0653-43de-9490-450cd4a9836f
    BEARER_TOKEN=<your_bearer_token>
    OPENAI_API_KEY=<your_openai_api_key>
    # Optional, defaults to http://127.0.0.1
    WEAVIATE_HOST=<Your Weaviate instance host address>
    # Optional, defaults to 8080. Should be set to 443 for Weaviate Cloud
    WEAVIATE_PORT=<Your Weaviate port number>
    # e.g. MarkdownChunk
    WEAVIATE_INDEX=<Your chosen Weaviate class/collection name to store your chunks>
  Note that we currently only support Weaviate as the data store. You can run Weaviate locally with Docker or set up a sandbox cluster to get a Weaviate host address.
- Create a file app_config.py in the sidekick-server directory. It should contain an object app_config that maps each bearer token to a product_id:
    app_config = {
        "22c443d6-0653-43de-9490-450cd4a9836f": {"product_id": "salesforce"}
    }
  The product_id should be a unique identifier for the source of your data.
- Run the API locally:
    poetry run start
- Access the API documentation at http://0.0.0.0:8000/docs and test the API endpoints (make sure to add your bearer token).
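The environment variables and the app_config mapping from the steps above can be sanity-checked with a short script before starting the server. This is a minimal sketch and not part of Sidekick itself: the helper names `weaviate_url` and `product_id_for` are hypothetical, and the defaults simply mirror the ones stated above (host http://127.0.0.1, port 8080).

```python
from typing import Dict, Mapping, Optional

# Defaults as stated in the setup steps; everything else here is illustrative.
DEFAULT_WEAVIATE_HOST = "http://127.0.0.1"
DEFAULT_WEAVIATE_PORT = "8080"


def weaviate_url(env: Mapping[str, str]) -> str:
    """Build a Weaviate base URL from WEAVIATE_HOST and WEAVIATE_PORT.

    Hypothetical helper: falls back to the documented defaults when a
    variable is missing.
    """
    host = env.get("WEAVIATE_HOST", DEFAULT_WEAVIATE_HOST).rstrip("/")
    port = env.get("WEAVIATE_PORT", DEFAULT_WEAVIATE_PORT)
    return f"{host}:{port}"


def product_id_for(app_config: Dict[str, Dict[str, str]], token: str) -> Optional[str]:
    """Return the product_id mapped to a bearer token, or None if unknown."""
    entry = app_config.get(token)
    return entry["product_id"] if entry else None


# Example values copied from the setup steps above.
app_config = {
    "22c443d6-0653-43de-9490-450cd4a9836f": {"product_id": "salesforce"}
}
local_url = weaviate_url({})  # no overrides, so the defaults apply
product = product_id_for(app_config, "22c443d6-0653-43de-9490-450cd4a9836f")
```

In a real run you would pass `os.environ` to `weaviate_url` instead of an empty mapping.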
For support and questions, join our Slack community.
The server is based on FastAPI, so you can view the interactive API documentation at <local_host_url, i.e. http://0.0.0.0:8000>/docs when you are running it locally.
These are the available API endpoints:
- /upsert-web-data: This endpoint takes a url as input, uses Playwright to crawl through the webpage (and any linked webpages), and loads them into the vectorstore.
- /query: Endpoint to query the vector database with a string. You can filter by source type (web, markdown, etc.) and set the max number of chunks returned.
- /ask-llm: Endpoint to get an answer to a question from an LLM, based on the data in the vectorstore. In the response, you get back the sources used in the answer, the user's intent, and whether or not the question is answerable based on the content in your vectorstore.
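Calling one of these endpoints from Python might look like the sketch below. Several details are assumptions rather than documented behavior: that the endpoint accepts a POST with a JSON body, that the body field is named `url`, and that the token is sent as an `Authorization: Bearer` header. The helper names `build_request` and `post_json` are hypothetical. Check the interactive /docs page for the actual request schemas.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:8000"  # local server address from the setup steps


def build_request(path: str, payload: dict, bearer_token: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare an authenticated JSON POST request (nothing is sent yet)."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )


def post_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# ASSUMPTION: the payload shape {"url": ...} is a guess based on the
# endpoint description, not a confirmed schema.
upsert_request = build_request(
    "/upsert-web-data",
    {"url": "https://example.com/docs"},
    bearer_token="<your_bearer_token>",
)
# With the server running, send it with: post_json(upsert_request)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any network call is made.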
Sidekick is open for contribution! To add a new data connector, follow the outlined steps:
- Create a new folder under connectors named <data-source>-connector, where <data-source> is the name of the source you are connecting to.
- This folder should contain a file load.py with a function load_data that returns List[DocumentChunk].
- Create a new endpoint in /server/main.py that calls load_data.
- Add the new source type in models/models.py.
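As a rough illustration of the load.py contract described above, here is a minimal sketch of a `load_data` function that returns a list of chunks. The `DocumentChunk` class below is a hypothetical stand-in with made-up fields (`source_id`, `text`); the real model lives in models/models.py and may look different. The function signature and the paragraph-packing strategy are also assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocumentChunk:
    """Hypothetical stand-in for the real DocumentChunk in models/models.py."""
    source_id: str
    text: str


def load_data(text: str, source_id: str, max_chars: int = 500) -> List[DocumentChunk]:
    """Greedily pack paragraphs into chunks of up to max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[DocumentChunk] = []
    current = ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue  # skip blank paragraphs
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the current chunk
        else:
            if current:
                chunks.append(DocumentChunk(source_id, current))
            current = paragraph  # start a new chunk with this paragraph
    if current:
        chunks.append(DocumentChunk(source_id, current))
    return chunks


sample_chunks = load_data("First paragraph.\n\nSecond paragraph.", "example-source")
```

A real connector would fetch the text from the data source's API instead of taking it as an argument.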
- The boilerplate for this project is based on the ChatGPT Retrieval Plugin
- The licensing for this project is inspired by Airbyte's licensing model