Welcome! This project is a simple but functional blueprint for a RAG + fine-tuning pipeline with Apache Airflow. Fork this project to create your own content generation pipelines!
Tools used:
- Apache Airflow run with the Astro CLI to create a local instance in Docker
- OpenAI
- Weaviate - running as a local instance in Docker
- Streamlit - running as a local instance in Docker
- LangChain for chunking
- tiktoken for token counting
- Matplotlib for plotting
Run this Airflow project without installing anything locally.
- Fork this repository.
- Create a new GitHub Codespaces project on your fork. Make sure it uses at least 4 cores!
- Inside Codespaces, copy the contents of the `.env_example` file into a new `.env` file and provide your OpenAI API key in both the `OPENAI_API_KEY` and `AIRFLOW_CONN_WEAVIATE_DEFAULT` fields.
- Run `astro dev start` to start all necessary Airflow components as well as the Streamlit and Weaviate containers. This can take a few minutes.
- Once the Airflow project has started, access the Airflow UI by clicking on the Ports tab and opening the forwarded URL for port `8080`. The Streamlit app will be available on port `8501`.
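Before running `astro dev start`, it can help to confirm that the `.env` file actually defines both variables named above. The following is a minimal sketch, not part of this repository: the helper names `parse_env` and `missing_keys` are hypothetical, and the parser assumes plain `KEY=VALUE` lines. It only checks that a value is present, not that the API key is valid or that a placeholder was replaced.

```python
from pathlib import Path

# The two variables this README asks you to fill in.
REQUIRED_KEYS = ("OPENAI_API_KEY", "AIRFLOW_CONN_WEAVIATE_DEFAULT")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY='value' and KEY=value look the same.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found; copy .env_example to .env first.")
    else:
        missing = missing_keys(env_path.read_text())
        print("Missing or empty:", missing if missing else "none")
```

Run it from the root of the project, next to the `.env` file.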
Download the Astro CLI to run Airflow locally in Docker. `astro` is the only package you will need to install.
- Run `git clone https://github.com/astronomer/gen-ai-fine-tune-rag-use-case.git` on your computer to create a local clone of this repository.
- Install the Astro CLI by following the steps in the Astro CLI documentation. The main prerequisite is Docker Desktop/Docker Engine, but no Docker knowledge is needed to run Airflow with the Astro CLI.
- Copy the contents of the `.env_example` file into a new `.env` file and provide your OpenAI API key in both the `OPENAI_API_KEY` and `AIRFLOW_CONN_WEAVIATE_DEFAULT` fields.
- Run `astro dev start` in your cloned repository.
- After your Astro project has started, view the Airflow UI at `localhost:8080`. The Streamlit app will be available on `localhost:8501`.
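Since startup can take a few minutes, a quick way to see whether the local services are up is to test the two ports mentioned above. This is a minimal sketch using only the Python standard library. The helper names `port_is_open` and `check_services` are hypothetical, and a successful TCP connection only shows that something is listening on the port, not that the Airflow UI or Streamlit app has finished loading.

```python
import socket

# Ports named in this README: Airflow UI and Streamlit app.
SERVICES = {"Airflow UI": 8080, "Streamlit app": 8501}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost", services=SERVICES) -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'not reachable yet'}")
```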
- Unpause all DAGs, from top to bottom, by clicking the toggle on their left-hand side. Once the `📚 Ingest Knowledge Base` DAG is unpaused, it will run once, starting the RAG part of the pipeline.
- Kick off the fine-tuning part of the pipeline by running the `🚀 0 - Start Fine-Tuning Pipeline` DAG manually.
- Watch the DAGs run according to their dependencies, which have been set using Datasets. The `🤖 Fine-tune` DAG will take approximately 15 minutes to run.
- After the last DAG in the pipeline, `✨ Champion vs Challenger`, has completed, open the Streamlit app at `localhost:8501` (or the forwarded port `8501` in Codespaces).
- In the Streamlit app, click `Generate post!` to generate new content using the fine-tuned model.
- Click `Generate picture!` to get an image generated by DALL·E about the new content.
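The Dataset-driven ordering mentioned in the steps above can be pictured with a small pure-Python model. This is only an illustrative sketch and does not use Airflow itself: the function `run_order`, the DAG names, and the dataset names are all hypothetical and are not taken from this repository. It also simplifies Airflow's behavior by running each DAG at most once, as soon as every dataset it waits on has been updated.

```python
def run_order(dags: dict[str, dict], start: str) -> list[str]:
    """Return the order in which DAGs run once `start` is triggered.

    Each DAG entry has 'waits_on' (datasets that schedule it) and
    'updates' (datasets it marks as updated when it finishes).
    """
    order = [start]
    updated = set(dags[start]["updates"])
    pending = {name for name in dags if name != start}
    progressed = True
    while progressed:
        progressed = False
        for name in sorted(pending):
            waits_on = dags[name]["waits_on"]
            # A DAG becomes runnable when all datasets it waits on are updated.
            if waits_on and set(waits_on) <= updated:
                order.append(name)
                updated |= set(dags[name]["updates"])
                pending.remove(name)
                progressed = True
                break
    return order


# Hypothetical three-step pipeline, for illustration only.
PIPELINE = {
    "start": {"waits_on": [], "updates": ["examples"]},
    "fine_tune": {"waits_on": ["examples"], "updates": ["model"]},
    "evaluate": {"waits_on": ["model", "examples"], "updates": []},
}

if __name__ == "__main__":
    print(run_order(PIPELINE, "start"))
```

In this model, `evaluate` cannot run until `fine_tune` has updated the `model` dataset, which mirrors why the DAGs in the Airflow UI start one after another rather than all at once.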