NucleusEngineering/bq-optimization-agent

BigQuery Optimization Agent

This project contains a conversational agent that can help you optimize your BigQuery queries. You can provide a BigQuery job ID, and the agent will analyze the query and provide optimization recommendations.

Getting Started

Follow these steps to set up and run the project.

1. Prerequisites

  • Git
  • Google Cloud SDK (with gcloud and bq CLIs authenticated)
  • uv (a fast Python package installer and resolver)

2. Setup

a. Clone the repository:

git clone https://github.com/NucleusEngineering/bq-optimization-agent
cd bq-optimization-agent

b. Configure Environment Variables:

Create a .env file by copying the example file:

cp .env.example .env

Then, open the .env file and set your Google Cloud Project ID:

GOOGLE_CLOUD_PROJECT="your-gcp-project-id"

After setting your project ID, load the environment variables by running:

source .env
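Note that the agent and the evaluation scripts read this variable from the process environment, so it must be exported; if plain source does not propagate it to child processes, prefix the line in .env with export. A minimal sanity check (a hypothetical helper, not part of the repository):

```python
import os

def require_project_id() -> str:
    """Return the configured project ID, failing fast if it is missing."""
    project = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    if not project:
        raise RuntimeError("GOOGLE_CLOUD_PROJECT is not set; run `source .env` first")
    return project
```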

c. Install dependencies and run checks:

The make install command will create a virtual environment and install all the necessary dependencies. The make check command will run linting and static analysis to ensure the code is clean.

make install
make check

3. Choose your workflow

First, make sure the Vertex AI API is enabled in your project (the command is safe to run even if it is already enabled):

gcloud services enable aiplatform.googleapis.com

Now that Vertex AI is enabled, you can choose one of the following workflows:

Option A: Use the Web Interface

This is the easiest way to interact with the agent.

make web

This command starts a web-based chat interface where you can provide a BigQuery job ID (you can list recent job IDs with bq ls -j -n 5) and get optimization recommendations.

Option B: Run the Agent Evaluation

This workflow is for evaluating the agent's performance.

a. Prepare the evaluation dataset:

make evaluation_prep

This command runs a series of unoptimized queries against BigQuery to generate job IDs. These job IDs are then used to create the evaluation_dataset.json file, which is needed for the evaluation process.
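For intuition, each entry in the generated dataset pairs a benchmark query with the job ID of its unoptimized run. The exact schema is defined by eval_resources/evaluation_dataset.template.json; the sketch below uses hypothetical field names and job IDs purely for illustration:

```python
import json

# Hypothetical entries: the real field names come from
# eval_resources/evaluation_dataset.template.json, not from this sketch.
entries = [
    {"query_name": "daily_gas_costs", "job_id": "bquxjob_abc123_18f0"},
    {"query_name": "top_token_transfers", "job_id": "bquxjob_def456_18f1"},
]

# Serialize in the same spirit as populate_evaluation_dataset.sh,
# which fills the template with real job IDs.
dataset_json = json.dumps(entries, indent=2)
print(dataset_json)
```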

b. Run the evaluation:

make evaluate

This command runs the evaluation script (evaluate_agent.py). The script uses the evaluation_dataset.json file to test the agent's ability to optimize queries and provides a summary of the results.
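As a rough intuition for how such an evaluation can score results, a common proxy is the reduction in bytes processed between the original and the optimized job. The helper below is a simplified illustration under that assumption, not the actual metric computed by evaluate_agent.py:

```python
def bytes_reduction_pct(original_bytes: int, optimized_bytes: int) -> float:
    """Percent reduction in bytes processed; positive means the optimized query is cheaper."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return 100.0 * (original_bytes - optimized_bytes) / original_bytes
```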

4. Improve your agent's evaluation results

Congrats! You have successfully built and evaluated a BigQuery agent using the Agent Development Kit (ADK).

You now understand how to set up an ADK agent with BigQuery tools and measure its performance using custom evaluation metrics.

However, agent evaluation is an iterative process.

To improve the evaluation results further, you can tweak the system instructions or try out different models and their parameters.
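For example, an iteration loop often sweeps one generation parameter at a time while holding the rest fixed. The sketch below is purely illustrative: the configuration keys and the model name are placeholders, and the project's real agent configuration lives in bq_agent_app/:

```python
# Hypothetical sweep over generation parameters; the real agent is configured
# in bq_agent_app/, and the parameter names there may differ.
BASE_CONFIG = {"model": "gemini-2.0-flash", "temperature": 0.2}

def config_variants(temperatures):
    """Yield one agent config per candidate temperature, keeping other settings fixed."""
    for t in temperatures:
        yield {**BASE_CONFIG, "temperature": t}
```

You would then run make evaluate once per variant and keep the configuration with the best summary score.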

Have fun with improving your agent!

Available Commands

The primary way to interact with this project is through the Makefile commands.

  • make help: Lists all available commands with a short description.
  • make install: Creates a virtual environment and installs all the necessary dependencies.
  • make check: Runs a series of checks to ensure code quality and consistency (linting, type checking, dependency checks).
  • make test: Runs the test suite using pytest.
  • make web: Starts a web-based chat interface to interact with the agent.
  • make evaluation_prep: Prepares the evaluation dataset by running unoptimized queries to generate job IDs.
  • make evaluate: Runs the agent evaluation script.

Project Structure

Here is a brief overview of the main directories in this project:

  • bq_agent_app/: Contains the core application code for the BigQuery Optimization Agent.
  • eval_resources/: Holds all resources needed for evaluating the agent. This includes:
    • evaluation_dataset.template.json: A template for the evaluation dataset.
    • populate_evaluation_dataset.sh: A script to populate the evaluation dataset with real job IDs.
    • eth_benchmarking_queries/: A directory containing a set of Ethereum-related BigQuery queries used for benchmarking the agent's optimization capabilities.
      • The root of this directory contains the unoptimized SQL queries.
      • evaluation/: Contains evaluation versions of the queries, scripts to run them, and explanations of the optimizations.
      • schemas/: Contains the JSON schema definitions for the public Ethereum BigQuery tables used in the queries.
  • eval_results/: This directory stores the output results from the evaluation runs.
  • logs/: This directory contains log files generated by the various scripts, such as populate_evaluation_dataset.sh and evaluate_agent.py.
