
A fully autonomous multi-agent system for end-to-end data analysis. Upload any CSV → Agents perform Profiling → EDA → AutoML → Verification → Notebook Generation → Gemini Insights.


VaishnaviSh14/MultiAgent-Data-Analyst


Multi-Agent AutoML Data Analyst (MCP + A2A + Gemini Powered)

Automated Profiler → EDA → AutoML → Verifier → Notebook Synthesizer → Gemini Insights

🚀 Live Demo (Render Deployment): https://multiagent-data-analyst.onrender.com/

Track: Enterprise Agents

Tech: Python, Streamlit, Gemini, MCP Tools, A2A Bus, Multi-Agent Architecture

Overview

The Multi-Agent Data Analyst is a fully automated, end-to-end data analysis pipeline powered by multiple specialized agents working together. You upload a dataset, and the agents analyze it, build ML models, verify the results, generate a notebook, and produce final insights, all without any manual coding.

This project demonstrates:

✔ Multi-agent systems

✔ A2A (Agent-to-Agent) communication

✔ Tool-based agent execution (MCP Tools)

✔ Sessions & memory

✔ Context-aware notebook synthesis

✔ Gemini-powered explanations

✔ Streamlit multi-page application

Problem Statement

Performing data analysis typically requires switching between tools, writing repetitive code, running models manually, validating outputs, and documenting everything.

For beginners, this is overwhelming.

For analysts, it's time-consuming.

For teams, it’s inconsistent.

Goal: Build an agentic system that automates the entire workflow — from raw data to verified insights and notebook generation.

Why Agents?

Agents make the system:

-> Modular – each agent does one job

-> Autonomous – actions happen without the user triggering each step

-> Traceable – every step is observable

-> Composable – agents communicate over the A2A bus

-> Extensible – new agents (e.g., a Gemini Reviewer) can be added at any time

Instead of one giant notebook, the intelligence is distributed:

Agent Roles

-> Profiler Agent – inspects dataset, finds issues

-> EDA Agent – generates charts, summaries, anomalies

-> Model Agent (AutoML) – builds ML pipelines automatically

-> Verifier Agent – detects inconsistencies, bad models, missing columns

-> Notebook Synthesizer Agent – creates a clean notebook combining all outputs

-> Gemini Agent – explains the ML results in human-friendly language

Each agent writes its outputs to memory → the A2A bus routes a message → the next agent reacts.
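The write-to-memory, send-a-message, react loop can be illustrated with a tiny in-process publish/subscribe bus. This is a minimal sketch, not the project's code: `Message`, `A2ABus`, the name-based routing, and the plain dict used as shared memory are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Message:
    """Hypothetical A2A message: who sent it, who should act next, and a payload."""
    sender: str
    recipient: str
    payload: dict[str, Any] = field(default_factory=dict)


class A2ABus:
    """Minimal in-process bus: agents subscribe by name, messages are delivered immediately."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Message], None]]] = defaultdict(list)
        self.log: list[Message] = []  # every message, e.g. for an A2A console view

    def subscribe(self, agent_name: str, handler: Callable[[Message], None]) -> None:
        self._handlers[agent_name].append(handler)

    def send(self, message: Message) -> None:
        self.log.append(message)
        for handler in self._handlers[message.recipient]:
            handler(message)


memory: dict[str, Any] = {}  # shared memory that each agent writes its outputs to
bus = A2ABus()


def profiler(msg: Message) -> None:
    memory["profile"] = {"rows": 3, "missing": 0}  # placeholder output
    bus.send(Message("ProfilerAgent", "EDAAgent", {"key": "profile"}))


def eda(msg: Message) -> None:
    # React to the previous agent by reading what it stored in memory
    memory["eda"] = f"EDA based on {msg.payload['key']}"


bus.subscribe("ProfilerAgent", profiler)
bus.subscribe("EDAAgent", eda)
bus.send(Message("User", "ProfilerAgent", {"path": "data.csv"}))
```

Delivery here is synchronous, so each agent runs as soon as the previous one sends its message; a production bus might queue messages instead so agents can run independently.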

Architecture


1️⃣ ProfilerAgent

✔ Reads dataset

✔ Detects column types

✔ Finds missing values

✔ Sends message → EDAAgent
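The profiling step (column types and missing values) can be sketched with the standard library alone. The `profile_csv` helper is hypothetical; its rule that a column is numeric when every non-empty value parses as a float is an assumption, and treating empty cells as missing is one choice among several.

```python
import csv
import io


def _is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_csv(text: str) -> dict[str, dict[str, object]]:
    """Return each column's inferred type ('numeric' or 'categorical') and missing count."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report: dict[str, dict[str, object]] = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        # Empty strings and short rows (None) count as missing values
        present = [v for v in values if v is not None and v.strip() != ""]
        kind = "numeric" if present and all(_is_number(v) for v in present) else "categorical"
        report[column] = {"type": kind, "missing": len(values) - len(present)}
    return report
```

For example, `profile_csv("age,city,income\n34,Pune,52000\n,Delhi,61000\n29,,48000\n")` reports `age` as numeric with one missing value, `city` as categorical with one missing value, and `income` as numeric with none.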

2️⃣ EDAAgent

✔ Computes correlations, plots histograms, and runs outlier analysis

✔ Saves all plots via MCP FileTools

✔ Sends message → ModelAgent

3️⃣ ModelAgent

✔ Auto-detects task type (classification/regression)

✔ Builds full ML pipeline (imputation + scaling + encoding)

✔ Tunes models

✔ Saves best model

✔ Sends message → VerifierAgent
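Task-type auto-detection can be sketched as a heuristic on the target column. The `detect_task_type` function and its `max_classes=20` threshold are assumptions for illustration. The preprocessing pipeline itself would typically be assembled from scikit-learn's `ColumnTransformer`, `SimpleImputer`, `StandardScaler` and `OneHotEncoder`; that part is not shown here.

```python
def detect_task_type(target: list[object], max_classes: int = 20) -> str:
    """Guess 'classification' or 'regression' from the target column's values.

    Assumed heuristic: non-numeric targets, or numeric targets with at most
    `max_classes` distinct whole-number values, are treated as classification.
    """
    values = [v for v in target if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    if not numeric:
        return "classification"
    distinct = set(values)
    if len(distinct) <= max_classes and all(float(v).is_integer() for v in distinct):
        return "classification"
    return "regression"
```

Labels such as `["yes", "no"]` or `[0, 1, 1, 0]` are classified as classification, while continuous values such as `[1.5, 2.25, 3.75]` are treated as regression.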

4️⃣ VerifierAgent

✔ Validates model quality

✔ Computes quality tag (“Good”, “Acceptable”, “Weak”)

✔ Sends message → NotebookSynthesizerAgent
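One way to compute the quality tag is to compare a held-out score against fixed cut-offs. The metric choice (accuracy for classification, R² for regression) and the threshold values in this sketch are hypothetical; the project's actual rules are not documented here.

```python
def quality_tag(task: str, score: float) -> str:
    """Map a held-out score (higher is better) to 'Good', 'Acceptable' or 'Weak'."""
    thresholds = {
        "classification": (0.85, 0.70),  # hypothetical accuracy cut-offs
        "regression": (0.75, 0.50),      # hypothetical R^2 cut-offs
    }
    if task not in thresholds:
        raise ValueError(f"unknown task type: {task!r}")
    good, acceptable = thresholds[task]
    if score >= good:
        return "Good"
    if score >= acceptable:
        return "Acceptable"
    return "Weak"
```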

5️⃣ NotebookSynthesizerAgent

✔ Builds a full auto-generated Jupyter Notebook

✔ Embeds all results and images

✔ Saves notebook through FileTools
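A notebook is JSON that follows the nbformat schema, so a synthesizer can assemble one from Markdown and code cells. This sketch writes nbformat v4.4 JSON directly with the standard library to stay dependency-free; the project lists nbformat, whose `nbformat.v4` helpers (`new_notebook`, `new_markdown_cell`, `new_code_cell`) build the same structure. The function names here are hypothetical.

```python
import json
from pathlib import Path


def markdown_cell(text: str) -> dict:
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code: str) -> dict:
    # Unexecuted code cell: no outputs and no execution count yet
    return {"cell_type": "code", "execution_count": None,
            "metadata": {}, "outputs": [], "source": code}


def write_notebook(path: Path, cells: list[dict]) -> Path:
    """Write a minimal nbformat v4.4 notebook (v4.4 does not require cell ids)."""
    notebook = {
        "cells": cells,
        "metadata": {"language_info": {"name": "python"}},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return path
```

For example, `write_notebook(Path("streamlit_app_storage/reports/analysis.ipynb"), [markdown_cell("# Analysis"), code_cell("x = 1")])` produces a file Jupyter can open; the reports path is only an assumption based on the project structure.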

6️⃣ Gemini Integration

✔ Gemini generates:

✔ Model explanations

✔ Recommendations

✔ Summaries

✔ Plain-English explanations for beginners
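Before Gemini can explain anything, the AutoML results have to be turned into a prompt. The hypothetical `build_explanation_prompt` helper shows one way to do that; the wording is illustrative, and the actual call to Gemini 1.5 Flash is omitted because it needs an API key and Google's SDK.

```python
def build_explanation_prompt(task: str, model_name: str,
                             metrics: dict[str, float], quality: str) -> str:
    """Turn AutoML results into a plain-English request for an LLM such as Gemini."""
    # One metric per line, sorted by name so the prompt is deterministic
    metric_lines = "\n".join(f"- {name}: {value:.3f}" for name, value in sorted(metrics.items()))
    return (
        "You are explaining machine-learning results to a beginner.\n"
        f"Task type: {task}\n"
        f"Best model: {model_name}\n"
        f"Verifier quality tag: {quality}\n"
        f"Metrics:\n{metric_lines}\n\n"
        "Explain what these numbers mean in plain English, summarize the result "
        "in two or three sentences, and give concrete recommendations for improving the model."
    )
```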

7️⃣ Streamlit UI

✔ Beautiful dashboard with:

✔ Dataset Explorer

✔ EDA Dashboard

✔ AutoML Dashboard

✔ Verifier & Notebook Builder

✔ A2A Communications Console

Setup Instructions

  1. Clone the repo: git clone https://github.com/VaishnaviSh14/MultiAgent-Data-Analyst multiagent-data-analyst

  2. cd multiagent-data-analyst

  3. Install dependencies: pip install -r requirements.txt

  4. Add your Gemini API key to a .env file

  5. Run Streamlit: streamlit run streamlit_app/app.py

Demo (Screenshots)

Dataset Upload


EDA Dashboard


AutoML Results


Gemini Explanation


A2A Console


Notebook generated


Profiler Agent Output


Verifier Agent


Tools & Technologies Used

🚀 Multi-Agent: Custom Agents, A2A Bus

🚀 LLM: Gemini 1.5 Flash

🚀 UI: Streamlit

🚀 ML: scikit-learn

🚀 Storage: Custom MemoryTools

🚀 Notebook: nbformat

🚀 Deployment: Render

🚀 Visualization: Plotly, Matplotlib, Seaborn

🗂 Project Structure

multiagent-data-analyst/
├── src/
│   ├── agents/
│   ├── core/
│   └── tools/
│       ├── file_tools.py
│       ├── dataset_tools.py
│       ├── memory_tools.py
│       ├── model_tools.py
│       └── notebook_tools.py
├── streamlit_app/
│   ├── app.py
│   └── pages/
│       ├── AutoML.py
│       ├── Profiler.py
│       ├── EDA_Dashboard.py
│       ├── Notebook_Report.py
│       ├── Verifier.py
│       └── A2A_Dashboard.py
├── streamlit_app_storage/
│   ├── memory/
│   ├── uploads/
│   └── reports/
└── README.md
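The `streamlit_app_storage/memory/` folder suggests file-backed session memory. A minimal sketch, assuming one JSON file per session: `JsonMemory` is a hypothetical stand-in for the project's MemoryTools, and its file format is an assumption.

```python
import json
from pathlib import Path
from typing import Any


class JsonMemory:
    """File-backed key/value memory stored as one JSON file per session."""

    def __init__(self, root: Path, session_id: str) -> None:
        root.mkdir(parents=True, exist_ok=True)
        self.path = root / f"{session_id}.json"

    def _load(self) -> dict[str, Any]:
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def write(self, key: str, value: Any) -> None:
        # Read-modify-write so earlier agents' outputs are preserved
        data = self._load()
        data[key] = value
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def read(self, key: str, default: Any = None) -> Any:
        return self._load().get(key, default)
```

Because every write goes to disk, a call such as `JsonMemory(Path("streamlit_app_storage/memory"), "session-1").write("profile", {"rows": 3})` can be read back by a later agent from a fresh `JsonMemory` instance, for example after a Streamlit rerun.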

Future Improvements

  1. Add RAG-based “Data Question Answering Agent”

  2. Add deployment on Google Cloud Run using Docker

  3. Add an Evaluation Agent for model fairness

  4. Provide more AutoML models (XGBoost, LightGBM)

  5. Add voice-based interaction mode

Credits

Built by Vaishnavi Sharma as part of Google x Kaggle – Agents Intensive

If you find this useful, ⭐ star the repo!
