This repository contains tutorials, code examples, and resources for performing exploratory data analysis using modern AI-powered tools. These tools can significantly speed up your data analysis workflow and provide insights that might be difficult to discover manually.
Exploratory Data Analysis (EDA) is a critical step in any data science project, allowing analysts to understand data characteristics, identify patterns, and detect anomalies. AI-enhanced EDA tools leverage machine learning and natural language processing to automate and augment this process, making it more efficient and insightful.
The material here is aimed at both beginners and advanced users of these tools.
- Python 3.7+
- pip (Python package manager)
- Clone this repository:
    git clone https://github.com/yourusername/ai-enhanced-eda.git
    cd ai-enhanced-eda
- Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install basic dependencies:
    pip install -r requirements.txt
- Install specific tools as needed:
    # For PyGWalker
    pip install pygwalker
    # For Pandas-AI
    pip install pandasai
    # For AutoViz
    pip install autoviz
    # For DataPrep
    pip install dataprep
    # For SweetViz
    pip install sweetviz
    # For D-Tale
    pip install dtale
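Because each tool is installed separately, it can be useful to check which ones are importable before running an example. The following is a minimal sketch using only the standard library. The helper name `installed_tools` is hypothetical, and the import names are assumed to match the pip package names listed above.

```python
from importlib.util import find_spec

# Import names assumed to match the pip packages listed above.
OPTIONAL_TOOLS = ["pygwalker", "pandasai", "autoviz", "dataprep", "sweetviz", "dtale"]


def installed_tools(names=OPTIONAL_TOOLS):
    """Map each module name to True if it can be found in this environment."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in installed_tools().items():
        print(f"{name:10s} {'installed' if ok else 'missing'}")
```

Running the snippet prints one line per tool, so a missing package shows up before an example fails on import.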
PyGWalker transforms your pandas DataFrame into an interactive Tableau-like interface directly in Jupyter notebooks.
Key Features:
- Drag-and-drop interface for visualization
- Multiple chart types
- Data filtering and transformation
- Export capabilities
Pandas-AI allows you to explore data using natural language queries.
Key Features:
- Query data using natural language
- Generate visualizations with text commands
- Perform complex analysis without coding
- Requires OpenAI API key
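Since an API key is required, one common approach is to read it from an environment variable rather than writing it into a notebook or script. The sketch below assumes the key is stored in `OPENAI_API_KEY`; both that variable name and the helper `get_openai_api_key` are illustrative choices, not something this repository defines.

```python
import os


def get_openai_api_key(env_var="OPENAI_API_KEY"):
    """Return the key stored in `env_var`, or raise a clear error if it is unset."""
    # env_var defaults to an assumed, conventional name; change it to suit your setup.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before using Pandas-AI."
        )
    return key
```

The returned string can then be passed wherever the key is expected, which keeps the secret out of version control.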
AutoViz automatically visualizes datasets with minimal code.
Key Features:
- Automatic visualization selection
- Feature relationship exploration
- Target variable-based analysis
- Minimal configuration required
DataPrep simplifies data preparation and exploratory analysis.
Key Features:
- Comprehensive EDA reports
- Data cleaning capabilities
- Performance optimization for large datasets
- API for integration into workflows
SweetViz provides beautiful visualizations for EDA and data comparison.
Key Features:
- High-density visualizations
- Dataset comparison
- Target analysis
- Feature associations
D-Tale offers an interactive interface for pandas DataFrame analysis.
Key Features:
- Web-based interface
- Correlation analysis
- Charting and visualization
- Statistical testing
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load your dataset
df = pd.read_csv('your_dataset.csv')
# Basic pandas EDA
print(df.info())
print(df.describe())
# Missing values check
print(df.isnull().sum())
# Try PyGWalker for interactive visualization
import pygwalker as pyg
walker = pyg.walk(df)  # Opens interactive interface in Jupyter notebook

# Using Pandas-AI
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
# Set up the LLM
llm = OpenAI(api_token="your-openai-api-key")
# Initialize PandasAI with the LLM
pandas_ai = PandasAI(llm)
# Ask questions about your data
response = pandas_ai(df, "What's the correlation between column A and column B?")
print(response)
# Generate a visualization
response = pandas_ai(df, "Create a scatter plot of A vs B colored by C")

# Using DataPrep
from dataprep.eda import create_report
report = create_report(df, title='My Dataset Analysis')
report.show_browser()
# Using SweetViz
import sweetviz as sv
report = sv.analyze(df, target_feat='target_column')
report.show_html('sweetviz_report.html')

The file ai_eda_examples.py in this repository demonstrates a comprehensive workflow that combines multiple AI-enhanced EDA tools:
    python ai_eda_examples.py
This script allows you to:
- Perform basic EDA with pandas
- Generate automated reports with DataPrep and SweetViz
- Explore data using natural language with Pandas-AI
- Create interactive visualizations with PyGWalker
- Run automated visualization with AutoViz
- Launch interactive analysis with D-Tale
For the most effective use of these tools, consider this workflow:
- Initial Profiling: Use DataPrep or SweetViz for a quick overview
- Question Formulation: Based on initial findings, form questions
- Natural Language Exploration: Use Pandas-AI to answer specific questions
- Interactive Deep Dive: Use PyGWalker or D-Tale for detailed investigation
- Documentation: Export findings as reports or visualizations
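As a rough illustration of the first workflow step, here is a plain-pandas stand-in for initial profiling. It is not what DataPrep or SweetViz produce, only a minimal sketch of the kind of per-column overview that helps when forming questions. The function name `quick_profile` and the demo data are made up for this example.

```python
import pandas as pd


def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, percentage of missing values, distinct count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "n_unique": df.nunique(),  # NaN is not counted as a distinct value
        }
    )


if __name__ == "__main__":
    # Tiny made-up frame, for illustration only.
    demo = pd.DataFrame({"a": [1, 2, None, 4], "b": ["x", "y", "y", None]})
    print(quick_profile(demo))
```

Columns with a high `missing_pct` or a very low `n_unique` are natural starting points for the question-formulation step.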
- Most tools work well for datasets up to a few hundred thousand rows
- For larger datasets:
  - Sample your data first for initial exploration
  - Use tools with efficient data handling (DataPrep, D-Tale)
  - Consider chunking your analysis by columns or subsets
- For very large datasets, consider distributed processing frameworks
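The sampling and column-chunking suggestions above can be sketched with plain pandas. The helper names `sample_rows` and `column_chunks` are hypothetical, and the defaults (100,000 rows, 20 columns per chunk) are arbitrary assumptions to adjust for your data and hardware.

```python
import pandas as pd


def sample_rows(df, max_rows=100_000, seed=0):
    """Return df unchanged if it is small enough, else a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


def column_chunks(df, size=20):
    """Yield sub-frames of df containing at most `size` columns each."""
    for start in range(0, df.shape[1], size):
        yield df.iloc[:, start:start + size]


if __name__ == "__main__":
    # Made-up data: 1,000 rows and 5 columns, for illustration only.
    demo = pd.DataFrame({f"c{i}": range(1000) for i in range(5)})
    small = sample_rows(demo, max_rows=100)
    for chunk in column_chunks(small, size=2):
        print(chunk.shape)
```

Each chunk can then be passed to a reporting tool on its own, which keeps memory use and report size manageable. Fixing the seed makes the sample repeatable between runs.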
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch:
    git checkout -b feature/amazing-feature
- Commit your changes:
    git commit -m 'Add some amazing feature'
- Push to the branch:
    git push origin feature/amazing-feature
- Open a Pull Request
- The creators and maintainers of all the tools mentioned
- The data science community for continued innovation in EDA techniques