uzumstanley/Automatic-Exploratory-Data-Analysis

Link to the project: https://automatic-exploratory-data-analysis-rfzmcrpraqxdvfnuuyms6k.streamlit.app

📌 Automatic Exploratory Data Analysis (EDA) Tool 🚀

A simple and interactive Streamlit app for automatic exploratory data analysis (EDA), including PCA and feature importance analysis.

📖 Overview

This project is a Streamlit-based Exploratory Data Analysis (EDA) tool that allows users to upload CSV datasets and perform various analyses, including:

✅ Data Summary & Statistics
✅ Histograms of Numeric Features
✅ Correlation Matrix Heatmap
✅ Principal Component Analysis (PCA)
✅ Feature Importance using Random Forest
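The basic analyses in this list map closely onto standard pandas calls. The following is a minimal sketch, not the app's actual code: the `basic_eda` helper and the sample data are made up for illustration, and the uploaded CSV is stood in for by an in-memory string.

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for an uploaded file (made-up data).
CSV_TEXT = """age,income,city
25,40000,Lagos
32,,Abuja
47,82000,Lagos
51,90000,
"""


def basic_eda(df: pd.DataFrame) -> dict:
    """Hypothetical helper: summary stats, missing counts, correlations."""
    numeric = df.select_dtypes(include="number")
    return {
        "summary": numeric.describe(),   # count, mean, std, min, quartiles, max
        "median": numeric.median(),      # per-column median of numeric features
        "missing": df.isna().sum(),      # number of missing cells per column
        "correlation": numeric.corr(),   # pairwise Pearson correlation matrix
    }


df = pd.read_csv(io.StringIO(CSV_TEXT))
report = basic_eda(df)
print(report["missing"])
```

The correlation matrix returned here is the kind of table a heatmap would typically be drawn from, for example with Seaborn's `heatmap`.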

The tool is modularized inside the eda_tool/ package, making it easy to extend and maintain.
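The two advanced analyses named in the overview, PCA and Random Forest feature importance, can be sketched with scikit-learn as below. This is an illustration under assumptions, not the code in `eda_tool/`: the synthetic data, the feature scaling step, and the choice of a classifier (rather than a regressor) are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 200 rows, 4 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# The target depends only on feature 0, so it should rank as most important.
y = (X[:, 0] > 0).astype(int)

# PCA: standardize first so no feature dominates through its scale,
# then read off the share of variance explained by each component.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA()
pca.fit(X_scaled)
explained = pca.explained_variance_ratio_

# Feature importance: fit a Random Forest against the chosen target
# and inspect its impurity-based importances.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
importances = model.feature_importances_
top_feature = int(np.argmax(importances))

print("Explained variance ratio:", explained)
print("Most important feature index:", top_feature)
```

Plotting `explained` as a bar or cumulative line chart gives the explained-variance view; `importances` can be plotted against the column names in the same way.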

📂 Project Structure

    Automatic-Exploratory-Data-Analysis/
    │── eda_tool/
    │   ├── __init__.py
    │   ├── data_loader.py          # Handles CSV data loading
    │   ├── eda_summary.py          # Provides summary statistics
    │   ├── missing_values.py       # Handles missing value analysis
    │   ├── outlier_detection.py    # Detects outliers
    │   ├── visualization.py        # Visualization functions (histograms, PCA, feature importance)
    │── app.py                      # Streamlit app
    │── requirements.txt            # Dependencies
    │── README.md                   # Project documentation
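The structure above lists an `outlier_detection.py` module but does not say which method it uses. As one common option, the sketch below flags outliers with the interquartile range (IQR) rule; the function name `iqr_outliers`, the multiplier `k`, and the sample values are assumptions for illustration only.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Made-up sample with one obviously extreme value.
values = pd.Series([10, 12, 11, 13, 12, 95])
mask = iqr_outliers(values)
print(values[mask].tolist())
```

The IQR rule makes no distributional assumption, which suits a general-purpose EDA tool; a z-score threshold would be an alternative when the data is roughly normal.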

🚀 Features

🔹 Data Summary & Insights
- Displays key statistics such as mean, median, and standard deviation.
- Identifies missing values in the dataset.

🔹 Data Visualization
- Histograms for numerical columns.
- Correlation matrix heatmap for feature relationships.

🔹 Advanced Analysis
- Principal Component Analysis (PCA): Visualizes explained variance.
- Feature Importance Analysis: Uses Random Forest to determine the most important features.

💻 Installation & Setup

1️⃣ Clone the repository

    git clone https://github.com/your-username/Automatic-Exploratory-Data-Analysis.git
    cd Automatic-Exploratory-Data-Analysis

2️⃣ Create a virtual environment (optional but recommended)

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

3️⃣ Install dependencies

    pip install -r requirements.txt

4️⃣ Run the Streamlit app

    streamlit run app.py

📌 Usage

1. Open the Streamlit interface in your browser.
2. Upload a CSV file.
3. Explore the dataset using the available analysis options.
4. Select a target variable for feature importance analysis.

🔧 Configuration

🛠️ Technologies Used

- Python 3.x
- Pandas, NumPy (Data Processing)
- Matplotlib, Seaborn (Visualizations)
- Scikit-learn (Machine Learning for PCA & Feature Importance)
- Streamlit (Interactive Web App)

📌 Future Improvements

✅ Support for time-series analysis.
✅ More outlier detection techniques.
✅ Dynamic feature selection for ML model training.

About

Automating the data processing workflow and serving it to users through a simple, smart interface.
