Link to the project: https://automatic-exploratory-data-analysis-rfzmcrpraqxdvfnuuyms6k.streamlit.app
Automatic Exploratory Data Analysis (EDA) Tool

A simple and interactive Streamlit app for automatic exploratory data analysis (EDA), including PCA and feature importance analysis.

Overview
This project is a Streamlit-based Exploratory Data Analysis (EDA) tool that allows users to upload CSV datasets and perform various analyses, including:

- Data Summary & Statistics
- Histograms of Numeric Features
- Correlation Matrix Heatmap
- Principal Component Analysis (PCA)
- Feature Importance using Random Forest

The tool is modularized inside the eda_tool/ package, making it easy to extend and maintain.
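Two of the analyses listed above, PCA and Random Forest feature importance, can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the code inside eda_tool/: the helper names `pca_explained_variance` and `rf_feature_importance`, the standardization step, the choice of a regressor rather than a classifier, and the toy DataFrame are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler


def pca_explained_variance(df: pd.DataFrame) -> np.ndarray:
    """Return the explained-variance ratio of each principal component.

    Hypothetical helper: uses numeric columns only, drops rows with
    missing values, and standardizes features before fitting PCA.
    """
    numeric = df.select_dtypes(include="number").dropna()
    scaled = StandardScaler().fit_transform(numeric)
    return PCA().fit(scaled).explained_variance_ratio_


def rf_feature_importance(df: pd.DataFrame, target: str) -> pd.Series:
    """Rank numeric features by Random Forest importance for `target`.

    Hypothetical helper: assumes a numeric target (regression); a
    classifier would be the natural swap for a categorical target.
    """
    numeric = df.select_dtypes(include="number").dropna()
    X = numeric.drop(columns=[target])
    y = numeric[target]
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)


# Toy data (made up for illustration): y depends strongly on x1, not on x2.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
toy["y"] = 3 * toy["x1"] + rng.normal(scale=0.1, size=200)

ratios = pca_explained_variance(toy)
importances = rf_feature_importance(toy, target="y")
print(ratios)
print(importances)
```

In a Streamlit app, the returned array and Series could be passed to a plotting function to draw the explained-variance and importance charts.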
Project Structure

    Automatic-Exploratory-Data-Analysis/
    ├── eda_tool/
    │   ├── __init__.py
    │   ├── data_loader.py        # Handles CSV data loading
    │   ├── eda_summary.py        # Provides summary statistics
    │   ├── missing_values.py     # Handles missing value analysis
    │   ├── outlier_detection.py  # Detects outliers
    │   └── visualization.py      # Visualization functions (histograms, PCA, feature importance)
    ├── app.py                    # Streamlit app
    ├── requirements.txt          # Dependencies
    └── README.md                 # Project documentation
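To make the module split above more concrete, here is a sketch of what helpers in missing_values.py and outlier_detection.py might look like. The function names, signatures, and the 1.5 times IQR rule are assumptions for illustration only; the project description does not say which outlier method the tool uses.

```python
import pandas as pd


def missing_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column (hypothetical helper)."""
    counts = df.isna().sum()
    return pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumed rule for illustration, not confirmed by the project.
    """
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Made-up data: one missing value in "a", one obvious outlier in "b".
df = pd.DataFrame({"a": [1.0, 2.0, None, 4.0, 5.0], "b": [10, 11, 12, 13, 100]})
summary = missing_value_summary(df)
outliers = iqr_outlier_mask(df["b"])
print(summary)
print(df.loc[outliers, "b"])
```

Keeping each helper a pure function of a DataFrame or Series is one way to let app.py call the modules without them depending on Streamlit.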
Features

Data Summary & Insights
- Displays key statistics such as mean, median, and standard deviation.
- Identifies missing values in the dataset.

Data Visualization
- Histograms for numerical columns.
- Correlation Matrix heatmap for feature relationships.

Advanced Analysis
- Principal Component Analysis (PCA): Visualizes explained variance.
- Feature Importance Analysis: Uses Random Forest to determine the most important features.

Installation & Setup
1. Clone the repository

    git clone https://github.com/your-username/Automatic-Exploratory-Data-Analysis.git
    cd Automatic-Exploratory-Data-Analysis

2. Create a virtual environment (optional but recommended)

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install dependencies

    pip install -r requirements.txt

4. Run the Streamlit app

    streamlit run app.py

Usage
1. Open the Streamlit interface in your browser.
2. Upload a CSV file.
3. Explore the dataset using the available analysis options.
4. Select a target variable for feature importance analysis.

Configuration
Technologies Used

- Python 3.x
- Pandas, NumPy (Data Processing)
- Matplotlib, Seaborn (Visualizations)
- Scikit-learn (Machine Learning for PCA & Feature Importance)
- Streamlit (Interactive Web App)

Future Improvements
- Support for time-series analysis.
- More outlier detection techniques.
- Dynamic feature selection for ML model training.