This repository contains a Python script for performing data analysis and visualization on a dataset of cars. The analysis includes dataset exploration, statistical summaries, and visualizations to understand the distribution and relationships within the data.
- data_analysis_cars.py: Python script for data analysis and visualization.
- cars.csv: Dataset file containing information about cars.
- requirements.txt: List of Python dependencies required to run the script.
- mpg_distribution.png, hp_vs_wt.png, mpg_by_cylinders.png: Output visualization files.
To install the required Python libraries, use the requirements.txt file:
pip install -r requirements.txt

The data_analysis_cars.py script:
- Loads the dataset using Pandas.
- Cleans data by removing non-ASCII characters from the Model column.
- Displays dataset information, shape, and a preview.
- Calculates summary statistics: mean, median, standard deviation.
- Generates visualizations:
  - Histogram of MPG distribution.
  - Scatter plot of Horsepower vs. Weight by Cylinders.
  - Boxplot of MPG by Cylinders.
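
The loading, cleaning, and summary steps listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of data_analysis_cars.py: the helper names `load_and_clean` and `summarize` are hypothetical, the inline sample rows stand in for cars.csv, and the `MPG` and `Cylinders` column names are assumptions (only the `Model` column is named in this README).

```python
import io

import pandas as pd

# Hypothetical stand-in for cars.csv; the real file's columns and rows may differ.
# \u00eb and \u0160 are non-ASCII characters placed here to exercise the cleaning step.
SAMPLE_CSV = (
    "Model,MPG,Cylinders\n"
    "Mazda RX4,21.0,6\n"
    "Citro\u00ebn DS,22.8,4\n"
    "\u0160koda Rapid,30.4,4\n"
)


def load_and_clean(source):
    """Read a CSV and remove non-ASCII characters from the Model column."""
    df = pd.read_csv(source)
    # The character class matches anything outside the 7-bit ASCII range.
    df["Model"] = df["Model"].str.replace(r"[^\x00-\x7F]", "", regex=True)
    return df


def summarize(df, column):
    """Return the mean, median, and sample standard deviation of one column."""
    return {
        "mean": df[column].mean(),
        "median": df[column].median(),
        "std": df[column].std(),
    }


cars = load_and_clean(io.StringIO(SAMPLE_CSV))
cars.info()          # column names, dtypes, and non-null counts
print(cars.shape)    # (rows, columns)
print(cars.head())   # preview of the first rows
stats = summarize(cars, "MPG")
print(stats)
```

To try the sketch against the real dataset, pass the file path instead of the in-memory sample, for example `load_and_clean("cars.csv")`, and adjust the column names to match the file.
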
- Clone the repository:
git clone https://github.com/shivamr021/data-decisions-python.git
- Navigate to the directory:
cd data-decisions-python
- Install dependencies:
pip install -r requirements.txt
- Run the script:
python data_analysis_cars.py
This project is part of the IBM SkillsBuild Winter Certification Program with CSRBOX. Thanks to IBM and CSRBOX teams for this valuable learning opportunity.