The Great Space Race to Save the Earth

A Near-Earth Object classification project by Brandon Navarrete

🌎 Goal

Humanity has forgotten how fragile our existence is. It would be wise to keep an eye on the starry abyss and determine whether a near-Earth object has the potential to be hazardous. With the world's climate ever changing, we have lost sight of protecting the one resource we all share: EARTH.

Here I will develop a model that can classify hazardous asteroids given their diameter, absolute magnitude, and relative velocity.
The work is presented in a report that is easy to read and interpret for any viewer.

🗺️ Data Overview

The data was pulled from Kaggle (2023), which in turn sourced it from NASA's API.
It has 90,836 rows, each representing one object (asteroid), with 10 columns describing its features.

Initial Questions

How many of our objects are inert (non-hazardous)?
Does diameter make a big difference in determining hazard status?
Does relative velocity make a big difference in determining hazard status?
Does absolute magnitude make a big difference in determining hazard status?

📂 Data Dictionary

| Variable | Type | Meaning |
|---|---|---|
| id | Numerical | Unique identifier for each asteroid |
| name | String | Name given by NASA |
| est_diameter_min | Float | Minimum estimated diameter in kilometers |
| est_diameter_max | Float | Maximum estimated diameter in kilometers |
| relative_velocity | Float | Velocity relative to Earth |
| orbiting_body | String | Body the object orbits (Earth) |
| sentry_object | Boolean | Whether the object is included in Sentry, an automated collision-monitoring system (all values are False) |
| absolute_magnitude | Float | Intrinsic luminosity (lower values mean brighter objects) |
| hazardous | Boolean | Whether the asteroid is considered hazardous |

Project Plan / Process

1️⃣ Data Acquisition

Gather data from the Kaggle database

Import the CSV into local files
Read/create a data dictionary and extract meaningful columns

acquire.py

Create acquire.py and a user-defined function to import the data from the CSV
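A minimal sketch of what the acquire function could look like, assuming pandas and the `neo_v2.csv` file shipped with the repo. The name `get_neo_data` is hypothetical, and lower-casing the column names is an added convenience rather than something this README describes.

```python
import pandas as pd


def get_neo_data(csv_path="neo_v2.csv") -> pd.DataFrame:
    """Read the Kaggle/NASA near-Earth-object CSV into a DataFrame.

    Hypothetical helper: `csv_path` may be a file path or any file-like
    object that pandas.read_csv accepts.
    """
    df = pd.read_csv(csv_path)
    # Normalize column names to lowercase so later steps can rely on them
    df.columns = df.columns.str.strip().str.lower()
    return df
```

Usage: `df = get_neo_data()` from the repo root, then `df.shape` should match the row and column counts described above.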

2️⃣ Data Preparation

Data Cleaning

Missing values:
- No missing values in the Kaggle dataset
Outliers:
- Outliers were kept
Dropped columns:
- `id`, `name`, `orbiting_body`, and `sentry_object` were dropped because they carry no useful information
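The cleaning rules above fit in a few lines. This is a sketch assuming lowercase column names; `prep_neo` and `DROP_COLS` are hypothetical names, and raising on missing values is an added guard for the "no missing values" assumption, not a step the README lists.

```python
import pandas as pd

# Columns described as carrying no useful information
DROP_COLS = ["id", "name", "orbiting_body", "sentry_object"]


def prep_neo(df: pd.DataFrame) -> pd.DataFrame:
    """Drop non-informative columns; outliers are intentionally left in place."""
    if df.isna().any().any():
        # The Kaggle file is expected to have no missing values
        raise ValueError("unexpected missing values in the input data")
    # errors="ignore" lets the function run even if a column is already gone
    return df.drop(columns=DROP_COLS, errors="ignore")
```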

Data Splitting

Create a function to split the data into train, validate, and test sets
Call the function and store the three samples as separate DataFrames
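One way to write the split function, assuming scikit-learn's `train_test_split`. The 60/20/20 proportions, the seed, and stratifying on the target are assumptions; stratifying keeps the roughly 10% hazardous share similar in every sample.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data(df: pd.DataFrame, target: str = "hazardous", seed: int = 123):
    """Return (train, validate, test) DataFrames of about 60% / 20% / 20%."""
    # Carve off 20% for test, preserving the class ratio
    train_validate, test = train_test_split(
        df, test_size=0.2, random_state=seed, stratify=df[target]
    )
    # 25% of the remaining 80% yields a 20% validate set overall
    train, validate = train_test_split(
        train_validate,
        test_size=0.25,
        random_state=seed,
        stratify=train_validate[target],
    )
    return train, validate, test
```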

3️⃣ Exploratory Analysis

Ask questions to identify the key features associated with hazard status
Explore each feature's correlation with hazard status
Use visualizations to better understand the relationships between features
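Alongside the visuals, a small numeric summary answers the first question (how many objects are inert) and shows how each feature shifts between classes. This sketch assumes the train DataFrame from the split; using `est_diameter_max` as the diameter feature and comparing medians are illustrative choices, not the author's documented method.

```python
import pandas as pd

# Assumed feature set; est_diameter_max stands in for "diameter"
FEATURES = ["est_diameter_max", "relative_velocity", "absolute_magnitude"]


def summarize_by_hazard(train: pd.DataFrame, target: str = "hazardous"):
    """Return the hazardous share and the per-class median of each feature."""
    share = train[target].mean()  # fraction of rows flagged hazardous
    medians = train.groupby(target)[FEATURES].median()
    return share, medians
```

Medians are used because they are robust to the outliers that were intentionally kept.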

4️⃣ Statistical Testing & Modeling

Conduct Mann-Whitney U tests
Draw a conclusion for each hypothesis and address the initial questions
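A sketch of one Mann-Whitney U test with SciPy's `mannwhitneyu`, comparing a feature between hazardous and inert objects. The test is rank-based, so it does not assume normally distributed features. The function name, the two-sided alternative, and `alpha = 0.05` are assumptions.

```python
import pandas as pd
from scipy.stats import mannwhitneyu


def compare_feature(train: pd.DataFrame, feature: str,
                    target: str = "hazardous", alpha: float = 0.05):
    """Mann-Whitney U test of `feature` between hazardous and inert objects.

    H0: the feature's distribution is the same for both groups.
    Returns (U statistic, p-value, whether H0 is rejected at `alpha`).
    """
    is_hazardous = train[target].astype(bool)
    hazardous = train.loc[is_hazardous, feature]
    inert = train.loc[~is_hazardous, feature]
    stat, p = mannwhitneyu(hazardous, inert, alternative="two-sided")
    return stat, p, p < alpha
```

Usage: loop over the diameter, velocity, and magnitude columns and report each p-value next to its hypothesis.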

5️⃣ Modeling Evaluation

Find the number of features that generates the highest performance (recall)
Build an XGBoost model and fit it on the train dataset using the selected features
Pick the model with the highest recall and evaluate it on the test dataset
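The feature-count search can be sketched as a loop: for each k, keep the top k features, fit a model on train, and score recall on validate. Ranking features with scikit-learn's `SelectKBest(f_classif)` and up-weighting the rare hazardous class with balanced sample weights are assumptions, not the author's documented choices. If `xgboost` is not installed, the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different model.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import recall_score
from sklearn.utils.class_weight import compute_sample_weight

try:
    from xgboost import XGBClassifier as Model  # the model named in this README
except ImportError:
    # Stand-in so the sketch still runs; not the same algorithm as XGBoost
    from sklearn.ensemble import GradientBoostingClassifier as Model


def best_k_by_recall(X_train: pd.DataFrame, y_train: pd.Series,
                     X_val: pd.DataFrame, y_val: pd.Series, seed: int = 123):
    """Fit one model per feature count k and score each on the validate set.

    Returns the k with the highest validate recall (ties go to the smaller k)
    and a dict mapping k -> (recall, selected column names).
    """
    y_train, y_val = y_train.astype(int), y_val.astype(int)
    # Balanced weights up-weight the rare hazardous class during training
    weights = compute_sample_weight("balanced", y_train)
    results = {}
    for k in range(1, X_train.shape[1] + 1):
        # Rank features by ANOVA F-score on train only, then keep the top k
        selector = SelectKBest(f_classif, k=k).fit(X_train, y_train)
        cols = list(X_train.columns[selector.get_support()])
        model = Model(random_state=seed)
        model.fit(X_train[cols], y_train, sample_weight=weights)
        recall = recall_score(y_val, model.predict(X_val[cols]))
        results[k] = (recall, cols)
    best_k = max(results, key=lambda k: results[k][0])
    return best_k, results
```

Only the winning configuration should then be refit and scored once on the test set, so the test result stays an honest estimate.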

🏅 Key Findings

About 10% of the objects were classified as hazardous
All three features (diameter, relative velocity, and absolute magnitude) show promise in determining hazard status
The best-performing model was XGBoost, which detected 98% of hazardous asteroids (a recall of 98%)

Recommendation

This model finds a high percentage of hazardous asteroids at the cost of low accuracy, due to its false positives.

This model should be used UNTIL a better model is developed.
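The recall-versus-accuracy trade-off can be made concrete with a little arithmetic. The counts below are invented for illustration (1,000 objects, about 10% hazardous) and are not this project's confusion matrix; they show why a model tuned for recall can have lower accuracy than one that simply calls everything inert.

```python
def summarize(tp: int, fn: int, fp: int, tn: int):
    """Return (recall, precision, accuracy) from confusion-matrix counts."""
    recall = tp / (tp + fn)  # share of truly hazardous objects that were caught
    # share of flagged objects that are truly hazardous (0.0 if nothing was flagged)
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return recall, precision, accuracy


# Hypothetical: 1,000 objects, 100 truly hazardous
cautious = summarize(tp=98, fn=2, fp=300, tn=600)  # recall 0.98, precision ~0.25, accuracy ~0.70
lazy = summarize(tp=0, fn=100, fp=0, tn=900)       # recall 0.0, accuracy 0.9
```

The "lazy" model scores higher accuracy while missing every hazardous object, which is why recall is the metric that matters here.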

Next Steps

Use NASA's API to gather more relevant features and try to increase the hazardous-object capture rate.
Combine the model with image recognition and try to automate the process for 24/7 observation and protection.

Steps To Clone:

Clone this repo
Import NASA's CSV (`neo_v2.csv`)
Run the notebook
Some dependencies, such as `xgboost`, may need to be installed
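Before running the notebook, a quick check can list which packages still need installing. Only `xgboost` is named in this README; the other package names are assumptions based on the steps described (pandas for data handling, scikit-learn for splitting and metrics, SciPy for the Mann-Whitney test). Note that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Assumed requirements; only xgboost is confirmed by the README
REQUIRED = ("pandas", "numpy", "sklearn", "scipy", "xgboost")


def missing_packages(names=REQUIRED):
    """Return the package names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Usage: `print(missing_packages())` and install anything it lists (for example with pip) before opening the notebook.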

Repository: brandontnavarrete/nasa-neow-python