
Asteroid Data Exploration Project

Milestone 1:

Link To Colab:

Abstract:

In the study of objects within our solar system, there have been many attempts to classify groups of objects to help estimate their properties. However, the classical approach can miss the subtle correlations that machine learning techniques thrive on. This study aims to improve the prediction of asteroid features using machine learning algorithms. We use a dataset provided by the Jet Propulsion Laboratory (JPL) of the California Institute of Technology and apply various regression techniques to achieve higher accuracy and lower error rates in feature prediction. The dataset comprises 31 features for 839,714 objects, including their names, semi-major axis, eccentricity, inclination, orbital period, diameter, and other orbital and physical properties. Our project focuses on feature engineering and on linear and polynomial regression models. Additionally, we aim to use clustering algorithms to classify asteroids. Our findings contribute to the growing intersection between machine learning and astronomy and may provide tools for potential applications in space warning systems.

Milestone 2:

Overview

This project analyzes a dataset with 839,714 observations and 31 features. The analysis includes data cleaning, encoding, and visualization to understand correlations and distributions.

Data Description

The features in our data are described below.

Click to view table containing details of data.

| Feature Name | Description |
| --- | --- |
| full_name | Full Name of Body: Contains full unique name of the body. |
| a | Semi-Major Axis (Unit - au): The average distance between the object and the Sun, measured in astronomical units (au). |
| e | Eccentricity: Describes the shape of the object's orbit, with values ranging from 0 (circular) to close to 1 (highly elliptical). |
| G | Magnitude Slope Parameter: Factor in determining the brightness variation of the object, reflecting how its brightness changes with phase angle. |
| i | Inclination (Unit - deg): Angle of the object's orbital plane relative to the ecliptic plane, measured in degrees. |
| om | Longitude of the Ascending Node: Angle from the reference direction (usually the vernal equinox) to the point where the object's orbit crosses the ecliptic plane from south to north. |
| w | Argument of Perihelion: Angle between the ascending node and the point of closest approach to the Sun (perihelion). |
| q | Perihelion Distance (Unit - au): Shortest distance between the object and the Sun during its orbit, measured in astronomical units (au). |
| ad | Aphelion Distance (Unit - au): Farthest distance between the object and the Sun during its orbit, measured in astronomical units (au). |
| per_y | Orbital Period (Unit - Years): Time taken for the object to complete one full orbit around the Sun, measured in years. |
| data_arc | Data Arc-Span (Unit - Days): Duration over which observations of the object have been collected, measured in days. |
| condition_code | Orbit Condition Code: Numerical code indicating the quality and reliability of the object's orbital data, with 0 being the most reliable. |
| n_obs_used | Number of Observations Used: Total number of observations used to determine the object's orbital parameters. |
| H | Absolute Magnitude Parameter: Measure of the object's intrinsic brightness, indicating its size and reflectivity. |
| diameter | Diameter of Asteroid (Unit - km): Physical size of the asteroid, measured in kilometers (km). |
| extent | Object Bi/Tri-Axial Ellipsoid Dimensions (Unit - km): Dimensions describing the shape and size of the object in terms of its three principal axes, measured in kilometers (km). |
| albedo | Geometric Albedo: Reflectivity of the object's surface, indicating the proportion of sunlight it reflects. |
| rot_per | Rotation Period (Unit - Hours): Time taken for the object to complete one full rotation on its axis, measured in hours. |
| GM | Standard Gravitational Parameter: Product of the gravitational constant and the object's mass, used in gravitational calculations. |
| BV | Color Index B-V Magnitude Difference: Difference in brightness between the object in the B (blue) and V (visual) photometric bands, indicating its color. |
| UB | Color Index U-B Magnitude Difference: Difference in brightness between the object in the U (ultraviolet) and B (blue) photometric bands, providing spectral information. |
| IR | Color Index I-R Magnitude Difference: Difference in brightness between the object in the I (infrared) and R (red) photometric bands. |
| spec_B | Spectral Taxonomic Type (SMASSII): Spectral classification of the object based on the SMASSII scheme, indicating its mineral composition and surface features. |
| spec_T | Spectral Taxonomic Type (Tholen): Spectral classification of the object based on the Tholen system, indicating its spectral characteristics, composition, and origin. |
| neo | Near Earth Object: Indicates whether the object is classified as a Near Earth Object (NEO), meaning its orbit brings it close to Earth's orbit. |
| pha | Potentially Hazardous Asteroid: Identifies whether the object is classified as a Potentially Hazardous Asteroid (PHA), posing a potential threat to Earth. |
| moid | Earth Minimum Orbit Intersection Distance (Unit - au): Smallest distance between the object's orbit and Earth's orbit, measured in astronomical units (au), indicating potential close encounters. |
| class | Class of Asteroid: Visit nasa.com to learn more about classes. |
| n | Mean Motion (Unit - deg/day): Average angular rate of the object along its orbit, following the JPL Small-Body Database field of the same name. |
| per | Orbital Period (Unit - Days): Time taken for the object to complete one full orbit around the Sun, measured in days (per_y gives the same quantity in years). |
| ma | Mean Anomaly (Unit - deg): Angular position of the object along its orbit at the reference epoch, measured from perihelion as if it moved at constant angular speed. |
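Several of the orbital columns above are tied together by standard two-body relations, which gives a quick sanity check on the data and on the meaning of the less documented columns. The sketch below is illustrative only: the function name `derived_elements` is hypothetical, the sample values are approximate elements for Ceres rather than a row from the dataset, and it assumes `n` is the mean motion in deg/day, as in the JPL Small-Body Database.

```python
import math

DAYS_PER_YEAR = 365.25  # Julian year; close enough for a sanity check


def derived_elements(a_au: float, e: float) -> dict:
    """Derive q, ad, per_y, per and n from semi-major axis and eccentricity.

    Valid for closed heliocentric orbits (0 <= e < 1) of bodies whose mass is
    negligible next to the Sun's.
    """
    if not 0 <= e < 1:
        raise ValueError("eccentricity must be in [0, 1) for an elliptical orbit")
    per_y = math.pow(a_au, 1.5)   # Kepler's third law: P[yr]^2 = a[au]^3
    per_d = per_y * DAYS_PER_YEAR
    return {
        "q": a_au * (1 - e),      # perihelion distance (au)
        "ad": a_au * (1 + e),     # aphelion distance (au)
        "per_y": per_y,           # orbital period (years)
        "per": per_d,             # orbital period (days)
        "n": 360.0 / per_d,       # mean motion (deg/day); assumed meaning of n
    }


# Approximate elements for Ceres, used only as an illustration.
ceres = derived_elements(a_au=2.77, e=0.08)
for name, value in ceres.items():
    print(f"{name}: {value:.4f}")
```

Comparing such derived values against the stored `q`, `ad`, `per_y`, `per`, and `n` columns is one way to confirm what each column holds before modelling with it.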

Data Heatmap

We utilized a heatmap to visualize the correlations between different features in the dataset. This graphical representation helps in identifying the strength and direction of relationships among the variables, providing a clear and intuitive way to detect patterns, trends, and anomalies in the data. The heatmap is particularly useful for understanding multicollinearity, guiding feature selection, and improving model performance.

Click to view Heatmap.
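A correlation heatmap of this kind can be sketched as follows. This is a minimal example and not the notebook's actual code: the helper name `plot_correlation_heatmap`, the use of plain matplotlib, and the small synthetic frame are all assumptions. Only the column names come from the table above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_correlation_heatmap(df: pd.DataFrame):
    """Draw a Pearson correlation heatmap of the numeric columns of df."""
    corr = df.select_dtypes(include="number").corr()  # pairwise Pearson r
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(im, ax=ax, label="Pearson correlation")
    fig.tight_layout()
    return corr, fig


# Synthetic stand-in data (NOT the asteroid dataset), just to exercise the helper.
rng = np.random.default_rng(0)
H = rng.uniform(10, 18, size=200)
demo = pd.DataFrame({
    "H": H,
    # Standard H-to-diameter relation with an assumed albedo of 0.1:
    # brighter objects (lower H) come out larger.
    "diameter": 1329 / np.sqrt(0.1) * 10 ** (-0.2 * H),
    "q": rng.uniform(1.5, 3.5, size=200),
})
corr, fig = plot_correlation_heatmap(demo)
print(corr.round(2))
```

Pearson correlation only captures linear relationships, so a strongly nonlinear dependence such as diameter on H can appear weaker in the heatmap than it really is.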

Data Distribution using `.describe()`

We employed the .describe() method to obtain a statistical summary of the features we are interested in. This summary includes metrics such as count, mean, standard deviation, minimum, and maximum values, as well as the 25th, 50th, and 75th percentiles. These statistics provide valuable insights into the central tendency, dispersion, and overall shape of the data distribution, facilitating the identification of outliers and informing subsequent data preprocessing and analysis steps.

Click to view `.describe()` initial columns.
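As a minimal sketch of this step, the snippet below calls `.describe()` on a few columns of a tiny synthetic frame. The values are placeholders and not real asteroid data; the variable names `summary` and `missing_share` are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in values (NOT real asteroid data).
df = pd.DataFrame({
    "diameter": [1.2, 3.4, np.nan, 0.8, 12.5],
    "H": [16.1, 14.0, 17.3, 16.9, 11.2],
    "q": [2.1, 2.6, 1.9, 2.3, 2.8],
})

# describe() skips NaN, so "count" doubles as a per-column missing-value check.
summary = df[["diameter", "H", "q"]].describe()
print(summary)

# Share of missing entries per column, useful before deciding what to drop.
missing_share = df.isna().mean()
print(missing_share)
```

A `count` that is far below the number of rows, as happens for `diameter` in the real dataset, is an early signal that dropping NaN rows will remove most of the data.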

Data Preprocessing

To prepare the dataset for analysis, we undertook several preprocessing steps:

Remove String Columns:
- We dropped the columns name, spec_B, spec_T, and class as they contain string values that are not suitable for numerical analysis.
Handle Missing Values:
- We dropped the columns rot_per, GM, BV, and UB due to a high number of NaN values.
- We removed any rows that contained NaN values for the diameter feature in order to ensure a clean dataset for analysis.
  Original dataset size: 839714
  Dataset size after dropping rows: 24404
  Number of rows dropped: 815310
Check Correlations:
- By plotting pairplots and the heatmap, we discovered a reasonably strong correlation between diameter and the following features: q, moid, H, data_arc, and n.
- These correlations will be explored further in subsequent steps to understand their impact and relationships.
Data Encoding:
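- We converted the 'Y' and 'N' string values in the pha column to 1 and 0 to make graphing and working with them easier.
Analyzing effects of preprocessing on data distribution:
- Using scipy.stats.ks_2samp (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html), we compared each feature's distribution before and after dropping the NaN rows. The KS test evaluates the null hypothesis that two samples were drawn from the same distribution, so a small p-value is evidence that they differ. For the variables we are interested in, we found p-values of 2.488278122363494e-60 for q, 0.0 for H, and 1.4086431738613219e-53 for moid. These values are far below any conventional significance level, which means the rows that have a diameter are not distributed like the full dataset for these features; the effect of dropping rows cannot be called negligible on the basis of this test. With samples this large the test detects even small shifts, so the KS statistic and the histograms are the better guide to how large the difference actually is.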
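The preprocessing steps above can be sketched roughly as follows. This is a hedged illustration, not the project's notebook code: the helper names `preprocess` and `compare_distributions` are hypothetical, and the data are synthetic, built so that faint objects lose their diameter more often. Only the column names and `scipy.stats.ks_2samp` come from the text.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the drop, filter and encode steps described above."""
    string_cols = ["name", "spec_B", "spec_T", "class"]
    sparse_cols = ["rot_per", "GM", "BV", "UB"]
    # errors="ignore" lets the sketch run even if a column is absent.
    out = df.drop(columns=string_cols + sparse_cols, errors="ignore")
    out = out.dropna(subset=["diameter"]).copy()   # keep rows with a diameter
    out["pha"] = out["pha"].map({"Y": 1, "N": 0})  # encode the flag as 1/0
    return out


def compare_distributions(before: pd.DataFrame, after: pd.DataFrame, cols):
    """Two-sample KS test per column: full data vs. rows kept after filtering."""
    results = {}
    for col in cols:
        stat, p_value = ks_2samp(before[col].dropna(), after[col].dropna())
        results[col] = (stat, p_value)
    return results


# Synthetic stand-in data (NOT the asteroid dataset).
rng = np.random.default_rng(0)
n_rows = 5000
H = rng.normal(16, 1.5, n_rows)
diameter = 1329 / np.sqrt(0.1) * 10 ** (-0.2 * H)
# Hide the diameter of faint objects more often, mimicking a selection effect.
diameter[H > rng.normal(15, 1, n_rows)] = np.nan
raw = pd.DataFrame({
    "name": ["obj"] * n_rows,
    "H": H,
    "q": rng.uniform(1.5, 3.5, n_rows),
    "diameter": diameter,
    "pha": rng.choice(["Y", "N"], n_rows),
})

clean = preprocess(raw)
print(f"rows: {len(raw)} -> {len(clean)} ({len(raw) - len(clean)} dropped)")
results = compare_distributions(raw, clean, ["H", "q"])
for col, (stat, p) in results.items():
    print(f"{col}: KS statistic={stat:.3f}, p-value={p:.3g}")
```

When reading the output, a small p-value means the two samples are unlikely to share a distribution, and the KS statistic (the largest gap between the two empirical CDFs) indicates how big the shift is. In this synthetic setup `H` shifts strongly because missingness depends on it, while `q` barely moves.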

Graph Data Analysis

To better understand the relationships between various features and the diameter, we graphed several feature correlations. This graphical analysis aids in identifying potential relationships and patterns that might not be immediately evident through raw data or simple statistical summaries.

Diameter vs. q:
- We plotted the relationship between diameter and q (perihelion distance). This scatter plot helps us observe any direct or inverse relationships between the size of the object and its perihelion distance.
Diameter vs. moid:
- The scatter plot between diameter and moid (minimum orbit intersection distance) was analyzed to see if there is any correlation between the object's size and its closest approach to Earth.
Diameter vs. H:
- We also examined the correlation between diameter and H (absolute magnitude). This plot is particularly interesting as it helps in understanding how the brightness of an object might relate to its size.
Diameter vs. n:
- Analyzing the scatter plot of diameter versus n can reveal whether the two vary together. Note that n is not the observation count (that is n_obs_used); in the JPL Small-Body Database, n denotes the mean motion in deg/day.
Correlation Difference after dropping NaN values in preprocessing
Distribution Difference after dropping NaN values in preprocessing
- Histogram of q:
- Histogram of H:
- Histogram of moid:
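Scatter plots like the ones listed above can be produced along these lines. The helper name `scatter_against_diameter`, the logarithmic diameter axis, and the synthetic data are assumptions made for illustration and do not reproduce the project's figures.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_against_diameter(df: pd.DataFrame, features, log_diameter: bool = True):
    """One scatter panel per feature, each with diameter on the y-axis."""
    fig, axes = plt.subplots(
        1, len(features), figsize=(4 * len(features), 3.5),
        sharey=True, squeeze=False,
    )
    for ax, feature in zip(axes[0], features):
        ax.scatter(df[feature], df["diameter"], s=4, alpha=0.3)
        ax.set_xlabel(feature)
        if log_diameter:
            ax.set_yscale("log")  # diameters span orders of magnitude
    axes[0][0].set_ylabel("diameter (km)")
    fig.tight_layout()
    return fig, axes[0]


# Synthetic stand-in data (NOT the asteroid dataset).
rng = np.random.default_rng(1)
H = rng.uniform(10, 18, size=500)
demo = pd.DataFrame({
    "H": H,
    "q": rng.uniform(1.5, 3.5, size=500),
    "diameter": 1329 / np.sqrt(0.1) * 10 ** (-0.2 * H),
})
fig, axes = scatter_against_diameter(demo, ["H", "q"])
```

A logarithmic diameter axis is a design choice worth considering here: since diameter falls off exponentially with H, the relationship looks linear on a log scale and is easier to judge by eye.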

These visualizations provide several insights:

Identifying Outliers:
- Scatter plots help in easily identifying any outliers that may exist in the data, which could potentially skew the analysis or indicate errors or special cases.
Understanding Distribution:
- The spread and clustering of points in these graphs can provide an understanding of how uniformly or variably the features are distributed.
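Outliers spotted visually can also be flagged numerically. The sketch below uses the common 1.5 x IQR rule of thumb, which is a substitute technique chosen for illustration and not necessarily what the project used; the function name `iqr_outlier_mask` and the sample values are hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies more than k * IQR outside the quartiles."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Synthetic diameters in km (NOT real data); 250.0 is the planted outlier.
diameters = pd.Series([2.1, 3.4, 2.8, 3.0, 2.5, 3.9, 2.2, 250.0])
mask = iqr_outlier_mask(diameters)
print(diameters[mask])
```

For heavily skewed features such as diameter, applying the rule to log-transformed values may be more appropriate, since genuinely large bodies would otherwise all be flagged.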

harshilxd/Asteroid-Feature-Prediction
