crypto-price-analysis

UNC_data_bootcamp_module_19

Challenge Description

Background

For this challenge, we need to use our knowledge of Python in a Jupyter Notebook, along with unsupervised learning, to predict whether cryptocurrencies are affected by 24-hour or 7-day price changes.

Deliverables

Our goal for this challenge is detailed and requires us to follow a series of steps broken down into six larger sections. The challenge instructions are outlined within each section, and the sections are to be completed as follows:

Prepare the Data
Find the Best Value for k Using the Original Scaled DataFrame
Cluster Cryptocurrencies with K-means Using the Original Scaled Data
Optimize Clusters with Principal Component Analysis
Find the Best Value for k Using the PCA Data
Cluster Cryptocurrencies with K-means Using the PCA Data

Section-1: Prepare the Data

To start off, I renamed the Crypto_Clustering_starter_code.ipynb file to Crypto_Clustering_SDT.ipynb. I first viewed the crypto_market_data.csv file separately to better understand the data source, and then loaded it into a DataFrame. From there I was able to acquire the metrics and plot the data needed to complete the challenge, following these instructions:

Use the StandardScaler() module from scikit-learn to normalize the data from the CSV file.
Create a DataFrame with the scaled data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.
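A minimal sketch of these two preparation steps in Python, assuming the CSV lives at `Resources/crypto_market_data.csv` (matching the repository's Resources folder) and has a `coin_id` column plus numeric price-change columns. The helper names `load_market_data` and `scale_market_data` are hypothetical, not taken from the notebook.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def load_market_data(path="Resources/crypto_market_data.csv"):
    """Load the market data with coin_id as the index (the path is an assumption)."""
    return pd.read_csv(path, index_col="coin_id")


def scale_market_data(market_df):
    """Standardize every column to zero mean and unit variance.

    Returns a new DataFrame that keeps the original column names and
    reuses the coin_id index from market_df.
    """
    scaled_values = StandardScaler().fit_transform(market_df)
    return pd.DataFrame(
        scaled_values,
        columns=market_df.columns,
        index=market_df.index,
    )


# Usage in the notebook (assumes the CSV exists at the path above):
# market_df = load_market_data()
# market_scaled_df = scale_market_data(market_df)
```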

Section-2: Find the Best Value for k Using the Original Scaled DataFrame

In this section I will use the elbow method to find the best value for k per the instructions from the challenge:

Create a list with the number of k values from 1 to 11.
Create an empty list to store the inertia values.
Create a for loop to compute the inertia with each possible value of k.
Create a dictionary with the data to plot the elbow curve.
Plot a line chart with all the inertia values computed with the different values of k to visually identify the optimal value for k.
Answer the following question in your notebook: What is the best value for k?
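The elbow-method steps above could be sketched as follows. This assumes the scaled DataFrame from Section-1 (called `market_scaled_df` here, a hypothetical name), reads "1 to 11" as inclusive, and fixes `random_state=1` and `n_init=10` only so repeated runs give the same inertia values; those settings are assumptions, not challenge requirements. The hvPlot call is left as a comment so the sketch runs without hvPlot installed.

```python
import pandas as pd
from sklearn.cluster import KMeans


def compute_elbow_data(df, k_values=range(1, 12), random_state=1):
    """Return a DataFrame of K-means inertia for each candidate k.

    Inertia is the sum of squared distances from each point to its
    assigned cluster centre. It generally shrinks as k grows, so the
    "elbow" where the decrease levels off is the usual visual pick.
    """
    k_list = list(k_values)
    inertia = []  # one inertia value per k
    for k in k_list:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(df)
        inertia.append(model.inertia_)
    # Dictionary of plot data, turned into a DataFrame for the line chart
    elbow_data = {"k": k_list, "inertia": inertia}
    return pd.DataFrame(elbow_data)


# Usage in the notebook:
# import hvplot.pandas
# elbow_df = compute_elbow_data(market_scaled_df)
# elbow_df.hvplot.line(x="k", y="inertia", title="Elbow Curve", xticks=list(range(1, 12)))
```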

Section-3: Cluster Cryptocurrencies with K-means Using the Original Scaled Data

For this section I will use the following steps, per the challenge instructions, to cluster the cryptocurrencies using the best value for k found for the original scaled data:

Initialize the K-means model with the best value for k.
Fit the K-means model using the original scaled DataFrame.
Predict the clusters to group the cryptocurrencies using the original scaled DataFrame.
Create a copy of the original data and add a new column with the predicted clusters.
Create a scatter plot using hvPlot as follows:
- Set the x-axis as "price_change_percentage_24h" and the y-axis as "price_change_percentage_7d".
- Color the graph points with the labels found using K-means.
- Add the "coin_id" column in the hover_cols parameter to identify the cryptocurrency represented by each data point.
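One way to express these clustering steps, as a sketch: the helper `add_kmeans_clusters`, the `predicted_cluster` column name, and the `best_k` variable (standing for whatever value the elbow curve suggested) are all hypothetical. Because the scaled DataFrame holds the original price-change columns rather than principal components, the commented hvPlot call uses `price_change_percentage_24h` and `price_change_percentage_7d` as its axes.

```python
import pandas as pd
from sklearn.cluster import KMeans


def add_kmeans_clusters(df, k, random_state=1, label_col="predicted_cluster"):
    """Fit K-means on df and return a copy of df with a cluster-label column.

    label_col is a hypothetical column name chosen for this sketch.
    """
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    model.fit(df)               # learn the k cluster centres
    labels = model.predict(df)  # assign each coin to its nearest centre
    clustered = df.copy()       # leave the input DataFrame untouched
    clustered[label_col] = labels
    return clustered


# Usage in the notebook:
# import hvplot.pandas
# clusters_df = add_kmeans_clusters(market_scaled_df, k=best_k)
# clusters_df.hvplot.scatter(
#     x="price_change_percentage_24h",
#     y="price_change_percentage_7d",
#     by="predicted_cluster",
#     hover_cols=["coin_id"],
# )
```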

Section-4: Optimize Clusters with Principal Component Analysis

In this section I will further refine the clusters using Principal Component Analysis (PCA) per the challenge instructions:

Using the original scaled DataFrame, perform a PCA and reduce the features to three principal components.
Retrieve the explained variance to determine how much information can be attributed to each principal component and then answer the following question in your notebook:
- What is the total explained variance of the three principal components?
Create a new DataFrame with the PCA data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.
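A possible sketch of the PCA step using scikit-learn's `PCA` class. The helper name `reduce_to_three_components` is hypothetical. `explained_variance_ratio_` gives each component's share of the total variance, so summing the three values answers the total-explained-variance question.

```python
import pandas as pd
from sklearn.decomposition import PCA


def reduce_to_three_components(scaled_df, random_state=1):
    """Project scaled_df onto its first three principal components.

    Returns (pca_df, explained_variance_ratio): pca_df has columns
    PC1, PC2 and PC3 and reuses scaled_df's coin_id index.
    """
    pca = PCA(n_components=3, random_state=random_state)
    components = pca.fit_transform(scaled_df)
    pca_df = pd.DataFrame(
        components,
        columns=["PC1", "PC2", "PC3"],
        index=scaled_df.index,
    )
    return pca_df, pca.explained_variance_ratio_


# Usage in the notebook:
# market_pca_df, ratios = reduce_to_three_components(market_scaled_df)
# print(ratios)        # share of total variance captured by each component
# print(ratios.sum())  # total explained variance of the three components
```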

Section-5: Find the Best Value for k Using the PCA Data

Following the challenge instructions, I will again use the elbow method, this time on the PCA data, to find the best value for k:

Create a list with the number of k values from 1 to 11.
Create an empty list to store the inertia values.
Create a for loop to compute the inertia with each possible value of k.
Create a dictionary with the data to plot the Elbow curve.
Plot a line chart with all the inertia values computed with the different values of k to visually identify the optimal value for k.
Answer the following questions in your notebook:
- What is the best value for k when using the PCA data?
- Does it differ from the best k value found using the original data?
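Since the elbow loop itself matches Section-2, this sketch instead computes both curves with identical settings and places them in one table, which makes it easier to check whether the best k differs between the original and PCA data. The function names, column names, `random_state=1`, and `n_init=10` are assumptions for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans


def inertia_for_k_values(df, k_values, random_state=1):
    """Inertia of a K-means fit for each k in k_values."""
    inertia = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        inertia.append(model.fit(df).inertia_)
    return inertia


def compare_elbow_curves(scaled_df, pca_df, k_values=range(1, 12)):
    """Put the original-data and PCA-data elbow curves side by side."""
    k_list = list(k_values)
    return pd.DataFrame(
        {
            "k": k_list,
            "inertia_original": inertia_for_k_values(scaled_df, k_list),
            "inertia_pca": inertia_for_k_values(pca_df, k_list),
        }
    )


# Usage in the notebook:
# import hvplot.pandas
# curves = compare_elbow_curves(market_scaled_df, market_pca_df)
# curves.hvplot.line(x="k", y=["inertia_original", "inertia_pca"], title="Elbow Curves")
```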

Section-6: Cluster Cryptocurrencies with K-means Using the PCA Data

Finally, I will complete the following steps, per the challenge instructions, to cluster the cryptocurrencies using the best value for k found for the PCA data:

Initialize the K-means model with the best value for k.
Fit the K-means model using the PCA data.
Predict the clusters to group the cryptocurrencies using the PCA data.
Create a copy of the DataFrame with the PCA data and add a new column to store the predicted clusters.
Create a scatter plot using hvPlot as follows:
- Set the x-axis as "PC1" and the y-axis as "PC2".
- Color the graph points with the labels found using K-means.
- Add the "coin_id" column in the hover_cols parameter to identify the cryptocurrency represented by each data point.
Answer the following question:
- What is the impact of using fewer features to cluster the data using K-Means?
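A sketch of the PCA clustering steps, plus one assumed way to approach the closing question: cross-tabulating the labels from the original-data and PCA-data fits with `pandas.crosstab` to see how many coins change cluster when fewer features are used. All helper, column, and variable names (`cluster_labels`, `compare_clusterings`, `best_k`, `best_k_pca`) are hypothetical. The PCA DataFrame only has PC1, PC2 and PC3 columns, so the commented hvPlot call plots PC1 against PC2.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_labels(df, k, random_state=1):
    """Fit K-means with k clusters and return one label per row as a Series."""
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    model.fit(df)
    return pd.Series(model.predict(df), index=df.index, name="predicted_cluster")


def compare_clusterings(scaled_df, pca_df, k_original, k_pca):
    """Cross-tabulate cluster labels from the original and PCA fits.

    Each cell counts the coins that received a given pair of labels. If
    most coins in each row fall into a single column, the two clusterings
    largely agree. K-means numbers its clusters arbitrarily, so compare the
    pattern of the table rather than the label values themselves.
    """
    original = cluster_labels(scaled_df, k_original).rename("original_cluster")
    pca = cluster_labels(pca_df, k_pca).rename("pca_cluster")
    return pd.crosstab(original, pca)


# Usage in the notebook:
# import hvplot.pandas
# pca_clusters_df = market_pca_df.copy()
# pca_clusters_df["predicted_cluster"] = cluster_labels(market_pca_df, k=best_k_pca)
# pca_clusters_df.hvplot.scatter(
#     x="PC1", y="PC2", by="predicted_cluster", hover_cols=["coin_id"]
# )
# compare_clusterings(market_scaled_df, market_pca_df, best_k, best_k_pca)
```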

Resources

Bootcamp References --- Update for this Challenge!!

Module 19 Instructions

starter_code

Crypto_Clustering_starter_code.ipynb

Resources

crypto_market_data.csv

Special Thanks: (for Challenge overview discussions during BootCamp office hours)

Jamie Miller
Mounika Mamindla
Lisa Shemanciik

External References

(links to websites will be provided where possible)

pandas documentation
hvplot documentation
scikit-learn documentation
YouTube (various videos)
Google
GitHub

SteveTuttle/crypto-price-analysis

Folders and files

Latest commit

History

Repository files navigation

crypto-price-analysis

Challenge Description

Background

Deliverables

Section-1: Prepare the Data

Section-2: Find the Best Value for k Using the Original Scaled DataFrame

Section-3: Cluster Cryptocurrencies with K-means Using the Original Scaled Data

Section-4: Optimize Clusters with Principal Component Analysis

Section-5: Find the Best Value for k Using the PCA Data

Section-6: Cluster Cryptocurrencies with K-means Using the PCA Data

Resources

Bootcamp References --- Update for this Challenge!!

External References

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages