Explore cryptocurrency market trends with Python using unsupervised learning techniques. This project uses Jupyter Notebooks to implement K-means clustering and Principal Component Analysis (PCA) to analyze and predict price trends of cryptocurrencies over 24-hour and 7-day periods.

SteveTuttle/crypto-price-analysis

crypto-price-analysis

UNC_data_bootcamp_module_19

Challenge Description

Background

For this challenge, we use Python in a Jupyter Notebook, together with unsupervised learning, to predict whether cryptocurrencies are affected by 24-hour or 7-day price changes.

Deliverables

Our goal for this challenge requires us to follow a series of steps broken down into six sections. The Challenge Instructions are outlined within each section, and the sections are to be completed as follows:

  1. Prepare the Data
  2. Find the Best Value for k Using the Original Scaled DataFrame
  3. Cluster Cryptocurrencies with K-means Using the Original Scaled Data
  4. Optimize Clusters with Principal Component Analysis
  5. Find the Best Value for k Using the PCA Data
  6. Cluster Cryptocurrencies with K-means Using the PCA Data

Section-1: Prepare the Data

To start, I renamed the Crypto_Clustering_starter_code.ipynb file to Crypto_Clustering_SDT.ipynb. I first viewed the file crypto_market_data.csv separately to understand the data source better, then loaded it into a DataFrame. From here I am able to acquire the metrics and plot the data needed to complete the challenge per the instructions:

  • Use the StandardScaler() class from scikit-learn to normalize the data from the CSV file.
  • Create a DataFrame with the scaled data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.
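The two steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame is a made-up stand-in for crypto_market_data.csv (the coin names and values are hypothetical), and the variable names are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real file, which would be loaded with
# pd.read_csv("crypto_market_data.csv", index_col="coin_id").
df_market_data = pd.DataFrame(
    {
        "price_change_percentage_24h": [1.0, 0.5, -0.2, -1.5, 3.0],
        "price_change_percentage_7d": [7.0, 10.0, 0.1, -4.0, 20.0],
    },
    index=pd.Index(
        ["coin_a", "coin_b", "coin_c", "coin_d", "coin_e"], name="coin_id"
    ),
)

# Standardize every column to zero mean and unit variance.
scaled_values = StandardScaler().fit_transform(df_market_data)

# Rebuild a DataFrame, reusing the column names and the "coin_id" index.
df_market_data_scaled = pd.DataFrame(
    scaled_values,
    columns=df_market_data.columns,
    index=df_market_data.index,
)
print(df_market_data_scaled.head())
```

Reusing the original index keeps each scaled row tied to its coin, which is what later makes the hover labels in the plots possible.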

Section-2: Find the Best Value for k Using the Original Scaled DataFrame

In this section I will use the elbow method to find the best value for k per the instructions from the challenge:

  • Create a list with the number of k values from 1 to 11.
  • Create an empty list to store the inertia values.
  • Create a for loop to compute the inertia with each possible value of k.
  • Create a dictionary with the data to plot the elbow curve.
  • Plot a line chart with all the inertia values computed with the different values of k to visually identify the optimal value for k.
  • Answer the following question in your notebook: What is the best value for k?
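The elbow loop described above might look like the sketch below. The data is synthetic (four artificial blobs standing in for the scaled DataFrame), the variable names are assumptions, and treating "1 to 11" as inclusive of 11 is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled DataFrame: four well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [-5, 5], [5, -5]], dtype=float)
points = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in centers])
df_scaled = pd.DataFrame(points, columns=["feature_1", "feature_2"])

# Candidate k values (assumed here to include 11).
k_values = list(range(1, 12))

# Inertia is the within-cluster sum of squared distances for each fit.
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(df_scaled)
    inertia.append(model.inertia_)

# Dictionary and DataFrame holding the data for the elbow curve.
elbow_data = {"k": k_values, "inertia": inertia}
df_elbow = pd.DataFrame(elbow_data)
print(df_elbow)
```

With hvPlot installed and `import hvplot.pandas` run first, the curve could then be drawn with `df_elbow.hvplot.line(x="k", y="inertia")`. The best k is read off visually as the point where the curve stops dropping sharply.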

DF Elbow Plot

Section-3: Cluster Cryptocurrencies with K-means Using the Original Scaled Data

For this section I will use the following steps per the challenge instructions to cluster the cryptocurrencies using the best value for k found for the original scaled data:

  • Initialize the K-means model with the best value for k.
  • Fit the K-means model using the original scaled DataFrame.
  • Predict the clusters to group the cryptocurrencies using the original scaled DataFrame.
  • Create a copy of the original data and add a new column with the predicted clusters.
  • Create a scatter plot using hvPlot as follows:
    • Set the x-axis as "price_change_percentage_24h" and the y-axis as "price_change_percentage_7d".
    • Color the graph points with the labels found using K-means.
    • Add the "coin_id" column in the hover_cols parameter to identify the cryptocurrency represented by each data point.
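A minimal sketch of the fit, predict, and copy steps is shown below. The six coins and their values are made up, the column name "cluster" is an assumption, and `best_k = 2` is only a placeholder that suits this toy data; it is not the value found in the notebook.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical scaled data: two obvious groups of three coins each.
df_scaled = pd.DataFrame(
    {
        "price_change_percentage_24h": [-1.1, -0.9, -1.0, 1.0, 0.9, 1.1],
        "price_change_percentage_7d": [-1.0, -1.1, -0.9, 1.1, 1.0, 0.9],
    },
    index=pd.Index([f"coin_{i}" for i in range(6)], name="coin_id"),
)

best_k = 2  # placeholder; use the k read from the elbow curve

# Initialize, fit, and predict on the same scaled data.
model = KMeans(n_clusters=best_k, n_init=10, random_state=1)
model.fit(df_scaled)
clusters = model.predict(df_scaled)

# Work on a copy so the scaled data itself stays unchanged.
df_clustered = df_scaled.copy()
df_clustered["cluster"] = clusters
print(df_clustered)
```

With `import hvplot.pandas` in place, the scatter plot could be produced by calling `df_clustered.hvplot.scatter(...)` with the chosen axis columns as `x` and `y`, `by="cluster"` for the colors, and `hover_cols=["coin_id"]`.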

Market Cluster

Section-4: Optimize Clusters with Principal Component Analysis

In this section I will further refine the clusters using Principal Component Analysis (PCA) per the challenge instructions:

  • Using the original scaled DataFrame, perform a PCA and reduce the features to three principal components.
  • Retrieve the explained variance to determine how much information can be attributed to each principal component and then answer the following question in your notebook:
    • What is the total explained variance of the three principal components?
  • Create a new DataFrame with the PCA data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.
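The PCA steps above can be sketched like this. The input is random data standing in for the scaled DataFrame, and the feature names and variable names are assumptions; only the "PC1", "PC2" and "coin_id" names come from the challenge text ("PC3" follows the same pattern).

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Random stand-in for the scaled DataFrame: 20 coins, 5 features.
rng = np.random.default_rng(42)
coin_ids = pd.Index([f"coin_{i}" for i in range(20)], name="coin_id")
df_scaled = pd.DataFrame(
    rng.normal(size=(20, 5)),
    columns=[f"feature_{j}" for j in range(5)],
    index=coin_ids,
)

# Reduce the features to three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(df_scaled)

# Share of the total variance captured by each component, and their sum.
explained = pca.explained_variance_ratio_
total_explained = explained.sum()
print(explained, total_explained)

# New DataFrame with the PCA data, keeping the "coin_id" index.
df_pca = pd.DataFrame(
    components, columns=["PC1", "PC2", "PC3"], index=df_scaled.index
)
```

The sum of `explained_variance_ratio_` is the figure the question asks for: the fraction of the original variance that survives the reduction to three components.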

Section-5: Find the Best Value for k Using the PCA Data

Following the challenge instructions, I will again use the elbow method, this time on the PCA data, to find the best value for k:

  • Create a list with the number of k values from 1 to 11.
  • Create an empty list to store the inertia values.
  • Create a for loop to compute the inertia with each possible value of k.
  • Create a dictionary with the data to plot the elbow curve.
  • Plot a line chart with all the inertia values computed with the different values of k to visually identify the optimal value for k.
  • Answer the following question in your notebook:
    • What is the best value for k when using the PCA data?
    • Does it differ from the best k value found using the original data?
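To answer the comparison question, it can help to compute both elbow curves with the same code and put them side by side. The sketch below does that with a hypothetical helper, `compute_inertia`, on synthetic data; none of these names come from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def compute_inertia(data, k_values):
    """Return the K-means inertia for each k (hypothetical helper)."""
    results = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0)
        model.fit(data)
        results.append(model.inertia_)
    return results


# Synthetic stand-in for the scaled DataFrame: 30 coins, 5 features.
rng = np.random.default_rng(3)
df_scaled = pd.DataFrame(
    rng.normal(size=(30, 5)), columns=[f"feature_{j}" for j in range(5)]
)
df_pca = pd.DataFrame(
    PCA(n_components=3).fit_transform(df_scaled), columns=["PC1", "PC2", "PC3"]
)

k_values = list(range(1, 12))  # assumed to include 11
df_compare = pd.DataFrame(
    {
        "k": k_values,
        "inertia_original": compute_inertia(df_scaled, k_values),
        "inertia_pca": compute_inertia(df_pca, k_values),
    }
)
print(df_compare)
```

Because PCA discards some variance, the PCA curve starts lower than the original one at k = 1; what matters for the question is whether the bend in the two curves falls at the same k.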

PCA Elbow Plot

Section-6: Cluster Cryptocurrencies with K-means Using the PCA Data

Finally, I will complete the following steps per the challenge instructions to cluster the cryptocurrencies using the best value for k found for the PCA data:

  • Initialize the K-means model with the best value for k.
  • Fit the K-means model using the PCA data.
  • Predict the clusters to group the cryptocurrencies using the PCA data.
  • Create a copy of the DataFrame with the PCA data and add a new column to store the predicted clusters.
  • Create a scatter plot using hvPlot as follows:
    • Set the x-axis as "PC1" and the y-axis as "PC2".
    • Color the graph points with the labels found using K-means.
    • Add the "coin_id" column in the hover_cols parameter to identify the cryptocurrency represented by each data point.
  • Answer the following question:
    • What is the impact of using fewer features to cluster the data using K-Means?
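One way to explore the last question is to cluster both versions of the data and measure how closely the two sets of labels agree. The sketch below uses scikit-learn's `adjusted_rand_score` for that; this comparison is an addition for illustration and is not part of the challenge instructions. The data is synthetic (three artificial blobs), and `best_k = 3` is a placeholder that suits it.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the scaled data: three tight blobs in 5 dimensions.
rng = np.random.default_rng(7)
centers = np.array(
    [[6, 0, 0, 0, 0], [0, 6, 0, 0, 0], [0, 0, 6, 0, 0]], dtype=float
)
points = np.vstack([c + rng.normal(scale=0.3, size=(8, 5)) for c in centers])
coin_ids = pd.Index([f"coin_{i}" for i in range(len(points))], name="coin_id")
df_scaled = pd.DataFrame(
    points, columns=[f"feature_{j}" for j in range(5)], index=coin_ids
)

# PCA data with three components, indexed by coin_id.
df_pca = pd.DataFrame(
    PCA(n_components=3).fit_transform(df_scaled),
    columns=["PC1", "PC2", "PC3"],
    index=coin_ids,
)

best_k = 3  # placeholder; use the k read from the PCA elbow curve

# Fit and predict on the PCA data, storing the labels on a copy.
model_pca = KMeans(n_clusters=best_k, n_init=10, random_state=0)
model_pca.fit(df_pca)
df_pca_clustered = df_pca.copy()
df_pca_clustered["cluster"] = model_pca.predict(df_pca)

# Cluster the full-feature data too, then compare the two labelings.
labels_original = KMeans(
    n_clusters=best_k, n_init=10, random_state=0
).fit_predict(df_scaled)
agreement = adjusted_rand_score(labels_original, df_pca_clustered["cluster"])
print(f"Adjusted Rand index: {agreement:.3f}")
```

A score near 1 means the reduced features reproduce the same grouping as the full feature set, while a lower score means the dimensionality reduction changed which coins end up together.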

Market Segment

Resources

Bootcamp References

Module 19 Instructions

starter_code

  • Crypto_Clustering_starter_code.ipynb

Resources

  • crypto_market_data.csv

Special Thanks: (for Challenge overview discussions during BootCamp office hours)

  • Jamie Miller
  • Mounika Mamindla
  • Lisa Shemanciik  

External References

(where possible will provide link to website)
