Repository dedicated to projects developed during the Master's in Data Science at the University of Colorado Boulder.
The Exame Nacional do Ensino Médio (ENEM) plays a pivotal role in shaping Brazil's education landscape, serving as a determinant for student admissions into numerous higher education institutions in Brazil and even abroad. The rich dataset that ENEM offers allows for a deep dive into the many factors that influence student performance and broader trends in education.
Our work centers on a meticulous analysis of the ENEM 2022 dataset, using supervised learning techniques to forecast various outcomes. The heart of the analysis splits into two critical pathways, sketched in code after the list below:
- Regression Analysis: Targeted at predicting quantifiable outcomes such as student scores, using regression techniques to uncover the underlying patterns and correlations in student performance.
- Classification Analysis: Aimed at categorizing students into different groups based on predetermined criteria, such as their likelihood of pursuing higher education or their performance tier.
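As a minimal sketch of both pathways, the snippet below trains one regressor and one classifier on placeholder features; the file name and column names (math_score, performance_band, and the three predictors) are illustrative assumptions, not the actual ENEM microdata fields.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import mean_absolute_error, accuracy_score

# Hypothetical file and columns -- adjust to the real ENEM 2022 microdata.
df = pd.read_csv("enem_2022.csv")
X = df[["socioeconomic_index", "school_type", "study_hours"]]

# Regression pathway: predict a continuous score.
y_reg = df["math_score"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y_reg, random_state=42)
reg = RandomForestRegressor(random_state=42).fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, reg.predict(X_te)))

# Classification pathway: predict a categorical outcome.
y_clf = df["performance_band"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y_clf, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
print("Accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```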
By weaving through the extensive data trail of ENEM 2022, we aim to unearth insights that could potentially shape educational policies, assist educational institutions in crafting tailored strategies, and help students in making informed decisions.
We welcome data enthusiasts, educational researchers, and policy makers to collaborate and enrich this project with diverse perspectives and expertise. Together, we can strive to make a substantial impact in the educational sphere through data-driven insights and analysis.
Feel free to fork the repository, open issues, and submit pull requests. Your contributions will be duly acknowledged, and we look forward to building a rich repository of analyses and insights centered around the ENEM 2022 dataset.
Link: ENEM 2022 Project
In this project, we undertake a detailed analysis of the BBC News dataset leveraging both unsupervised and supervised learning techniques. The objective is to unravel hidden patterns and extract meaningful insights from the news articles, categorizing them accurately into predefined groups to facilitate efficient information retrieval and enhance reader experience.
To foster a deeper understanding of the news classification process, we have structured the repository into the following sections:
- Exploratory Data Analysis (EDA): This section offers a deep dive into the dataset, showcasing detailed text analysis through tokenization, stemming, lemmatization, and visualization of word statistics to provide a comprehensive overview of the data at hand.
- Unsupervised Learning Models: Here, we delve into the strategies adopted for unsupervised learning, exploring matrix factorization techniques (a minimal NMF sketch follows this list) and setting forth the evaluation metrics used to gauge the performance of the built models.
- Supervised Learning Model Comparison: A segment dedicated to the comparative study of various supervised learning models, discussing their performance, data efficiency, and addressing potential overfitting issues.
- Limitations of sklearn’s Non-negative Matrix Factorization Library: In this part, we explore the limitations encountered while using the sklearn library, suggesting possible improvements and ways to overcome these challenges.
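To make the matrix-factorization step concrete, here is a minimal TF-IDF + NMF sketch on a toy corpus; the real pipeline runs on the BBC News articles, with n_components set to the number of news categories.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Toy corpus standing in for the BBC News articles.
docs = [
    "The government announced a new budget for public schools.",
    "The striker scored twice in the championship final.",
    "Markets rallied as the central bank cut interest rates.",
]

# TF-IDF representation followed by non-negative matrix factorization.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)
nmf = NMF(n_components=2, init="nndsvda", random_state=0)
W = nmf.fit_transform(X)   # document-topic weights
H = nmf.components_        # topic-term weights

# Top terms per discovered topic.
terms = tfidf.get_feature_names_out()
for k, row in enumerate(H):
    top = [terms[i] for i in row.argsort()[::-1][:3]]
    print(f"topic {k}: {top}")
```

Each row of W gives a document's topic mixture, which can be matched against the true article categories to evaluate the factorization.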
We welcome contributions from data enthusiasts, researchers, and policy makers to enrich the project with diverse perspectives and expertise. Feel free to fork the repository, open issues, and submit pull requests. Your contributions will be duly acknowledged, and we look forward to building a rich repository of analyses and insights centered around the BBC News dataset.
The analysis itself follows two main pathways:
- Regression Analysis: Aimed at predicting quantifiable outcomes such as scores in various metrics, utilizing regression techniques to find underlying patterns and correlations in the data.
- Classification Analysis: This pathway focuses on categorizing news articles into different groups based on a set of predetermined criteria, enhancing the reader's ability to find articles aligned with their interests.
This project opens avenues for further research and development in the field of text classification, inviting contributions that can refine classification strategies and foster informed decision-making through data-driven insights derived from the analysis of news articles.
Link: BBC News Classification Project
Welcome to our project repository where we unravel the mysteries of single-cell perturbations in the dynamic field of biotechnology and data science. Leveraging a rich dataset from a Kaggle competition focused on open problems in single-cell perturbations, we aim to foster groundbreaking discoveries in cellular responses to small molecule drug perturbations.
Our project is centered around a novel dataset created using human peripheral blood mononuclear cells (PBMCs). This dataset, derived from a meticulous experiment involving 144 compounds from the LINCS Connectivity Map dataset, offers a rich multi-omic background, providing a fertile ground for establishing biological priors that elucidate the susceptibility of specific genes to perturbation responses in various biological contexts.
The primary goal of this endeavor is to develop predictive models capable of accurately forecasting cellular responses to small molecule drug perturbations. Our objectives are multi-faceted, encompassing:
- Data Exploration and Understanding: Through comprehensive EDA, we aim to unearth underlying patterns and grasp the biological context of the data.
- Model Development: Utilizing unsupervised learning techniques, primarily matrix factorization methods (sketched after this list), we aim to build predictive models and compare them against supervised learning models to discern the strengths and weaknesses of each approach.
- Performance Evaluation: This involves rigorous testing of the models to gauge their predictive accuracy and robustness.
- Improvement and Optimization: We are committed to continually refining our models, overcoming limitations, and enhancing their predictive accuracy.
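As an illustration of the matrix-factorization direction, the sketch below applies a truncated SVD to a synthetic cells-by-genes count matrix; the matrix dimensions, Poisson counts, and rank are placeholders for the actual Kaggle data and tuning.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Synthetic stand-in for a cells-by-genes expression matrix.
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(200, 500)).astype(float)

# Low-rank factorization X ~ U @ Vt, a common first step before
# relating latent factors to compounds and cell types.
svd = TruncatedSVD(n_components=10, random_state=0)
U = svd.fit_transform(X)   # per-cell latent factors
Vt = svd.components_       # per-gene loadings
X_hat = U @ Vt

rmse = np.sqrt(np.mean((X - X_hat) ** 2))
print(f"rank-10 reconstruction RMSE: {rmse:.3f}")
print("explained variance:", svd.explained_variance_ratio_.sum())
```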
This project sits at the confluence of data science and biotechnology, striving to advance medicine through predictive analysis of cellular responses to drug perturbations. It promises substantial learning for participants and a potential pathway to meaningful discoveries in the medical field.
Our project is structured into two pivotal phases:
- Exploration and Understanding: A deep dive into the dataset to understand the biological context and uncover underlying patterns through detailed EDA.
- Model Development and Evaluation: This phase is dedicated to the development of predictive models using unsupervised learning techniques, followed by a rigorous evaluation to ascertain their performance and potential areas for improvement.
To get started with the project, navigate through the repository to find detailed documentation on each phase of the project, including the methodologies employed, the results obtained, and the conclusions derived from the analysis.
We invite collaborators and enthusiasts to join us in this analytical journey to unlock the potential of unsupervised learning in the realm of single-cell perturbations analysis. Feel free to contribute, suggest improvements, and raise issues as we collectively work towards a deeper understanding of cellular responses to drug perturbations.
Link: Open Problems Single Cell Perturbations
Welcome to our project repository, where we delve into the realm of histopathologic cancer detection. Leveraging a comprehensive dataset from a Kaggle competition, we aim to make significant strides in the field of digital pathology and machine learning.
Our project revolves around a dataset modified from the PatchCamelyon (PCam) benchmark dataset. It consists of small patches of images taken from larger digital pathology scans: 220,025 training images and 57,458 test images, where the binary label (metastatic cancer present or not) is determined by a 32x32 pixel region at the center of each patch.
The primary goal of this initiative is to develop machine learning algorithms capable of identifying metastatic cancer in small patches of images. Our objectives include:
- Data Exploration and Understanding: Through thorough EDA, we intend to discover underlying patterns and understand the medical context of the data.
- Model Development: We are focusing on developing various machine learning models, including a baseline CNN, VGG-16, ResNet50, and InceptionV3, to evaluate their strengths and weaknesses on this particular problem (a transfer-learning sketch follows this list).
- Performance Evaluation: Rigorous testing of the models based on the area under the ROC curve, the evaluation metric for the Kaggle competition.
- Improvement and Optimization: We are committed to refining our models through hyperparameter tuning, data augmentation, and potentially ensemble methods to enhance their performance.
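As a sketch of the transfer-learning setup for one of these backbones, the snippet below freezes an ImageNet-pretrained VGG16 and adds a small binary head; the 96x96 input size matches the competition patches, while the head layers and hyperparameters are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen VGG16 backbone plus a small trainable binary head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(96, 96, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # metastatic vs. not
])

# AUC matches the competition's ROC-based evaluation metric.
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="roc_auc")])
model.summary()
```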
This project sits at the intersection of machine learning and digital pathology, aiming to advance medical diagnosis through predictive analysis of pathology scans. It offers participants a rich learning experience and opens doors to innovative solutions in cancer detection.
Our project is structured into two primary phases:
- Exploration and Understanding: A comprehensive analysis of the dataset to understand its medical relevance and discover underlying patterns.
- Model Development and Evaluation: This phase is dedicated to the construction and rigorous evaluation of machine learning models tailored to the specific challenges posed by histopathologic cancer detection.
To get involved with the project, navigate through the repository to find in-depth documentation on each phase, including methodologies, results, and conclusions.
We invite enthusiasts and collaborators to join us in this analytical journey to unlock the potential of machine learning in the field of histopathologic cancer detection. Feel free to contribute, suggest improvements, and raise issues as we collaboratively work towards more effective and efficient cancer detection solutions.
Link: Histopathologic Cancer Detection
Welcome to the Disaster Tweet Classification Project repository. This project aims to make a significant impact in the field of Natural Language Processing by tackling the problem of classifying tweets related to real-world disasters. The data for this initiative is derived from a Kaggle competition and serves as an excellent starting point for those interested in NLP.
This project revolves around a dataset comprising 7,613 training tweets and 3,263 test tweets. These tweets are annotated with various features like ID, text content, geographical location, and keyword. The main objective is to classify whether a tweet is related to a real disaster or not.
The primary goal of this project is to:
- Conduct extensive Exploratory Data Analysis (EDA) to understand the underlying patterns and contexts within the tweets.
- Develop machine learning models, focusing on NLP techniques, to classify tweets effectively (a baseline sketch follows this list).
- Evaluate the performance rigorously based on the F1 score, which is the evaluation metric for the Kaggle competition.
- Continuously refine and optimize the model through techniques like hyperparameter tuning.
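A common baseline for this task is TF-IDF features feeding a linear classifier; the sketch below shows that pipeline on a few made-up tweets, scored with the competition's F1 metric.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Tiny illustrative sample; the project trains on the 7,613 labeled tweets.
tweets = [
    "Forest fire near the town, residents being evacuated",
    "I love the smell of fresh coffee in the morning",
    "Flood warnings issued after the river burst its banks",
    "My fantasy football team is a total disaster lol",
]
labels = [1, 0, 1, 0]  # 1 = real disaster, 0 = not

X_tr, X_te, y_tr, y_te = train_test_split(
    tweets, labels, test_size=0.5, random_state=0, stratify=labels)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)

# F1 is the competition's evaluation metric.
print("F1:", f1_score(y_te, model.predict(X_te)))
```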
The significance of this project lies at the intersection of Natural Language Processing and crisis management. It aims to automate the process of identifying urgent tweets, thus potentially aiding disaster relief organizations and news agencies.
Our project is structured into two primary phases:
- Exploration and Understanding: In-depth analysis and understanding of the dataset, its features, and the challenge it poses.
- Model Development and Evaluation: This phase is dedicated to the construction, training, and evaluation of machine learning models suited for this particular NLP challenge.
To dive into this project, navigate through the repository to find comprehensive documentation on each phase, including methodologies, results, and conclusions.
We invite contributors and enthusiasts to join us in this analytical journey. Feel free to contribute, suggest improvements, and raise issues as we collaboratively work towards a more effective disaster tweet classification system.
Link: Disaster Tweet Classification Project
Welcome to the Monet-Style Image Generation Project. This initiative bridges the gap between art and technology, utilizing Generative Adversarial Networks (GANs) to recreate the distinctive style of Claude Monet. Derived from a Kaggle competition, this project serves as an ideal springboard for those interested in the convergence of machine learning and art.
Our challenge revolves around creating a GAN capable of generating 7,000 to 10,000 images mirroring the style of Monet. This endeavor not only tests the limits of computer vision and generative modeling but also explores the intriguing domain where data science meets art.
The main goals of this project are to:
- Develop and train a GAN that can successfully mimic Monet's artistic style (a structural sketch of a generator-discriminator pair follows this list).
- Conduct thorough evaluations using the Memorization-informed Fréchet Inception Distance (MiFID) metric to ensure the quality and originality of the generated images.
- Explore the creative capacities of GANs in transcending traditional boundaries of art reproduction.
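For orientation, here is a minimal DCGAN-style generator and discriminator. This is a structural sketch only: entries in this competition commonly use image-to-image architectures such as CycleGAN instead, and the 256x256 output size and layer widths are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_generator(latent_dim=128):
    # Upsample a latent vector to a 256x256 RGB image.
    return models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(16 * 16 * 256),
        layers.Reshape((16, 16, 256)),
        layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
    ])

def build_discriminator():
    # Score an image as real (Monet) or generated.
    return models.Sequential([
        layers.Input(shape=(256, 256, 3)),
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1),  # real/fake logit
    ])

print(build_generator().output_shape)      # (None, 256, 256, 3)
print(build_discriminator().output_shape)  # (None, 1)
```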
This project stands at the forefront of artistic innovation, demonstrating the potential of GANs in creating art. It’s a testament to how far the field of computer vision has evolved, showcasing the ability of algorithms to not just replicate but creatively contribute to the world of art.
Our project is structured into several key phases:
- Data Preparation: Understanding and processing the datasets of Monet paintings and photos.
- Model Development: Designing and training the generator and discriminator models within the GAN.
- Evaluation and Refinement: Rigorously evaluating the generated images using MiFID and refining the model for better performance.
To participate in this project, you can find detailed documentation on model architecture, training procedures, and evaluation methods within our repository. We provide resources and guidance every step of the way.
We welcome contributions from enthusiasts, artists, and data scientists alike. Your insights, improvements, and discussions are invaluable as we push the boundaries of what's possible in the fusion of art and machine learning.
Link: Monet-Style Image Generation with GANs
Welcome to the Invasive Species Detection Project using Computer Vision Techniques. This project aims to apply current computer vision and machine learning techniques to significant ecological challenges, specifically the monitoring of invasive species such as hydrangea.
The presence of invasive species like kudzu in Georgia and cane toads in over a dozen countries poses a substantial threat to the environment. Effective tracking of these species is essential, yet current methods are costly and inefficient due to the vast area that needs to be covered.
The main goal of this project is to develop computer vision algorithms that can accurately identify the presence of invasive species in images of forests and foliage, making monitoring more affordable and reliable.
This project highlights the potential of computer vision in contributing to ecological problem solutions, demonstrating how algorithms can assist in environmental conservation initiatives.
The project is divided into several key phases:
- Data Preparation: Processing of relevant image datasets.
- Model Development: Using machine learning techniques to train models capable of identifying invasive species.
- Evaluation and Refinement: Rigorous model evaluation using the AUC-ROC metric (illustrated in the sketch after this list) and continuous refinement for better performance.
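As a reference for the evaluation step, the snippet below computes AUC-ROC from ground-truth labels and model scores; the arrays are made-up values standing in for a validation split and classifier outputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up validation labels (1 = invasive species present) and scores.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.2, 0.6, 0.3])

print(f"AUC-ROC: {roc_auc_score(y_true, y_score):.3f}")
```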
You will find detailed documentation on model architecture, training procedures, and evaluation methods in our repository. We provide resources and guidance every step of the way.
We invite enthusiasts, ecologists, and data scientists to contribute insights, improvements, and discussions. Your contributions are valuable as we explore the limits of what's possible in the fusion of technology and environmental conservation.
Link: Invasive Species Detection with Computer Vision Techniques
This project conducts a detailed descriptive analysis of the New York City Shooting Incident dataset. We use analytical methods to extract insights and identify patterns in the shooting records. The goal is to present the findings during the third week of the Master's in Data Science course at the University of Colorado Boulder.
The repository is divided into specific sections to ensure a comprehensive understanding of the analysis process:
- About the Dataset and Project: Detailed description of the origin and structure of the dataset, including information on how it is updated and maintained.
- Dataset Description: Exploration of the metadata to provide a clear summary of each column, helping to understand the variables available for analysis.
- Importing, Cleaning, and Organizing: Processes of data importation, handling missing values, adjusting data types, and removing irrelevant columns for analysis.
- Visualizations and Analysis: Data aggregation and creation of visualizations to answer the preliminary project questions, including georeferenced and temporal analyses of the incidents (see the aggregation sketch after this list).
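The sketch below shows the kind of temporal and spatial aggregation involved; the column names (OCCUR_DATE, BORO) follow the public NYPD dataset, but should be verified against the downloaded file.

```python
import pandas as pd

# Load the public dataset and parse dates (format assumed MM/DD/YYYY).
df = pd.read_csv("NYPD_Shooting_Incident_Data__Historic_.csv")
df["OCCUR_DATE"] = pd.to_datetime(df["OCCUR_DATE"], format="%m/%d/%Y")

# Temporal view: incidents per year.
per_year = df.groupby(df["OCCUR_DATE"].dt.year).size()
print(per_year.tail())

# Spatial view: incidents per borough.
print(df["BORO"].value_counts())
```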
We encourage the participation of data scientists, criminologists, and policy makers to enrich this project. Contributions can be made by forking the repository, opening issues, and submitting pull requests. We value all contributions and are excited to collaborate on developing a robust analytical tool.
The project focuses on two main objectives:
- Pattern Discovery: Identification of patterns and relationships in the data that may guide crime prevention strategies.
- Considerations for Predictive Modeling: Although we have not developed a predictive model, we identified the STATISTICAL_MURDER_FLAG variable, which could serve as a response variable in future predictive analyses (a hypothetical sketch follows this list).
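One hypothetical shape for that future step is sketched below: a logistic regression with STATISTICAL_MURDER_FLAG as the response. The predictor columns are plausible fields from the public dataset chosen for illustration, not the project's actual model.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("NYPD_Shooting_Incident_Data__Historic_.csv")

# The flag typically parses as boolean; verify the encoding in the file.
y = df["STATISTICAL_MURDER_FLAG"].astype(bool)

# Illustrative predictors, one-hot encoded (missing values become "nan").
X = pd.get_dummies(df[["BORO", "VIC_AGE_GROUP", "VIC_SEX"]].astype(str))

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring="roc_auc")
print("cross-validated AUC:", scores.mean().round(3))
```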
This project provides a foundation for future research in public safety and criminal analysis. We look forward to subsequent advancements that can refine predictive modeling techniques, contributing to urban safety through well-informed policies and strategic planning.
For more information and access to the dataset, visit the dataset link: NYPD Shooting Incident Data (Historic).
Link: New York City Shooting Incident Data Analysis.
This academic project conducts an in-depth analysis of COVID-19 data, focusing on daily records of confirmed cases and virus-related deaths. The data, collected and consolidated from various global sources, are crucial for understanding the pandemic's impact, particularly in countries with populations larger than Brazil's. This project is part of the Master's in Data Science program at the University of Colorado Boulder and aims to present findings in the third week of the course.
The repository is organized into specific sections to facilitate a comprehensive understanding of the analysis process:
- About the Dataset and Project: Detailed description of the dataset's origin and structure, including information on updates and maintenance.
- Dataset Description: Exploration of the metadata to provide a clear summary of each column, aiding in understanding the variables available for analysis.
- Importing, Cleaning, and Organizing: Processes of data importation, handling missing values, adjusting data types, and removing irrelevant columns for analysis.
- Visualizations and Analysis: Data aggregation and creation of visualizations to answer preliminary project questions, including georeferenced and temporal analyses of the incidents.
We encourage participation from data scientists, epidemiologists, and policy makers to enrich this project. Contributions can be made by forking the repository, opening issues, and submitting pull requests. We value all contributions and are excited to collaborate on developing a robust analytical tool.
The project focuses on two main objectives:
- Pattern Discovery: Identification of patterns and relationships in the data that may guide public health strategies.
- Considerations for Predictive Modeling: While a predictive model has not been developed yet, the analysis could provide insights for future modeling efforts.
This project lays the groundwork for future research in public health and epidemiology. We look forward to subsequent advancements that can refine predictive modeling techniques, contributing to global health safety through well-informed policies and strategic planning.
The datasets used in this study include several key variables, such as:
- Countries: The name of the country.
- Lat and Long: Geographic coordinates for each location.
- Date: The date of the recorded data.
- Cases: Number of confirmed cases.
- Deaths: Number of confirmed deaths.
The data for this study were extracted from the following sources:
- Daily data on confirmed cases and deaths: Various global repositories.
- Country population reference: Relevant global datasets.
This project employs a structured methodology to analyze COVID-19 data, focusing on the following steps:
- Sample: Selection of countries with populations larger than Brazil's to ensure a representative demographic scale (illustrated in the sketch after this list).
- Explore: Examination of data through visualizations and descriptive statistics to identify trends and anomalies.
- Modify: Data wrangling to standardize datasets for accurate analysis, including handling missing values and data inconsistencies.
- Model: Application of statistical and machine learning models to estimate trends and predict future scenarios.
- Assess: Evaluation of model accuracy and reliability through cross-validation and analysis of results to determine public health implications.
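The sketch below illustrates the Sample and Explore steps under assumed file names and a long-format layout (Country, Date, Cases); the actual consolidated sources may be laid out differently.

```python
import pandas as pd

# Assumed inputs: a long-format cumulative case series and a population table.
cases = pd.read_csv("confirmed_cases.csv")   # columns: Country, Date, Cases
population = pd.read_csv("population.csv")   # columns: Country, Population

# Sample: keep only countries more populous than Brazil.
brazil_pop = population.loc[population["Country"] == "Brazil", "Population"].iloc[0]
large = population.loc[population["Population"] > brazil_pop, "Country"]
sample = cases[cases["Country"].isin(large)].copy()

# Explore: daily new cases per country from the cumulative series.
sample["Date"] = pd.to_datetime(sample["Date"])
sample = sample.sort_values(["Country", "Date"])
sample["NewCases"] = sample.groupby("Country")["Cases"].diff()
print(sample.groupby("Country")["NewCases"].describe())
```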
This study aims not only to understand but also to systematically document the patterns of dissemination and impact of the virus, contributing to future research and interventions in global public health.
For more information and access to the dataset, visit the dataset link: COVID-19 Data Repository.
Link: Analysis of the Impact of COVID-19
This academic project delves into the data science job market in the United Kingdom, focusing on key trends such as salary variation, required skills, and the influence of company ratings on job attractiveness. The dataset, sourced from Glassdoor, provides detailed information on job listings, including company names, job titles, salary ranges, and required skills. This project is part of the Master's in Data Science curriculum and is aligned with the objectives of exploring the trends in demand for data scientists, both regionally and skill-wise. The findings will be presented as part of the course's final project evaluation.
The repository is divided into distinct sections to enable a clear understanding of the project and its objectives:
- About the Dataset and Project: This section provides a thorough description of the dataset, including its origin, key attributes, and maintenance.
- Dataset Description: Explores the metadata to clarify the variables available for analysis, such as company ratings, job locations, salary ranges, and required skills.
- Data Cleaning and Organization: Details the processes of importing, cleaning, and organizing the dataset, including how missing values and data inconsistencies were addressed.
- Visualizations and Analysis: Displays a variety of visualizations and analyses that answer the key project questions, focusing on salary trends, skill requirements, and company ratings.
Contributions from the data science community are highly encouraged. Data scientists, HR professionals, and students are welcome to fork the repository, open issues, and submit pull requests to improve the project's insights. Collaboration is highly appreciated to enhance the understanding of the evolving data science job market in the UK.
This project focuses on the following main objectives:
- Analyze Salary Variation: Understand how salaries for data science roles vary across different cities and regions in the UK.
- Identify In-Demand Skills: Discover the most sought-after skills for data scientists and their impact on salaries.
- Assess Company Ratings and Remote Opportunities: Explore whether highly-rated companies offer higher salaries or more opportunities for remote work.
This project lays a strong foundation for future analysis of the data science job market, particularly as it evolves post-pandemic. The visualizations and insights could serve as a valuable resource for job seekers and employers alike. Future enhancements could involve expanding the dataset to include international job markets and further refining the predictive models for salary trends and skill demand.
The dataset used in this project contains several key variables, such as:
- Company: The name of the company offering the job position.
- Company Score: The rating given to the company by employees.
- Job Title: The specific job role being advertised.
- Date: The date the job listing was posted.
- Salary: The estimated salary range for the position (parsed into numeric bounds in the sketch after this list).
- Skills: A list of required skills for the job role.
- Estimation Type: Indicates whether the salary was estimated by the company or the job listing platform.
- Remote: Specifies whether the job is remote or on-site.
- City and Country: The location of the job.
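Glassdoor-style salary ranges arrive as strings, so a parsing step is useful before any salary analysis. The sketch below assumes a "£40K - £60K" format; the actual strings in the dataset may differ.

```python
import pandas as pd

# Two made-up rows in the assumed listing format.
df = pd.DataFrame({
    "Job Title": ["Data Scientist", "ML Engineer"],
    "Salary": ["£40K - £60K", "£55K - £75K"],
    "City": ["London", "Manchester"],
})

# Strip the currency symbol and K suffix, split the range, scale to GBP.
bounds = (df["Salary"]
          .str.replace("£", "", regex=False)
          .str.replace("K", "", regex=False)
          .str.split(" - ", expand=True)
          .astype(float) * 1000)
df["salary_low"], df["salary_high"] = bounds[0], bounds[1]
df["salary_mid"] = bounds.mean(axis=1)

print(df.groupby("City")["salary_mid"].mean())
```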
The data for this study were extracted from Glassdoor job listings, as described in the project overview above.
This project follows a structured methodology to analyze the UK job market for data scientists:
- Sample: Selection of job listings specific to the UK to reflect regional trends and market conditions.
- Explore: Utilize descriptive statistics and visualizations to identify trends in salary, skills, and company ratings.
- Modify: Clean and preprocess the dataset to ensure accurate and reliable analysis, including addressing missing values and formatting inconsistencies.
- Model: Implement data visualizations and predictive analyses to forecast salary trends and demand for specific skills.
- Assess: Evaluate the visualizations through user feedback and refinement to ensure clarity and usability.
This analysis aims to provide a comprehensive view of the job market for data scientists in the UK, guiding career planning and recruitment strategies.