
jtupitza-msft/DataScience-OnRamp



Team-Data-Science-Process

This repo contains all the files needed to deliver the Team Data Science Process (TDSP) course. The course builds on the 3-hour Data Science On-Ramp course by closely examining the activities described by Microsoft's Team Data Science Process. The TDSP defines a development lifecycle methodology for implementing typical machine learning solutions. It is based on the Cross-Industry Standard Process for Data Mining (CRISP-DM), which was first conceived in 1996.

The course takes a much deeper look at feature selection, which determines the features that have the greatest influence on predicting the outcome (response). It also covers dimensionality reduction, which extracts the most discriminatory power (influence) from high-dimensional data (data that may have millions of features).

The course also looks more closely at the process for optimizing the performance of machine learning models. This involves selecting the statistical method (machine learning algorithm) that produces the most accurate predictions. It also involves hyperparameter tuning, which determines the most effective combination of the algorithm's configuration settings.

The course also examines how to use the Pipeline construct to chain all these steps into a single sequence. The whole workflow can then be fit, tuned, and reused as one unit, with the aim of reaching the best result in the shortest time while using the least computational resources.
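The course materials are not reproduced here. As a rough illustration only, the following sketch chains the steps described above: scaling, feature selection, dimensionality reduction, algorithm selection, and hyperparameter tuning. It assumes scikit-learn, whose `Pipeline` class is one common form of the construct mentioned. The bundled breast-cancer dataset, the step names (`scale`, `select`, `reduce`, `model`), the two candidate algorithms, and every grid value are hypothetical choices made for this example, not the course's own.

```python
# Illustrative sketch, assuming scikit-learn; the dataset, step names and
# grid values are hypothetical choices, not taken from the course.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset: 30 numeric features, binary outcome.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain preprocessing, feature selection, dimensionality reduction and the
# estimator, so every step is fit only on the training folds.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),  # keep the k best-scoring features
    ("reduce", PCA()),                               # project onto principal components
    ("model", LogisticRegression(max_iter=1000)),    # placeholder, swapped by the grid
])

# Each dict is a separate grid. Setting "model" swaps the algorithm itself
# (algorithm selection). The "<step>__<param>" keys tune each step's
# hyperparameters.
param_grid = [
    {
        "select__k": [10, 20],
        "reduce__n_components": [2, 5],
        "model": [LogisticRegression(max_iter=1000)],
        "model__C": [0.1, 1.0, 10.0],
    },
    {
        "select__k": [10, 20],
        "reduce__n_components": [2, 5],
        "model": [RandomForestClassifier(random_state=0)],
        "model__n_estimators": [50, 100],
    },
]

# 5-fold cross-validated search over every combination in both grids.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_acc = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Test accuracy:", round(test_acc, 3))
```

Because the whole chain is a single estimator, the search refits feature selection and PCA inside each cross-validation fold. This avoids leaking information from the validation data into those steps. The fitted `search.best_estimator_` can then be reused as one object for new predictions.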
