This repo contains all the files needed to deliver the Team Data Science Process (TDSP) course. The course builds upon the 3-hour Data Science On-Ramp course by closely examining the activities described by Microsoft's Team Data Science Process. The TDSP outlines a development lifecycle methodology for implementing typical machine learning solutions, and is based on the Cross Industry Standard Process for Data Mining (CRISP-DM), which was conceived in 1996.

The course takes a much deeper look at feature selection, which determines the features that have the greatest influence on predicting the outcome (response), and at dimensionality reduction, which extracts the most discriminatory power (influence) from high-dimensional data (data having potentially millions of features). It also looks more closely at the process for optimizing the performance of machine learning models. This involves selecting the statistical method (machine learning algorithm) that produces the most accurate predictions, and hyperparameter tuning, which determines the most effective combination of the algorithm's configuration settings. Finally, the course examines how to use the Pipeline construct to chain all these steps into a single, efficient mechanism that aims to achieve the best result in the shortest time while consuming the fewest computational resources.
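As a rough illustration of how the steps described above can fit together, here is a minimal sketch. It assumes that the "Pipeline construct" refers to scikit-learn's `Pipeline`, which this description does not state explicitly. The synthetic dataset, the step names (`scale`, `select`, `reduce`, `model`), the two candidate algorithms, and every value in the search grid are hypothetical choices made for the example and are not taken from the course materials.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=30, n_informative=5, n_redundant=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Chain the steps: scaling, feature selection, dimensionality reduction, model.
pipeline = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("reduce", PCA(random_state=0)),
    ("model", LogisticRegression(max_iter=1000)),  # placeholder, swapped by the grid
])

# Each dict pairs one candidate algorithm with its own hyperparameters,
# so the search covers both algorithm selection and hyperparameter tuning.
shared = {"select__k": [10, 20], "reduce__n_components": [2, 5]}
param_grid = [
    {**shared,
     "model": [LogisticRegression(max_iter=1000)],
     "model__C": [0.1, 1.0, 10.0]},
    {**shared,
     "model": [RandomForestClassifier(random_state=0)],
     "model__n_estimators": [50, 100]},
]

# Cross-validated grid search fits the whole pipeline for every combination.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print("Best configuration:", best_params)
print("Held-out accuracy:", round(test_accuracy, 3))
```

Because the selection and reduction steps sit inside the pipeline, they are refit on each cross-validation training fold, which keeps information from the validation fold out of the feature-selection step. An exhaustive grid is only one possible search strategy; a randomized search may be a better fit when the number of combinations is large.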
jtupitza-msft/DataScience-OnRamp