Scikit-learn spelling audit
jlooper committed Jun 17, 2021
1 parent 8752679 commit b7c3a8b
Showing 19 changed files with 44 additions and 44 deletions.
4 changes: 2 additions & 2 deletions 1-Introduction/1-intro-to-ML/README.md
Original file line number Diff line number Diff line change
@@ -21,7 +21,7 @@ Before starting with this curriculum, you need to have your computer set up and
- **Learn Python**. It's also recommended to have a basic understanding of [Python](https://docs.microsoft.com/learn/paths/python-language/?WT.mc_id=academic-15963-cxa), a programming language useful for data scientists that we use in this course.
- **Learn Node.js and JavaScript**. We also use JavaScript a few times in this course when building web apps, so you will need to have [node](https://nodejs.org) and [npm](https://www.npmjs.com/) installed, as well as [Visual Studio Code](https://code.visualstudio.com/) available for both Python and JavaScript development.
- **Create a GitHub account**. Since you found us here on [GitHub](https://github.com), you might already have an account, but if not, create one and then fork this curriculum to use on your own. (Feel free to give us a star, too 😊)
- **Explore Scikit-Learn**. Familiarize yourself with [Scikit-Learn](https://scikit-learn.org/stable/user_guide.html), a set of ML libraries that we reference in these lessons.
- **Explore Scikit-learn**. Familiarize yourself with [Scikit-learn](https://scikit-learn.org/stable/user_guide.html), a set of ML libraries that we reference in these lessons.

### What is machine learning?

@@ -45,7 +45,7 @@ Although the terms can be confused, machine learning (ML) is an important subset
## What you will learn in this course

In this curriculum, we are going to cover only the core concepts of machine learning that a beginner must know. We cover what we call 'Classical machine learning' primarily using Scikit-Learn, an excellent library many students use to learn the basics. To understand broader concepts of artificial intelligence or deep learning, a strong fundamental knowledge of machine learning is indispensable, and so we would like to offer it here.
In this curriculum, we are going to cover only the core concepts of machine learning that a beginner must know. We cover what we call 'Classical machine learning' primarily using Scikit-learn, an excellent library many students use to learn the basics. To understand broader concepts of artificial intelligence or deep learning, a strong fundamental knowledge of machine learning is indispensable, and so we would like to offer it here.

You will additionally learn the basics of Regression, Classification, Clustering, Natural Language Processing, Time Series Forecasting, and Reinforcement Learning, as well as real-world applications, the history of ML, ML and Fairness, and how to use your model in web apps.

10 changes: 5 additions & 5 deletions 2-Regression/1-Tools/assignment.md
@@ -1,13 +1,13 @@
# Regression with Scikit-Learn
# Regression with Scikit-learn

## Instructions

Take a look at the [Linnerud dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_linnerud.html#sklearn.datasets.load_linnerud) in Scikit-Learn. This dataset has multiple [targets](https://scikit-learn.org/stable/datasets/toy_dataset.html#linnerrud-dataset): 'It consists of three exercise (data) and three physiological (target) variables collected from twenty middle-aged men in a fitness club'.
Take a look at the [Linnerud dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_linnerud.html#sklearn.datasets.load_linnerud) in Scikit-learn. This dataset has multiple [targets](https://scikit-learn.org/stable/datasets/toy_dataset.html#linnerrud-dataset): 'It consists of three exercise (data) and three physiological (target) variables collected from twenty middle-aged men in a fitness club'.

In your own words, describe how to create a Regression model that would plot the relationship between the waistline and how many situps are accomplished. Do the same for the other datapoints in this dataset.
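
One possible starting point for that description is sketched below. It assumes waistline is the single input and situps the output (the assignment does not fix which is which), and the variable names are illustrative rather than taken from the lesson.

```python
from sklearn.datasets import load_linnerud
from sklearn.linear_model import LinearRegression

linnerud = load_linnerud()

# 'Waist' is one of the physiological (target) variables,
# 'Situps' is one of the exercise (data) variables.
waist_col = list(linnerud.target_names).index("Waist")
situps_col = list(linnerud.feature_names).index("Situps")

X = linnerud.target[:, [waist_col]]  # 2D input, shape (20, 1)
y = linnerud.data[:, situps_col]     # 1D output, shape (20,)

# Fit a straight line relating waistline to situps (assumed direction)
model = LinearRegression().fit(X, y)
predicted = model.predict(X)
print(model.coef_[0], model.intercept_)
```

Plotting `X` against `y` as a scatterplot and `X` against `predicted` as a line would show the fitted relationship. Repeating the same steps with other column pairs covers the remaining variables.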

## Rubric

| Criteria | Exemplary | Adequate | Needs Improvement |
| -------- | --------- | -------- | ----------------- |
| Submit a descriptive paragraph | Well-written paragraph is submitted | A few sentences are submitted | No description is supplied |
| Criteria | Exemplary | Adequate | Needs Improvement |
| ------------------------------ | ----------------------------------- | ----------------------------- | -------------------------- |
| Submit a descriptive paragraph | Well-written paragraph is submitted | A few sentences are submitted | No description is supplied |
6 changes: 3 additions & 3 deletions 2-Regression/2-Data/README.md
@@ -1,13 +1,13 @@
# Build a Regression Model using Scikit-Learn: Prepare and Visualize Data
# Build a regression model using Scikit-learn: prepare and visualize data

> ![Data Vizualization Infographic](./images/data-visualization.png)
> ![Data visualization infographic](./images/data-visualization.png)
> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/11/)

## Introduction

Now that you are set up with the tools you need to start tackling machine learning model-building with Scikit-Learn, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potentials of your dataset.
Now that you are set up with the tools you need to start tackling machine learning model building with Scikit-learn, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potentials of your dataset.

In this lesson, you will learn:

8 changes: 4 additions & 4 deletions 2-Regression/3-Linear/README.md
@@ -1,4 +1,4 @@
# Build a Regression Model using Scikit-Learn: Regression Two Ways
# Build a Regression Model using Scikit-learn: Regression Two Ways

![Linear vs Polynomial Regression Infographic](./images/linear-polynomial.png)
> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
@@ -43,7 +43,7 @@ As you learned in Lesson 1, the goal of a linear regression exercise is to be ab

Now that you have an understanding of the math behind this exercise, create a Regression model to see if you can predict which package of pumpkins will have the best pumpkin prices. Someone buying pumpkins for a holiday pumpkin patch might want this information to be able to optimize their purchases of pumpkin packages for the patch.

Since you'll use Scikit-Learn, there's no reason to do this by hand (although you could!). In the main data-processing block of your lesson notebook, add a library from Scikit-Learn to automatically convert all string data to numbers:
Since you'll use Scikit-learn, there's no reason to do this by hand (although you could!). In the main data-processing block of your lesson notebook, add a library from Scikit-learn to automatically convert all string data to numbers:

```python
from sklearn.preprocessing import LabelEncoder
new_pumpkins.iloc[:, 0:-1] = new_pumpkins.iloc[:, 0:-1].apply(LabelEncoder().fit_transform)
```

If you look at the new_pumpkins dataframe now, you see that all the strings are now numeric. This makes it harder for you to read but much more intelligible for Scikit-Learn!
If you look at the new_pumpkins dataframe now, you see that all the strings are now numeric. This makes it harder for you to read but much more intelligible for Scikit-learn!

Now you can make more educated decisions (not just based on eyeballing a scatterplot) about the data that is best suited to regression.
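
To see what that conversion does, here is a minimal sketch on a small, made-up dataframe. The column names and values are hypothetical stand-ins for `new_pumpkins`, and the result is stored in a new dataframe rather than assigned back in place.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the lesson's new_pumpkins dataframe
sample = pd.DataFrame({
    "Variety": ["PIE TYPE", "MINIATURE", "PIE TYPE"],
    "City": ["BOSTON", "ATLANTA", "BOSTON"],
    "Price": [15.0, 18.0, 16.5],
})

# Encode every column except the last (Price), one column at a time;
# fit_transform refits the encoder on each column it receives.
encoded = sample.iloc[:, 0:-1].apply(LabelEncoder().fit_transform)
encoded["Price"] = sample["Price"]
print(encoded)
```

Each string column is replaced by integer codes assigned in sorted order of its distinct values, while the numeric `Price` column is carried over unchanged.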

@@ -189,7 +189,7 @@ X=poly_pumpkins.iloc[:,3:4].values
```python
y=poly_pumpkins.iloc[:,4:5].values
```

Scikit-Learn includes a helpful API for building polynomial regression models - the `make_pipeline` [API](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html?highlight=pipeline#sklearn.pipeline.make_pipeline). A 'pipeline' is created which is a chain of estimators. In this case, the pipeline includes Polynomial Features, or predictions that form a nonlinear path.
Scikit-learn includes a helpful API for building polynomial regression models - the `make_pipeline` [API](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html?highlight=pipeline#sklearn.pipeline.make_pipeline). A 'pipeline' is created which is a chain of estimators. In this case, the pipeline includes Polynomial Features, or predictions that form a nonlinear path.

```python
from sklearn.preprocessing import PolynomialFeatures
```
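
The rest of the lesson's pipeline code is not shown in this diff. As a rough illustration of the idea, the sketch below chains `PolynomialFeatures` and `LinearRegression` with `make_pipeline`. It uses synthetic quadratic data and an assumed degree of 2, not the lesson's pumpkin data or settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data, not the lesson's pumpkin data
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel() ** 2 + 1

# Step 1 expands x into [1, x, x^2]; step 2 fits a linear model on those terms
pipeline = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
pipeline.fit(X, y)

prediction = pipeline.predict(np.array([[10]]))
print(prediction)
```

Because the model is linear in the expanded features, it can follow a curved relationship in the original variable while still being fitted by ordinary least squares.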
2 changes: 1 addition & 1 deletion 2-Regression/3-Linear/assignment.md
@@ -2,7 +2,7 @@

## Instructions

In this lesson you were shown how to build a model using both Linear and Polynomial Regression. Using this knowledge, find a dataset or use one of Scikit-Learn's built-in sets to build a fresh model. Explain in your notebook why you chose the technique you did, and demonstrate your model's accuracy. If it is not accurate, explain why.
In this lesson you were shown how to build a model using both Linear and Polynomial Regression. Using this knowledge, find a dataset or use one of Scikit-learn's built-in sets to build a fresh model. Explain in your notebook why you chose the technique you did, and demonstrate your model's accuracy. If it is not accurate, explain why.

## Rubric

4 changes: 2 additions & 2 deletions 2-Regression/4-Logistic/README.md
@@ -119,7 +119,7 @@ Now that we have an idea of the relationship between the binary categories of co
## Build your model

Building a model to find this binary classification is surprisingly straightforward in Scikit-Learn.
Building a model to find this binary classification is surprisingly straightforward in Scikit-learn.

Select the variables you want to use in your classification model and split the training and test sets:
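
The lesson's own code for this step is not part of this diff. A minimal sketch of the general pattern, assuming synthetic data from `make_classification` in place of the selected columns and an arbitrary 80/20 split, might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the lesson's selected columns
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Hold out 20% of the rows for testing (split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("Accuracy:", model.score(X_test, y_test))
```

Fixing `random_state` makes the split repeatable, which helps when comparing models on the same held-out rows.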

@@ -240,7 +240,7 @@ Using Seaborn again, plot the model's [Receiving Operating Characteristic](https

![ROC](./images/ROC.png)

Finally, use Scikit-Learn's [`roc_auc_score` API](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html?highlight=roc_auc#sklearn.metrics.roc_auc_score) to compute the actual 'Area Under the Curve' (AUC):
Finally, use Scikit-learn's [`roc_auc_score` API](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html?highlight=roc_auc#sklearn.metrics.roc_auc_score) to compute the actual 'Area Under the Curve' (AUC):

```python
auc = roc_auc_score(y_test,y_scores[:,1])
```
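
To make the `y_scores[:,1]` indexing concrete, here is a small self-contained sketch. The score array is made up to resemble the output of a classifier's `predict_proba`, with one column per class. It is not the lesson's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_test = np.array([0, 0, 1, 1])

# Hypothetical predict_proba-style output: one column per class
y_scores = np.array([
    [0.90, 0.10],
    [0.60, 0.40],
    [0.65, 0.35],
    [0.20, 0.80],
])

# Column 1 holds the probability of the positive class
auc = roc_auc_score(y_test, y_scores[:, 1])
print(auc)
```

Three of the four positive/negative pairs are ranked correctly here, so the AUC is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.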
2 changes: 1 addition & 1 deletion 2-Regression/README.md
@@ -12,7 +12,7 @@ The lessons in this section cover types of Regression in the context of machine

In this series of lessons, you'll discover the difference between Linear vs. Logistic Regression, and when you should use one or the other.

In this group of lessons, you will get set up to begin machine learning tasks, including configuring Visual Studio Code to manage notebooks, the common environment for data scientists. You will discover Scikit-Learn, a library for machine learning, and you will build your first models, focusing on Regression models in this chapter.
In this group of lessons, you will get set up to begin machine learning tasks, including configuring Visual Studio Code to manage notebooks, the common environment for data scientists. You will discover Scikit-learn, a library for machine learning, and you will build your first models, focusing on Regression models in this chapter.

> There are useful low-code tools that can help you learn about working with Regression models. Try [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-regression-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)
2 changes: 1 addition & 1 deletion 3-Web-App/1-Web-App/README.md
@@ -52,7 +52,7 @@ ufos = ufos[(ufos['Seconds'] >= 1) & (ufos['Seconds'] <= 60)]
```python
ufos.info()
```

Next, import Scikit-Learn's LabelEncoder library to convert the text values for countries to a number.
Next, import Scikit-learn's LabelEncoder library to convert the text values for countries to a number.

✅ LabelEncoder encodes data alphabetically
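
A short sketch of what alphabetical encoding means in practice follows. The country codes are illustrative and are not read from the UFO dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative country codes, not the actual UFO dataset
countries = ["us", "gb", "au", "ca", "de"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(countries)

# classes_ is sorted, so a label's position in it is its assigned number
print(list(encoder.classes_))
print(list(encoded))

# inverse_transform maps a number back to its original label
print(encoder.inverse_transform([0]))
```

Keeping the fitted encoder, or at least its `classes_`, is what lets a later step translate a predicted number back into a country.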

2 changes: 1 addition & 1 deletion 3-Web-App/README.md
@@ -1,6 +1,6 @@
# Build a Web App to use your ML Model

In this section of the curriculum, you will be introduced to an applied ML topic: how to save your Scikit-Learn model as a file that can be used to make predictions within a web application. Once the model is saved, you'll learn how to use it in a web app built in Flask. You'll first create a model using some data that's all about UFO sightings! Then, you'll build a web app that will allow you to input a number of seconds with a latitude and a longitude value to predict which country reported seeing a UFO.
In this section of the curriculum, you will be introduced to an applied ML topic: how to save your Scikit-learn model as a file that can be used to make predictions within a web application. Once the model is saved, you'll learn how to use it in a web app built in Flask. You'll first create a model using some data that's all about UFO sightings! Then, you'll build a web app that will allow you to input a number of seconds with a latitude and a longitude value to predict which country reported seeing a UFO.
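
As a rough sketch of the save-and-reload step, the example below uses Python's standard `pickle` module. That choice is an assumption, since this overview does not name a serialization format. The tiny training set and the file name `model.pkl` are also made up for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic training set standing in for the UFO data
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Save the fitted model to disk (hypothetical file name)
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Later, for example when a web app starts, load it back and predict
with open(model_path, "rb") as f:
    restored = pickle.load(f)

print(restored.predict(np.array([[2.5]])))
```

Pickle files should only be loaded from sources you trust, and the model is best loaded with the same Scikit-learn version that saved it.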

## Lessons

4 changes: 2 additions & 2 deletions 4-Classification/1-Introduction/README.md
@@ -27,13 +27,13 @@ Derived from [statistics](https://wikipedia.org/wiki/Statistical_classification)

The question we want to ask of this cuisine dataset is actually a **multiclass question**, as we have several potential national cuisines to work with. Given a batch of ingredients, which of these many classes will the data fit?

Scikit-Learn offers several different algorithms to use to classify data, depending on the kind of problem you want to solve. In the next two lessons, you'll learn about several of these algorithms.
Scikit-learn offers several different algorithms to use to classify data, depending on the kind of problem you want to solve. In the next two lessons, you'll learn about several of these algorithms.

## Clean and Balance Your Data

The first task at hand before starting this project is to clean and **balance** your data to get better results. Start with the blank `notebook.ipynb` file in the root of this folder.

The first thing to install is [imblearn](https://imbalanced-learn.org/stable/). This is a Scikit-Learn package that will allow you to better balance the data (you will learn more about this task in a minute).
The first thing to install is [imblearn](https://imbalanced-learn.org/stable/). This is a Scikit-learn package that will allow you to better balance the data (you will learn more about this task in a minute).

```python
pip install imblearn
```
2 changes: 1 addition & 1 deletion 4-Classification/1-Introduction/assignment.md
@@ -2,7 +2,7 @@

## Instructions

In [Scikit-Learn documentation](https://scikit-learn.org/stable/supervised_learning.html) you'll find a large list of ways to classify data. Do a little scavenger hunt in these docs: your goal is to look for classification methods and match a dataset in this curriculum, a question you can ask of it, and a technique of classification. Create a spreadsheet or table in a .doc file and explain how the dataset would work with the classification algorithm.
In [Scikit-learn documentation](https://scikit-learn.org/stable/supervised_learning.html) you'll find a large list of ways to classify data. Do a little scavenger hunt in these docs: your goal is to look for classification methods and match a dataset in this curriculum, a question you can ask of it, and a technique of classification. Create a spreadsheet or table in a .doc file and explain how the dataset would work with the classification algorithm.

## Rubric

2 changes: 1 addition & 1 deletion 4-Classification/1-Introduction/solution/notebook.ipynb
@@ -9,7 +9,7 @@
},
{
"source": [
"Install Imblearn which will enable SMOTE. This is a Scikit-Learn package that helps handle imbalanced data when performing classification. (https://imbalanced-learn.org/stable/)"
"Install Imblearn which will enable SMOTE. This is a Scikit-learn package that helps handle imbalanced data when performing classification. (https://imbalanced-learn.org/stable/)"
],
"cell_type": "markdown",
"metadata": {}