weighted_least_squares.Rmd

---
jupyter:
  orphan: true
  jupytext:
    notebook_metadata_filter: all,-language_info
    split_at_heading: true
    text_representation:
      extension: .Rmd
      format_name: rmarkdown
      format_version: '1.2'
      jupytext_version: 1.16.1
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
    name: python3
---

# Fitting models with different cost functions

```{python}
import numpy as np
np.set_printoptions(suppress=True)
from scipy.optimize import minimize
import pandas as pd
pd.set_option('mode.copy_on_write', True)
import statsmodels.formula.api as smf
import sklearn.linear_model as sklm
import sklearn.metrics as skmetrics
```

```{python}
df = pd.read_csv('data/rate_my_course.csv')
#MB To make it easier to run Statsmodels, in particular.
df = df.rename(columns={'Overall Quality': 'Quality'})
df
```

Fetch some columns of interest:

```{python}
# This will be our y (the variable we predict).
quality = df['Quality']
# One of both of these will be our X (the predictors).
clarity_easiness = df[['Clarity', 'Easiness']]
```

Fit the model with Statsmodels.

```{python}
```

Fit the same model with Scikit-learn.

```{python}
```

```{python}
```

Compare the parameters to Statsmodels.

```{python}
```

## The fitted values and Scikit-learn

The values predicted by the (Sklearn) model:

```{python}
```

Compare to Statsmodels:

```{python}
```

Write a function to compute the fitted values given the parameters:


```{python}
```

Compile Sklearn parameters, and use these to calculated fitted values with our function.  Compare, to show they are (near as dammit) similar.

```{python}
```

```{python}
```

## The long and the short of R^2^

Sklearn has an `r2_score` metric.

```{python}
```

We already know the formula for R^2^.  We can calculate by hand to show this gives the same answer.

```{python}
```

```{python}
```

```{python}
```

### R^2^ for another, reduced model

Fit a reduced model that is just "Clarity" without "Easiness".

```{python}
```

Calculate R^2^ for reduced model as compared to our full model.

```{python}
```

```{python}
```

## On weights, and a weighted mean

This is the usual mean:

```{python}
```

Of course this is the same as:

```{python}
```

or

```{python}
```

Calculate weights to weight values by number of professors, on the basis that larger number of professors may give more reliable values.

```{python}
```

Calculate weighted mean.

```{python}
```

Numpy's version of same:

```{python}
```

## Fitting the model with minimize

A function for Sum of Squares, to use with `minimize`.

```{python}
```

Use `sos` function, in example call, and then with `minimize`.

```{python}
```

```{python}
```

```{python}
```

Compare to parameters with (e.g.) Sklearn.

```{python}
```

## Fitting with weights

Statsmodels, weighted regression.

```{python}
```

[Sklearn, weighted
regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html).
Also see [Wikipedia on weighted
regression](https://en.wikipedia.org/wiki/Weighted_least_squares).

```{python}
```

The `minimize` cost function for weighted regression:

```{python}
```

```{python}
```

```{python}
```

```{python}
```

## Penalized regression

Penalized regression is where you simultaneously minimize some cost related to the model (mis-)fit, and some cost related to the parameters of your model.

### Ridge regression

For example, in [ridge
regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html),
with try and minimize the sum of squared residuals _and_ the sum of squares
of the parameters (apart from the intercept).

```{python}
```

Fit with the `minimize` cost function:

```{python}
```

```{python}
```

### LASSO


See the [Scikit-learn LASSO page](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html).

As noted there, the cost function is:

$$
\frac{1}{2 * \text{n_samples}} * ||y - Xw||^2_2 + alpha * ||w||_1
$$

$w$ refers to the vector of model parameters.

This part of the equation:

$$
||y - Xw||^2_2
$$

is the sum of squares of the residuals, because the residuals are $y - Xw$ (where $w$ are the parameters of the model), and the $||y - Xw||^2_2$ refers to the squared [L2 vector norm](https://mathworld.wolfram.com/L2-Norm.html), which is the same as the sum of squares.

$$
||w||_1
$$

is the L1 vector norm, which is the same as the sum of the absolute values of
the parameters.

Let's do that, with a low `alpha` (otherwise both slopes get forced down to zero):

```{python}
```

```{python}
```

```{python}
```

```{python}
```

## Cross-validation

Should I add the "Easiness" regressor?

```{python}
def drop_and_predict(df, x_cols, y_col, to_drop):
    out_row = df.loc[to_drop:to_drop]  # Row to drop, as a data frame
    out_df = df.drop(index=to_drop)  # Dataframe without dropped row.
    # Fit on everything but the dropped row.
    fit = sklm.LinearRegression().fit(out_df[x_cols], out_df[y_col])
    # Use fit to predict the dropped row.
    fitted = fit.predict(out_row[x_cols])
    return fitted[0]
```

Fit the larger model, with "Easiness", and drop / predict each "Quality" value.

```{python}
```

Calculate the sum of squared error:

```{python}
```

Fit the smaller model, omitting "Easiness", and drop / predict each "Quality"
value.

```{python}
```

How is the sum of squared error for this reduced model?

```{python}
```