updated lesson 5
racheltho committed Jun 7, 2017
1 parent 33d182f commit 1398435
Showing 1 changed file with 66 additions and 18 deletions.
84 changes: 66 additions & 18 deletions nbs/5. Health Outcomes with Linear Regression.ipynb
@@ -961,28 +961,36 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"heading_collapsed": true
},
"source": [
"## Regularization and noise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"Regularization is a way to reduce over-fitting and create models that better generalize to new data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"### Regularization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"Lasso regression uses an L1 penalty, which pushes towards sparse coefficients. The parameter $\\alpha$ weights the penalty term. Scikit-learn's LassoCV performs cross-validation over a number of different values for $\\alpha$.\n",
"\n",
@@ -993,7 +1001,8 @@
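As a side note on the LassoCV workflow the cell below relies on, here is a minimal, hypothetical sketch (synthetic data, not the notebook's health-outcomes dataset) showing how cross-validation selects $\alpha$ and how the L1 penalty zeroes out coefficients:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic regression problem: 20 features, only 5 of them informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# LassoCV tries a path of alpha values and picks the one with the best
# cross-validated score.
reg = LassoCV(cv=5, random_state=0).fit(X, y)

print(reg.alpha_)              # alpha chosen by cross-validation
print(np.sum(reg.coef_ != 0))  # L1 penalty drives many coefficients to zero
```

The sparsity of `reg.coef_` is the practical payoff of the L1 penalty: uninformative features tend to get exactly-zero weights.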
"cell_type": "code",
"execution_count": 119,
"metadata": {
"collapsed": true
"collapsed": true,
"hidden": true
},
"outputs": [],
"source": [
@@ -1003,7 +1012,9 @@
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"metadata": {
"hidden": true
},
"outputs": [
{
"name": "stderr",
@@ -1034,7 +1045,9 @@
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
@@ -1054,7 +1067,9 @@
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
@@ -1073,14 +1088,18 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"### Noise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"Now we will add some noise to the data."
]
@@ -1089,7 +1108,8 @@
"cell_type": "code",
"execution_count": 123,
"metadata": {
"collapsed": true
"collapsed": true,
"hidden": true
},
"outputs": [],
"source": [
@@ -1100,7 +1120,8 @@
"cell_type": "code",
"execution_count": 124,
"metadata": {
"collapsed": true
"collapsed": true,
"hidden": true
},
"outputs": [],
"source": [
@@ -1111,7 +1132,9 @@
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
@@ -1133,7 +1156,9 @@
{
"cell_type": "code",
"execution_count": 126,
"metadata": {},
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
@@ -1151,10 +1176,27 @@
"regr_metrics(y_test, regr.predict(test))"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"Huber loss is a loss function that is less sensitive to outliers than squared error loss. It is quadratic for small error values, and linear for large values.\n",
"\n",
" $$L(x)= \n",
"\\begin{cases}\n",
" \\frac{1}{2}x^2, & \\text{for } \\lvert x\\rvert\\leq \\delta \\\\\n",
" \\delta(\\lvert x \\rvert - \\frac{1}{2}\\delta), & \\text{otherwise}\n",
"\\end{cases}$$"
]
},
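The piecewise definition in the cell above translates directly into a few lines of NumPy. This is a hedged sketch of the loss function itself (the notebook's actual fitting cell may use a library implementation such as scikit-learn's `HuberRegressor` instead):

```python
import numpy as np

def huber_loss(x, delta=1.0):
    """Huber loss: quadratic for |x| <= delta, linear beyond.

    Matches the piecewise form above: 0.5 * x**2 for small errors,
    delta * (|x| - 0.5 * delta) for large ones.
    """
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= delta,
                    0.5 * x ** 2,
                    delta * (np.abs(x) - 0.5 * delta))

print(huber_loss(0.5))  # inside the quadratic region: 0.5 * 0.25 = 0.125
print(huber_loss(3.0))  # linear region: 1.0 * (3.0 - 0.5) = 2.5
```

Note the two branches agree at $|x| = \delta$ (both give $\tfrac{1}{2}\delta^2$), which is what makes the loss continuous and differentiable there.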
{
"cell_type": "code",
"execution_count": 127,
"metadata": {},
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
@@ -1175,7 +1217,9 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"How is sklearn doing this? By checking [the source code](https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/linear_model/base.py#L417), you can see that in the dense case, it calls [scipy.linalg.lstsq](https://github.com/scipy/scipy/blob/v0.19.0/scipy/linalg/basic.py#L892-L1058), which is calling a LAPACK method:\n",
"\n",
@@ -1191,14 +1235,18 @@
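To see the dense-case routine in isolation, here is a tiny worked example calling `scipy.linalg.lstsq` directly on a hypothetical overdetermined system (the data is made up for illustration):

```python
import numpy as np
from scipy import linalg

# Overdetermined system: fit intercept + slope to three points on y = x.
# First column of ones is the intercept term.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])

# lstsq returns (solution, residues, rank, singular values);
# this is the same LAPACK-backed routine sklearn uses in the dense case.
coef, residues, rank, sing_vals = linalg.lstsq(A, b)
print(coef)  # expect intercept ~0, slope ~1
```

Because the three points lie exactly on a line, the least-squares solution reproduces it with zero residual.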
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"#### Scipy Sparse LSQR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"hidden": true
},
"source": [
"Uses [LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares](https://web.stanford.edu/class/cme324/paige-saunders2.pdf) by C.C. Paige and M.A. Saunders (1982). Based on Golub and Kahan's bidiagonalization procedure.\n",
"\n",
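For the sparse case described above, `scipy.sparse.linalg.lsqr` exposes the Paige–Saunders LSQR algorithm directly. A minimal sketch on a hypothetical toy system (not the notebook's data):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

# Small consistent system stored sparsely:
#   x1        = 1
#        2*x2 = 4
#   x1 +  x2  = 3
A = csr_matrix(np.array([[1.0, 0.0],
                         [0.0, 2.0],
                         [1.0, 1.0]]))
b = np.array([1.0, 4.0, 3.0])

# lsqr returns a tuple; the first element is the least-squares solution.
result = lsqr(A, b)
coef = result[0]
print(coef)  # expect approximately [1, 2]
```

LSQR never forms $A^T A$ explicitly; it works through matrix-vector products via Golub–Kahan bidiagonalization, which is why it scales to large sparse systems.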