### Basics of Machine Learning Series

### Introduction

The intuition behind logistic regression and its implementation are covered in Classification and Logistic Regression and Logistic Regression Model.

Like linear regression, logistic regression is prone to overfitting when there are a large number of features. If the decision boundary is overfit, its shape may be highly contorted to fit the training data while failing to generalise to unseen data.

So, the cost function of logistic regression is updated to penalize high values of the parameters and is given by,

\[ J(\theta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log \left( 1 - h_\theta(x^{(i)}) \right) \right] + {\lambda \over 2m } \sum_{j=1}^n \theta_j^2 \tag{1} \]

- Where
  - \({\lambda \over 2m } \sum_{j=1}^n \theta_j^2\) is the **regularization term**
  - \(\lambda\) is the **regularization factor**

```
import numpy as np

mul = np.matmul

# X is the design matrix (assumed shape (m, n+1), first column all ones)
# y is the target vector (assumed shape (m, 1))
# theta is the parameter vector (assumed shape (n+1, 1))
# lamda is the regularization parameter


def sigmoid(z):
    return np.power(1 + np.exp(-z), -1)


def h(X, theta):
    """hypothesis function"""
    return sigmoid(mul(X, theta))


def j(theta, X, y, lamda=None):
    """regularized cost function"""
    m = X.shape[0]
    cost = -(1 / m) * (mul(y.T, np.log(h(X, theta))) +
                       mul((1 - y).T, np.log(1 - h(X, theta))))[0][0]
    if lamda:
        # theta_0 is not regularized; zero it in a copy so that the
        # caller's theta (and the hypothesis) stays untouched
        reg = theta.copy()
        reg[0] = 0
        cost += (lamda / (2 * m)) * mul(reg.T, reg)[0][0]
    return cost
```
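As a quick sanity check of the regularization term, the sketch below evaluates the cost with and without the penalty on a tiny made-up dataset. The function name `regularized_cost`, the toy values of `X`, `y`, and `theta`, and the choice of \(\lambda = 1\) are all assumptions for illustration and are not part of the original code. The difference between the two costs should equal \({\lambda \over 2m} \sum_{j=1}^n \theta_j^2\), with \(\theta_0\) left out.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def regularized_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lam / 2m) * sum of theta_j^2 for j >= 1."""
    m = X.shape[0]
    p = sigmoid(X @ theta)
    data_term = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # theta_0 excluded
    return data_term + penalty


# Hypothetical toy data: 4 examples, a column of ones plus 2 features.
X = np.array([[1.0, 0.5, 1.5],
              [1.0, 1.0, 1.0],
              [1.0, 1.5, 0.5],
              [1.0, 3.0, 0.5]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])
theta = np.array([[-1.0], [2.0], [-3.0]])

cost_plain = regularized_cost(theta, X, y, lam=0.0)
cost_reg = regularized_cost(theta, X, y, lam=1.0)

# Expected gap: (1 / (2 * 4)) * (2^2 + (-3)^2) = 13 / 8
penalty = cost_reg - cost_plain
print(cost_plain, cost_reg, penalty)
```

Because the penalty depends only on \(\theta_1, \ldots, \theta_n\), changing \(\theta_0\) in this sketch alters the data term but not the gap between the two costs.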

### Regularization for Gradient Descent

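Previously, the **gradient descent for logistic regression without regularization** was given by,

\[ \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \tag{2} \]

- Where \(j \in \{0, 1, \cdots, n\} \)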

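But since the cost function has changed in (1) to include the regularization term, there will be a **change in the derivative of the cost function** that is plugged into the gradient descent algorithm,

\[ \frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \]

Because the first term of the cost function remains the same, so does the first term of the derivative. Taking the **derivative of the second term** with respect to \(\theta_j\) gives \(\frac {\lambda} {m} \theta_j\), as seen above. This applies for \(j \geq 1\), since \(\theta_0\) does not appear in the regularization term.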
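One way to convince yourself that the derivative of the regularized cost really picks up the extra \(\frac{\lambda}{m} \theta_j\) is a numerical gradient check. The sketch below is illustrative only: the helper names (`cost`, `analytic_grad`, `numeric_grad`), the random data, and the value \(\lambda = 0.7\) are assumptions, and it uses 1-D arrays rather than the column vectors used elsewhere in this post. It compares the closed-form gradient against central finite differences of the cost.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, X, y, lam):
    """Regularized cross-entropy cost; theta_0 is not penalized."""
    m = X.shape[0]
    p = sigmoid(X @ theta)
    data_term = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return data_term + (lam / (2 * m)) * np.sum(theta[1:] ** 2)


def analytic_grad(theta, X, y, lam):
    """Closed-form gradient: data term plus (lam / m) * theta_j for j >= 1."""
    m = X.shape[0]
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return grad


def numeric_grad(theta, X, y, lam, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps
        grad[k] = (cost(theta + step, X, y, lam)
                   - cost(theta - step, X, y, lam)) / (2 * eps)
    return grad


# Hypothetical random data: 20 examples, a column of ones plus 3 features.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=4)
lam = 0.7

max_diff = np.max(np.abs(analytic_grad(theta, X, y, lam)
                         - numeric_grad(theta, X, y, lam)))
print(max_diff)  # should be tiny if the two gradients agree
```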

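So, (2) can be updated as,

\[ \begin{aligned} \theta_0 &:= \theta_0 - \frac{\alpha}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \\ \theta_j &:= \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right] \end{aligned} \]

- Where \(j \in \{1, 2, \cdots, n\} \) and \(h_\theta(x) = g(\theta^T x)\), with \(g\) the **sigmoid function**

Notice that **for the case j=0, no regularization term is included**, which is consistent with the convention followed for regularization.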

```
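# builds on mul and h defined in the previous block


def j_prime(theta, X, y, lamda=None):
    """regularized cost gradient"""
    m = X.shape[0]
    grad = (1 / m) * mul(X.T, (h(X, theta) - y))
    if lamda:
        # theta_0 is not regularized; zero it in a copy so that the
        # caller's theta stays untouched
        reg = theta.copy()
        reg[0] = 0
        grad = grad + (lamda / m) * reg
    return grad


def update_theta(theta, X, y, alpha, lamda=None):
    """simultaneous update; alpha is the learning rate"""
    return theta - alpha * j_prime(theta, X, y, lamda)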
```
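To see the effect of the update rule end to end, here is a minimal, self-contained sketch that runs gradient descent with and without regularization on synthetic data and compares the size of the learned weights. Everything specific here is an assumption for illustration: the `fit` helper, the learning rate of 0.1, the 2000 iterations, the random dataset, and the choice of \(\lambda = 10\). It is not the linked working code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gradient(theta, X, y, lam):
    """Regularized gradient; the penalty skips theta_0."""
    m = X.shape[0]
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]
    return grad


def fit(X, y, lam, alpha=0.1, iterations=2000):
    """Plain batch gradient descent starting from theta = 0."""
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        # all components are updated simultaneously
        theta = theta - alpha * gradient(theta, X, y, lam)
    return theta


# Hypothetical data: 50 points, labelled by which side of a line they fall on.
rng = np.random.default_rng(1)
features = rng.normal(size=(50, 2))
X = np.hstack([np.ones((50, 1)), features])
y = (features[:, 0] + features[:, 1] > 0).astype(float)

theta_plain = fit(X, y, lam=0.0)
theta_reg = fit(X, y, lam=10.0)

norm_plain = np.linalg.norm(theta_plain[1:])
norm_reg = np.linalg.norm(theta_reg[1:])
print(norm_plain, norm_reg)
```

With a larger \(\lambda\), the weights \(\theta_1, \ldots, \theta_n\) are pulled toward zero, which is what smooths the decision boundary; the intercept \(\theta_0\) is left free.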

Link to Rough Working Code. Change the value of lamda in the code to get different decision boundaries for the data as shown below.