Deep Learning: Logistic Regression
by: Manas Reddy
Over the past few decades, the digitization of our society has led to massive amounts of data being gathered and stored. Combine this increase in the scale of stored information with advances in hardware computational power and algorithmic innovations, and the field of artificial intelligence (AI) has jumped into the spotlight, as machines seem to possess the ‘magical’ ability to learn without being told explicitly what to do.
What makes this field so interesting is that it can perform tasks without us ever having to handle them, accounting for errors or noisy information along the way. The computer should, in theory, be able to “adapt” and still produce the correct output. Artificial intelligence is also being used to drastically improve existing methods or find entirely new ways to solve long-standing problems, for example early cancer detection through early genetic screening in babies.
So the question arises: how does this so-called “Artificial Intelligence” actually work?
Here is an example of a very simple “neuron”. Neurons, as you know, are the building blocks of the brain. Likewise, in AI, neurons form the basic building blocks of any network. A neuron mainly does three things (see the sketch after this list):
- Takes input from the data, shown here as x1, x2, x3
- Applies some function to that data
- Finally, produces the output
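To make those three steps concrete, here is a minimal sketch of such a neuron in Python. The `neuron` function name is my own, and the “function” applied here is just a placeholder sum; the weighted sum and activation a real neuron uses are introduced in the sections below.

```python
# A minimal sketch of a single neuron: inputs -> function -> output.
# The "function" here is a placeholder sum, purely for illustration.

def neuron(x1, x2, x3):
    # 1. Take inputs from the data
    inputs = [x1, x2, x3]
    # 2. Apply some function to the data (a plain sum, as a stand-in)
    result = sum(inputs)
    # 3. Produce the output
    return result

print(neuron(1.0, 2.0, 3.0))  # 6.0
```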
Sounds pretty straightforward, but much lies beneath the surface. An AI can contain hundreds, if not thousands, of such neurons to produce a suitable output.
LOGISTIC REGRESSION: First Principles Method
So, as we understood, a neuron consists of inputs, and a function is performed on those inputs to give you an output. But to determine how important each input is to the output, we add weights to the inputs, as in many cases some inputs have more importance than others. For example, in creating an algorithm to work out how much a house should be priced at in a market, the square footage of the house is more important to the result than the type of roof tiles laid on it. Thus weights are added, depicted as w1, w2, w3, etc. To offset the zero condition, a bias b is added.
Here, represented in matrix form, we map the weights as a [1, 3] matrix, as there is only one output and three inputs, so the neuron computes

z = w1·x1 + w2·x2 + w3·x3 + b

This equation closely resembles a line equation, y = mx + c.
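As a small sketch of that computation (assuming NumPy; the feature values, weights, and bias below are made up purely for illustration):

```python
import numpy as np

# Hypothetical house-pricing inputs (values made up for illustration).
x = np.array([[1500.0],  # x1: square footage
              [3.0],     # x2: number of bedrooms
              [2.0]])    # x3: roof-tile type, numerically encoded

w = np.array([[0.5, 0.03, 0.01]])  # [1, 3] weight matrix, illustrative values
b = -1.0                           # bias, offsetting the zero condition

z = np.dot(w, x) + b  # [1,3] x [3,1] -> a single number
print(z)
```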
The SIGMOID Function
This is what the sigmoid function looks like: an S-shaped curve whose output is always squashed between 0 and 1, saturating toward those two values at its extremes, which makes it a perfect function for binary classification. The formula for the sigmoid function is

σ(z) = 1 / (1 + e^(-z))

Applying this activation to z gives a value close to 0 or close to 1. If z is very small, i.e., very negative, we get 1 / (1 + a value that is very big), which is approximately 0; and if z is very big, e^(-z) approaches 0, so we get roughly 1 / 1 = 1. That's why the sigmoid function is preferred for binary classification.
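A quick sketch in Python (assuming NumPy) shows this saturating behaviour:

```python
import numpy as np

def sigmoid(z):
    # sigmoid(z) = 1 / (1 + e^(-z)); the output always lies between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(-10))  # ~0.000045 -> very negative z pushes the output toward 0
print(sigmoid(0))    # 0.5       -> the midpoint
print(sigmoid(10))   # ~0.99995  -> very positive z pushes the output toward 1
```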
Logistic Regression
Now for the hero of the story: Logistic Regression. Logistic regression in AI actually works on the same principles as logistic regression in statistics. Imposter syndrome who? It borrows those principles, fine-tunes them to match its needs, and gives the output. Logistic regression is widely used in machine learning to answer simple yes/no, true/false questions, i.e., binary classification of data: for example, determining whether a picture shows a cat or not, or whether a COVID patient's lung scan shows a lung infection.
So, from the equations above:

z = wᵀx + b
a = σ(z)

where z is the linear combination of the inputs and a is the activation, our predicted output.
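Putting the two equations together, here is a minimal sketch of a single prediction (assuming NumPy; `predict` is my own name, and the weights, bias, and input are made-up values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    z = np.dot(w.T, x) + b  # z = w^T x + b : the linear part
    a = sigmoid(z)          # a = sigmoid(z), read as P(y = 1 | x)
    # Threshold at 0.5, the usual convention, for a yes/no answer
    return a, (a > 0.5).astype(int)

# Hypothetical example: 3 features, weights and bias chosen arbitrarily.
w = np.array([[0.4], [-0.3], [0.1]])
b = 0.2
x = np.array([[1.0], [2.0], [0.5]])
prob, label = predict(w, b, x)
print(prob, label)
```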
LOSS and COST Functions
To effectively train our model, we must guide it toward what's desired and away from what's not, so we attach a loss function to the neuron's output, and the goal of the machine is to minimize the loss to give the optimal output.
On observation, we deduce that the loss function optimal for logistic regression is the cross-entropy loss:

L(a, y) = -(y·log(a) + (1 - y)·log(1 - a))

(The squared-error loss used in linear regression is avoided here, because combined with the sigmoid it would make the optimization non-convex.)
And the cost function is given as the average of the loss over all m training examples:

J(w, b) = (1/m) Σ L(a⁽ⁱ⁾, y⁽ⁱ⁾)
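In code, a minimal sketch (assuming NumPy; A holds hypothetical predicted activations and Y the true labels for m = 4 examples):

```python
import numpy as np

def loss(a, y):
    # Cross-entropy loss for a single example:
    # L(a, y) = -(y * log(a) + (1 - y) * log(1 - a))
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

def cost(A, Y):
    # The cost J is the average loss over all m training examples.
    m = Y.shape[1]
    return np.sum(loss(A, Y)) / m

# Hypothetical predictions A and true labels Y.
A = np.array([[0.9, 0.2, 0.7, 0.1]])
Y = np.array([[1,   0,   1,   0]])
print(cost(A, Y))  # small cost: the predictions match the labels well
```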
Gradient Descent
Plotting the cost function J(w, b) gives us a convex, bowl-shaped surface with a single lowest point.
The reason we use this cost is precisely that it is convex, with one global optimum. Imagine rolling a ball from any corner of the bowl: it settles at the place with the least loss, right at the bottom. Similarly, the global optimum is the lowest point of the surface, and to reach it we must lower the value iteratively until we arrive there. The best way to lower the value is by using a derivative, since the definition of a derivative is how much one value changes when another is incremented by an infinitesimally small amount. So the update formulas for the weights (w) and bias (b) are:

w := w - α·(∂J/∂w)
b := b - α·(∂J/∂b)
where α is the learning rate, or the “stride” the ball takes on its way to the bottom, to use the previous analogy.
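As a sketch, one gradient-descent step in Python would look like this (`gradient_descent_step` is my own name; the gradients dw and db are assumed already computed, which is what backpropagation below is for):

```python
# One gradient-descent step; dw and db are the gradients of the cost J
# with respect to w and b, assumed already computed.
def gradient_descent_step(w, b, dw, db, alpha=0.01):
    w = w - alpha * dw  # step the weights downhill
    b = b - alpha * db  # step the bias downhill
    return w, b
```

Repeating this step moves the parameters a little further down the bowl each time.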
Forward Propagation
In order to get an output, the input data must be fed in the forward direction, so that it passes through the function. Each hidden layer accepts the input data, processes it as per the activation function, and passes the result to the successive layer.
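For logistic regression, forward propagation for all m examples at once can be sketched like this (assuming NumPy; `forward` is my own name, and X is a hypothetical matrix holding one example per column):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, X):
    # X has shape (n, m): n input features, m examples fed forward at once.
    Z = np.dot(w.T, X) + b  # linear step, z = w^T x + b, for every example
    A = sigmoid(Z)          # activation step for every example
    return A
```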
Backward Propagation
In order to find the gradient, one must use backpropagation. But now the question arises: what the heck is a gradient, and why do we need to find it? The gradient measures how the different weights we used at the beginning of the function, i.e., w1, w2, w3, each affect the overall cost. We want a low cost, as a low cost indicates that the model is performing well and is accurate. So after one forward propagation, we move backwards through the computation, from the output toward the inputs, to work out how much each weight and the bias should change.
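Here is a minimal sketch of backpropagation for logistic regression, tied into a full training loop (assuming NumPy; `backward` is my own name, and the tiny dataset is made up for illustration). For this model the gradients work out to dZ = A - Y, dw = (1/m)·X·dZᵀ, and db = (1/m)·Σ dZ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward(X, A, Y):
    # Gradients of the cost J with respect to w and b.
    m = X.shape[1]
    dZ = A - Y                # prediction error for every example
    dw = np.dot(X, dZ.T) / m  # dJ/dw: how each weight affects the cost
    db = np.sum(dZ) / m       # dJ/db: how the bias affects the cost
    return dw, db

# Putting one full training loop together (toy data, made up for illustration):
X = np.array([[0.0, 1.0, 2.0, 3.0]])  # 1 feature, 4 examples
Y = np.array([[0,   0,   1,   1]])    # labels
w, b = np.zeros((1, 1)), 0.0

for _ in range(1000):
    A = sigmoid(np.dot(w.T, X) + b)     # forward propagation
    dw, db = backward(X, A, Y)          # backward propagation
    w, b = w - 0.1 * dw, b - 0.1 * db   # gradient-descent update

print(sigmoid(np.dot(w.T, X) + b))  # predictions move toward Y
```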
These are the key functions that make up Logistic Regression.