Fundamentals of Deep Learning: First Principles


By: Manas Reddy

Never in a million years did I think I'd be searching up machine learning jokes, yet here I am. Anyways, "Deep Learning" is probably a buzzword you've been hearing all the time. Here I attempt to explain it, hopefully in a better way, so you'll have a "Deep" intuition the next time you're asked about it.

“Disclaimer: Excuse the lame-ass puns, they’re a habit ”

You know that one particularly "woke" friend who always talks about world movements and ongoing current affairs, when all you ever wanted was some peace of mind and a place to look at your phone other than your bed? Anyways, you decide to look it up, see a handful of topics, and click on the ones that seem to provide decent information.

You read about how going vegan saves lives and how meat-eaters belong in hell. "No offense to the state of California." As you scroll, you see ads along the sides of the articles, and they look suspiciously like the exact pair of shoes you searched for an hour ago. You ask yourself,

“Can my phone read my mind??”

Yes, it can and it knows about all the disgusting habits you have, like how you brush your teeth after you shower.

“That should definitely be illegal”

Relax, it actually can't. Companies like Google, Facebook, etc. use machine learning and deep learning to provide better results for your searches: they take the data you searched for, look for similar webpages containing that data, and present them to you.

But the actual use cases of Machine Learning, Deep Learning, and Artificial Intelligence vary vastly. Assuming you've had some prior experience with coding, a lil bit of Python, and know a little bit about Deep Learning, I wish to help you understand what actually goes on behind the scenes while doing Machine Learning.

Luckily, in this day and age, information is readily available and easy to find. Similarly, developers have made Deep Learning quite simple to use with readily available Python libraries such as SciPy and scikit-learn. But if you wish to implement a fundamentally different problem, such as a vector that can traverse a given gradient while avoiding obstacles along the way, it becomes a different problem altogether. "Learnt this the hard way." Thus, to fully understand the working of a deep learning model, we could build one from the ground up using the first-principles method.

Now, assuming the basic principles of Python and object-oriented programming are known, it becomes quite a simple problem rather than a mundane task.

1.1 The Perceptron

Sounds like a Transformer. Yes, the fundamental building block of a neural network sounds like Megatron's twin sister. Deep Learning mimics the process through which the neurons in our brain process information: electro-chemical signals from various nerves in our body relay information like heat, pain, and motion, and the neurons process and pass that information along. In a similar fashion, the Perceptron has various inputs pipelined into it, applies a mathematical function to them, and gives an output.

This is an actual representation of a “neuron” in a deep learning model.

The neuron mainly does three things,

  1. Takes input from the data, here shown as x1, x2, x3
  2. Applies some sort of function to the data
  3. Finally, displays the output

The "neuron" takes the inputs, multiplies each one by its associated weight, adds a bias "b" (which is changeable but specific to the neuron), and computes an output.

Now, if it were this simple, Deep Learning would be a piece of cake, but this is just the surface. These "neurons" are interconnected and form what's called a Neural Net. These Neural Nets can have hundreds or even thousands of neurons interconnected, all doing the exact same thing to get the outcome.

Here each “O” represents one Perceptron

As you can see, it's mind-bogglingly complex. "Really hope that's a word."

But when we break it down, it becomes pretty easy to understand. You'll observe the role of weights, biases, activation functions, etc. If this takes you back to the same feeling as when your high-school math teacher started a new topic, you're in the right place. Don't worry, it's not that hard. "That's what she said." Haha, cracks me up every time.

1.2 Understanding the Perceptron

We'll actually be coding here. I'll be using Python 3.7 in Jupyter Notebooks, though any object-oriented language should work perfectly. I'm pretty well versed in Python, so that's my language of choice.

Let's consider a single "neuron". Here we have 3 inputs, namely 1.2, 5.1, and 6.1, weights 2.3, 5.6, and 9.3, and a bias of 3. These are just random numbers; you're free to play around and choose any numbers you like. As we know, a neuron takes the inputs, multiplies them by the weights, and adds a bias, giving an output. This is the exact representation of what a neuron does at a fundamental level. Hope this is self-explanatory.
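A minimal sketch of that single neuron in plain Python, using the numbers above:

inputs = [1.2, 5.1, 6.1]
weights = [2.3, 5.6, 9.3]
bias = 3

# multiply each input by its weight, sum everything up, and add the bias
output = (inputs[0] * weights[0]
          + inputs[1] * weights[1]
          + inputs[2] * weights[2]
          + bias)

print(output)  # roughly 91.05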

1.3 Coding a Layer

We've seen that the neural net is actually a network of neurons. Pretty obvious at this point. The neural net can be divided into 3 kinds of layers: the Input Layer, the Hidden Layers, and the Output Layer. The input and output are kinda self-explanatory: they take in the values and display the output values, respectively.

Let's try to visualize this in code. Here we'll be trying to code a layer.

We'll be assuming that we're coding the neurons of "Hidden Layer 1", so we'd require the inputs, 3 weight sets, and 3 corresponding biases, and the same mathematical formula of weights*inputs + bias would be applied to each neuron. "Pretty simple, right?"
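Here's a rough sketch of such a layer in plain Python, assuming four input values feeding three neurons (all the numbers are just placeholders you can swap out):

inputs = [1.0, 2.0, 3.0, 2.5]

weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]

biases = [2.0, 3.0, 0.5]

layer_outputs = []
for neuron_weights, neuron_bias in zip(weights, biases):
    neuron_output = 0
    # multiply every input by the matching weight and sum it up
    for n_input, weight in zip(inputs, neuron_weights):
        neuron_output += n_input * weight
    # then add this neuron's bias
    neuron_output += neuron_bias
    layer_outputs.append(neuron_output)

print(layer_outputs)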

This should be pretty straightforward to understand: here we have the inputs, weight sets, and biases all represented as lists. In order to make it more dynamic, we added loops, but the concept remains the same. Each weight list is multiplied with the inputs and the corresponding bias is added. This is the most basic representation of a layer in a neural network. In reality, however, most inputs and weights are represented as matrices. Lucky for us, NumPy is a very useful library for the representation and multiplication of matrices. So, in order to reduce the amount of redundant code, we could easily represent weights, biases, and inputs in the form of matrices and use the dot product to multiply them.

1.4 Shape of an Array

If you've ever used TensorFlow or tried Deep Learning before, the most common error is

AttributeError: incompatible shape for a non-contiguous array

This is, in my opinion, more annoying than people who walk slowly in front of you. "Sorry Amy, but you're taking up the whole sidewalk."

So, to completely understand Deep Learning, you should be quite familiar with the shapes and sizes of vectors and arrays.

Here's a lil cheat sheet: first figure out the dimension of the array, that's how many axes it has, i.e., (x,) is 1D, (x, y) is 2D, (x, y, z) is 3D, and so on. Then count the number of elements along the rows and columns, and write the number of rows, then columns, from left to right. Pretty simple, right?
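A quick way to sanity-check this with NumPy:

import numpy as np

a = np.array([1, 2, 3, 4])                 # 1D: shape (4,)
b = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])               # 2D: 2 rows, 4 columns -> shape (2, 4)
c = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 8]]])           # 3D: shape (2, 2, 2)

print(a.shape, b.shape, c.shape)           # (4,) (2, 4) (2, 2, 2)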

1.5 Dot Product

The dot product is basically the element-wise multiplication of a row with a column, summing the products to give the output.

In the NumPy library it is represented as numpy.dot(a, b, out=None).

So, eliminating the redundant loops, the same layer with its 3 weight sets and biases becomes pretty easy to represent.
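A sketch of the same layer using np.dot, with the same placeholder numbers as before:

import numpy as np

inputs = [1.0, 2.0, 3.0, 2.5]
weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]
biases = [2.0, 3.0, 0.5]

# np.dot(weights, inputs) multiplies each weight row with the input vector and sums the products
output = np.dot(weights, inputs) + biases
print(output)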

1.6 Batches, Layers, Objects

Okay, so we've successfully coded and represented, ideally, one neuron in a network. Now the question arises: how do we represent a group of neurons that are in fact interconnected? Additionally, we now deal with the problem of multiple inputs, with multiple weights and multiple biases. "I swear it's not as complex as it seems."

Now, let's actually take a step back, in a sense zoom out of the problem we are required to face. We have input neurons providing input to a set of hidden-layer neurons that multiply the weights and add the biases; that output is again the input to another layer of hidden neurons, and the output of that layer is input to yet another layer of neurons, and so on till we reach the output. We could group one set of inputs and outputs as one "Batch", which is exactly what we will be doing: essentially processing one layer with a batch of inputs from the previous layer.

Another way to look at this: say you wish to climb a rocky mountain, with humongous rocks in your path to the top. You have an app on your phone that shows you where these rocks are as you approach them. One map shows exactly one rock as you come within 5 feet of it, versus another map that shows you ten rocks ahead as you approach. Which would be better?

You'd obviously go for the ten rocks, because it would be easier to know which route best avoids them; you know the terrain better, i.e., you have more information about what to expect.

Similarly, the more inputs you provide a neuron at once, the better the outcome, as the "terrain" is visualized better, improving the efficiency of the deep learning network. Now, there are several caveats that will be explained further along as we "Deep" dive into this topic. "Last time, I promise."

So, now that we've discovered the wonders of batches and their sizes, let's convert the inputs into a "LOL" (list of lists) representing a batch of inputs.

Looks pretty simple, right? We made a "LOL" with the inputs, and now the dot product.
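Something like this, keeping the same placeholder weights and biases (the extra input rows are arbitrary):

import numpy as np

inputs = [[1.0, 2.0, 3.0, 2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]   # a batch of 3 samples, 4 values each

weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]
biases = [2.0, 3.0, 0.5]

output = np.dot(inputs, weights) + biases   # shapes here are (3, 4) dot (3, 4)
print(output)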

But there's a catch. "Nothing's ever this simple."

On running this piece of code, you'll observe there's a shape error.

Congrats on your first; there are many more to come, I guarantee it. :)

But it worked before, why not now? If you go back to the fundamentals of the dot product, you know that each row of the first matrix is multiplied with each column of the second. Observe that each row of the inputs has 4 elements, while each column of the weights has only 3, so the first three elements get paired up, but the last element of the input row has nothing to be multiplied with. Thus it throws an error.

"The number of columns of the first input of the dot product must be equal to the number of rows of the second input of the dot product."

The shape of the weights array is 3x4 and the shape of the inputs matrix is also 3x4. Thus, to make them compatible, the weights matrix must be transposed, that is, its rows and columns must be swapped, giving a 4x3 matrix. This can easily be done using np.array(weights).T. Applying this to the code, we have
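roughly this, with the same placeholder numbers:

import numpy as np

inputs = [[1.0, 2.0, 3.0, 2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]

weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]
biases = [2.0, 3.0, 0.5]

# transpose the weights so the shapes line up: (3, 4) dot (4, 3) -> (3, 3)
output = np.dot(inputs, np.array(weights).T) + biases
print(output)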

Now we're gonna add another layer. To do that, all we're gonna do is add another set of weights and biases and make the output of the first layer become the input of the second layer.
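Here's a sketch with a second layer bolted on. The second set of weights and biases are again just placeholders; the only constraint is that each weight row of layer 2 must be as long as layer 1's output (here, 3 values per sample):

import numpy as np

inputs = [[1.0, 2.0, 3.0, 2.5],
          [2.0, 5.0, -1.0, 2.0],
          [-1.5, 2.7, 3.3, -0.8]]

weights1 = [[0.2, 0.8, -0.5, 1.0],
            [0.5, -0.91, 0.26, -0.5],
            [-0.26, -0.27, 0.17, 0.87]]
biases1 = [2.0, 3.0, 0.5]

weights2 = [[0.1, -0.14, 0.5],
            [-0.5, 0.12, -0.33],
            [-0.44, 0.73, -0.13]]
biases2 = [-1.0, 2.0, -0.5]

layer1_outputs = np.dot(inputs, np.array(weights1).T) + biases1
# the output of layer 1 becomes the input of layer 2
layer2_outputs = np.dot(layer1_outputs, np.array(weights2).T) + biases2
print(layer2_outputs)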

Now we've successfully represented two layers of neurons. Pretty straightforward, right?

1.7 Converting the Layers into Objects

Up till now, we've seen the representation of neurons and the basic working of two interconnected layers of neurons. To reduce redundancy and actually make the code more dynamic, we can introduce the concepts of Object-Oriented Programming. We can convert layers into objects and assign them values, and it basically gets easier to see multiple neurons work.

Typically, in machine learning nomenclature, the inputs or training data are represented as 'X', so first we'd start by renaming our inputs to X.

Now for the weights and biases. The weights we previously experimented with can't really be used in actual training models: if the weights are set too high, they get multiplied by the input, the bias is added, that result is multiplied again, and so on, so the output gets bigger and bigger and the values eventually explode. "Imagine explaining that to your boss."

So, in order to reduce the chances of that happening, we generally use small weights, on the order of 0.1 or less. With this in mind, let's create a class with weights and biases as attributes and a forward-pass function.
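A sketch of what that class could look like; the name Layer_Dense and the 0.1 scaling match the description below, and np.random.seed is only there to keep the random numbers reproducible:

import numpy as np

np.random.seed(0)

class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        # small random weights, shaped (n_inputs, n_neurons) so no transpose is needed later
        self.weights = 0.1 * np.random.randn(n_inputs, n_neurons)
        # one bias per neuron, all starting at zero
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        # the same inputs * weights + bias step as before, for a whole batch at once
        self.output = np.dot(inputs, self.weights) + self.biases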

Under the Layer_Dense class, we've defined two attributes, weights and biases. The weights are a randomly generated array of shape (number of inputs, number of neurons), multiplied by 0.1 to keep the values low. The biases are an array of zeros: a row vector with one entry per neuron. This ensures there's no size dispute between the weights and biases. One more thing to note: previously, the dot product was defined as the inputs times the transpose of the weights, but, if you observe, we switched the representation when initializing the weights. Previously we represented the weights as n_neurons x n_inputs; since we have full control during initialization, we now create them as n_inputs x n_neurons, and the transpose becomes redundant. Now, to visualize the output:
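One way to run it, with the layer sizes chosen arbitrarily (4 input features to match our batch, then 5 and 2 neurons):

X = [[1.0, 2.0, 3.0, 2.5],
     [2.0, 5.0, -1.0, 2.0],
     [-1.5, 2.7, 3.3, -0.8]]   # the same placeholder batch, renamed to X

layer1 = Layer_Dense(4, 5)   # 4 input features, 5 neurons
layer2 = Layer_Dense(5, 2)   # must accept 5 inputs, since layer 1 outputs 5 values per sample

layer1.forward(X)
layer2.forward(layer1.output)
print(layer2.output)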

This is a complete neural network visualized in core Python. But it doesn't end here: we now have to be able to optimize this function, calculate its cost, and further tweak the weights and biases to obtain an accurate representation.

The follow-up article is posted here: https://manasreddy11.medium.com/fundamentals-of-deep-learning-first-principles-part-ii-cbcb0dbe0104
