Deep learning for beginners (Part 2): some key terminology and concepts of neural networks

This is the second part in the Deep Learning for Beginners series. The other parts of the series are:

  1. Deep learning for beginners (Part 1): neurons & activation functions
  2. Deep learning for beginners (Part 2): some key terminology and concepts of neural networks
  3. Deep learning for beginners (Part 3): implementing our first Multi Layer Perceptron (MLP) model
  4. Deep learning for beginners (Part 4): inspecting our Multi Layer Perceptron (MLP) model
  5. Deep learning for beginners (Part 5): our first foray into Keras
  6. Deep learning for beginners (Part 6): more terminology to optimise our Keras model
  7. Deep learning for beginners (Part 7): neural network design (layers & neurons)
  8. Deep learning for beginners (Part 8): Improving our tuning script & using the Keras tuner

This is article two in the Deep Learning for Beginners series; article one can be found here. This article covers some more of the fundamentals surrounding neural networks. Once we have the groundwork covered, subsequent articles in the series will get hands-on with the code.

This article will be a little bit of a terminology dump – there are tonnes of things we need to understand about the workings of neural networks before we can hope to effectively implement them in our own work.

First of all, let’s talk about forward propagation and back propagation.

Forward propagation is the process of passing the input data through the network, using the current weights (which are randomly assigned to begin with) to ‘guess’ the correct output. That output is then checked for accuracy using a cost function, which measures the error between the guess and the true value. That error is then fed backwards through the network to work out how each weight should be adjusted to improve the model accuracy. This is referred to as back propagation.

In the above diagram, you can see this in action. The weights are randomly initialised; a weighted sum of the inputs (called the net input function) is then taken and passed into the activation function, f(a,b,c), to produce output Y (the prediction/guess).

We then compare that prediction with reality to determine the error. We try to minimise that error by optimising the weights.
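To make the forward pass concrete, here is a minimal sketch of a single neuron in plain Python. The input values, the target and the choice of a sigmoid activation with a squared-error cost are all assumptions made for illustration; they are not taken from the diagram.

```python
import math
import random

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, weights):
    """Take the weighted sum of the inputs (the net input), then apply the activation."""
    net_input = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(net_input)

random.seed(0)  # fixed seed so the 'random' starting weights are repeatable

inputs = [0.5, 0.1, 0.9]  # hypothetical input values
target = 1.0              # hypothetical true output

# Start from random weights: nothing has been learned yet.
weights = [random.uniform(-1, 1) for _ in inputs]

prediction = forward(inputs, weights)        # the 'guess' Y
error = 0.5 * (prediction - target) ** 2     # squared-error cost

print(f"prediction={prediction:.3f}, error={error:.3f}")
```

The error printed at the end is the number that back propagation would then try to reduce by adjusting the weights.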

Biases are applied to the neurons in the layers of the neural network: each neuron adds its bias to its weighted sum before the activation function is applied. The purpose of the bias is to laterally shift the predicted function so that it overlaps the original.

As we can see above, the predicted function has broadly the right distribution – it’s just in the wrong part of the chart. The bias can be applied to move that predicted distribution to overlay the original (true) distribution.
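The shifting effect can be seen with a single sigmoid neuron. This is only a sketch: the weight, the bias values and the input points below are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, weight, bias):
    """A one-input neuron: weighted input plus bias, then the activation."""
    return sigmoid(weight * x + bias)

weight = 2.0                  # hypothetical weight, kept the same in both runs
xs = [-2.0, 0.0, 2.0, 4.0]    # hypothetical input points

for bias in (0.0, -4.0):
    # The curve has the same shape in both runs; only its position along x changes.
    outputs = [round(neuron(x, weight, bias), 3) for x in xs]
    print(f"bias={bias}: {outputs}")
```

With a bias of 0 the output crosses 0.5 at x = 0; with a bias of -4 the same crossing happens at x = 2. The shape of the curve is unchanged, it has simply moved sideways.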

Neural networks are typically optimised using stochastic gradient descent. This is an algorithm that estimates the error gradient for the model from a sample of the training data. We’re looking for the lowest point of that slope (or gradient), which will be the smallest error.

Gradient descent is an iterative algorithm. It moves down the slope bit by bit, in steps, towards the lowest point. In the example below, we’re using a step size of 2, which means we’ll actually miss the lowest point of the slope. However, we must make a trade-off, because using a very small step size can lead to excruciatingly long model training.

Now that we’ve covered how weights and biases are adjusted, we’re ready to start tuning the model. The key hyperparameter that we have at our disposal is the learning rate, which determines the size of each step. The smaller the step size, the closer you’re likely to get to the minimum error, but the more steps (and therefore computation) it takes to get there.
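The trade-off can be sketched with plain gradient descent on a made-up error curve, error(w) = (w - 3)², whose lowest point is at w = 3. The curve, the starting point and the three learning rates are assumptions chosen purely for illustration.

```python
def descend(learning_rate, start=0.0, steps=50):
    """Run gradient descent on the hypothetical error curve error(w) = (w - 3)**2."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)      # slope of the error curve at w
        w -= learning_rate * gradient   # step downhill, scaled by the learning rate
    return w

for learning_rate in (1.0, 0.1, 0.001):
    w = descend(learning_rate)
    print(f"learning_rate={learning_rate}: w={w:.4f}, "
          f"distance from minimum={abs(w - 3.0):.4f}")
```

With a learning rate of 1.0, every step jumps right over the minimum and the value bounces between two points either side of it without ever settling. A learning rate of 0.1 lands very close to the minimum within the 50 steps, while 0.001 is still a long way off and would need far more steps.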

Now, let’s talk about a couple of important model types.


Perceptron

First off, we have the simplest neural network: a perceptron. This simply consists of some input data (with corresponding weights) and an activation function.

This type of model would be used for binary classification problems, where we have linearly separable classes. A perceptron with a single neuron cannot be applied to non-linear problems.

The neuron in the perceptron has an activation function which imposes a threshold, giving a binary response of 1 or 0. In the classic perceptron this is a step function; a sigmoid (as we discussed here) can play a similar role if its output is cut off at 0.5.
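As a rough sketch of how this works in practice, the code below trains a single perceptron on the logical AND problem, which is linearly separable. The training loop uses the classic perceptron learning rule, which this article has not covered, and the starting weights, learning rate and number of epochs are arbitrary choices for the example.

```python
def step(z):
    """Threshold activation: 1 if the net input reaches 0, otherwise 0."""
    return 1 if z >= 0 else 0

def predict(inputs, weights, bias):
    net_input = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(net_input)

def train(samples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: nudge the weights whenever a guess is wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(inputs, weights, bias)  # -1, 0 or 1
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Logical AND: the output is 1 only when both inputs are 1.
and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train(and_gate)
for inputs, target in and_gate:
    print(inputs, "->", predict(inputs, weights, bias), "(expected", target, ")")
```

Swapping the targets for XOR (0, 1, 1, 0) would leave the same loop unable to find weights that classify all four rows correctly, because no single straight line separates those classes.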

Multi-Layer Perceptron

We then have the multi-layer perceptron, which is much more complex and capable. It solves the problem of perceptrons being unable to handle non-linear problems. It has dense, fully connected layers of neurons and bi-directional propagation (forward propagation & back propagation).

With the multi-layer perceptron (MLP), data travels through several layers of neurons (or nodes). Each neuron can use any activation function (not just a threshold function, as in the perceptron).

Multi-layer perceptrons combine the inputs with their weightings to generate a weighted sum. That weighted sum then passes through an activation function. Each layer in the network passes the output of its computation to the next layer to be acted upon. The flow passes through each hidden layer of the neural network until it reaches the output layer.
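The layer-by-layer flow can be sketched as follows. To keep the example small, the weights and biases are hand-picked (not learned) so that a network with one hidden layer of two neurons computes XOR, a problem a single perceptron cannot solve. The step activation and all of the numbers are assumptions for illustration.

```python
def step(z):
    return 1 if z >= 0 else 0

def layer_forward(inputs, layer, activation):
    """Each neuron takes a weighted sum of the inputs, adds its bias and applies the activation."""
    return [activation(sum(x * w for x, w in zip(inputs, weights)) + bias)
            for weights, bias in layer]

def mlp_forward(inputs, layers, activation=step):
    """Pass the output of each layer on as the input to the next."""
    values = inputs
    for layer in layers:
        values = layer_forward(values, layer, activation)
    return values

# Each neuron is written as (weights, bias); the values are hand-picked, not trained.
hidden_layer = [([1.0, 1.0], -0.5),   # fires if at least one input is 1 (OR)
                ([1.0, 1.0], -1.5)]   # fires only if both inputs are 1 (AND)
output_layer = [([1.0, -1.0], -0.5)]  # fires if OR is on and AND is off
xor_network = [hidden_layer, output_layer]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), "->", mlp_forward([a, b], xor_network)[0])
```

The hidden layer turns the two raw inputs into two new features, and the output neuron only has to draw a straight line through those features. That is the sense in which the extra layer handles a non-linear problem.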

Back propagation allows the model to effectively learn from its mistakes: it adjusts the weights throughout the network to minimise the error.
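Here is a minimal sketch of that learning loop for the smallest possible multi-layer network: one input, one hidden neuron and one output neuron. The sigmoid activation, the squared-error cost, the single training example and the starting values are all assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical single training example and starting weights.
x, target = 1.0, 1.0
w1, b1 = 0.5, 0.0     # input -> hidden neuron
w2, b2 = -0.5, 0.0    # hidden neuron -> output neuron
learning_rate = 0.5

losses = []
for _ in range(200):
    # Forward propagation: compute the guess and its error.
    h = sigmoid(w1 * x + b1)
    y = sigmoid(w2 * h + b2)
    losses.append(0.5 * (y - target) ** 2)

    # Back propagation: apply the chain rule from the output backwards.
    delta_out = (y - target) * y * (1.0 - y)        # error signal at the output neuron
    delta_hidden = delta_out * w2 * h * (1.0 - h)   # error signal passed back to the hidden neuron

    # Gradient descent: move each weight and bias against its gradient.
    w2 -= learning_rate * delta_out * h
    b2 -= learning_rate * delta_out
    w1 -= learning_rate * delta_hidden * x
    b1 -= learning_rate * delta_hidden

print(f"first loss={losses[0]:.4f}, last loss={losses[-1]:.4f}")
```

The error is measured at the output, but the weights in the earlier layer are adjusted too. Their share of the blame is worked out by passing the output's error signal backwards through w2.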