This is the first part in the Deep Learning for Beginners series. The full series is:

- Deep learning for beginners (Part 1): neurons & activation functions
- Deep learning for beginners (Part 2): some key terminology and concepts of neural networks
- Deep learning for beginners (Part 3): implementing our first Multi Layer Perceptron (MLP) model
- Deep learning for beginners (Part 4): inspecting our Multi Layer Perceptron (MLP) model
- Deep learning for beginners (Part 5): our first foray into Keras
- Deep learning for beginners (Part 6): more terminology to optimise our Keras model
- Deep learning for beginners (Part 7): neural network design (layers & neurons)
- Deep learning for beginners (Part 8): Improving our tuning script & using the Keras tuner

The term deep learning has always been of interest to me. It’s a subset of machine learning that deals exclusively with neural networks. So, why isn’t it simply called neural network learning? Why isn’t a neural network considered just another type of machine learning model (like K-Nearest Neighbours or Random Forest)? Why is it its own discipline?

My philosophical questions aside, let’s get into our introduction to deep learning.

As I mentioned, deep learning is fundamentally all about neural networks. A neural network is loosely modelled on the way the human brain operates, which allows it to identify complex relationships in huge datasets.

An example neural network may look something like the below. We’ll talk through each of these component parts.

As you can see, the neural network contains a bunch of interconnected blocks. Those blocks are called nodes (or perceptrons, which are artificial neurons named after their counterparts in biology). The first layer of nodes in a neural network is called the input layer. After that, we find one or more hidden layers, followed by an output layer.

Each node takes an input, which can come either from the raw data source itself or from a node in the previous layer of the network. Once the data (or signal) is received by the node, it performs a calculation, the output of which is passed to other nodes deeper in the neural network.

The pathway between nodes (or neurons) is called a synapse. As you can see in the below, the red neuron has more than one synapse, each coming from a different preceding node.

Each of these synapses has a weight assigned to it. The weight is intended to differentiate the importance of each of the preceding nodes (i.e. to make the input from one node carry more weight, or importance, than another). These weights are tuned during model training.

The neuron receives an input from each connected node in the previous layer. It multiplies each input by the weight of its synapse, adds the results up and passes the total to what we call an activation function.

In the example above, you can see the weights applied to the incoming values. They then get passed to the activation function which will do some computation on the data and pass it to the next neuron, deeper in the neural network.
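To make that flow concrete, here is a minimal sketch of a single neuron in plain Python. The function names (`neuron_output`, `identity`) and the input and weight values are my own illustrative assumptions, not taken from the diagram; the `identity` activation is just a placeholder that passes the weighted sum through unchanged.

```python
def neuron_output(inputs, weights, activation):
    """Multiply each input by its weight, add them up, then apply the activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)


def identity(z):
    """Placeholder activation (an assumption for this sketch): returns z unchanged."""
    return z


# Hypothetical signals from three preceding nodes and their synapse weights.
example = neuron_output([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], identity)
print(example)  # (1.0 * 0.5) + (2.0 * -1.0) + (3.0 * 2.0) = 4.5
```

Swapping `identity` for a different function is all it takes to change how the neuron responds to its weighted sum.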

To summarise this part, before we move on:

- A neural network is designed to mimic the way humans think to provide the capability to find complex patterns in data
- Neural networks are made up of nodes (also called neurons or perceptrons)
- Those nodes are connected via synapses
- Each synapse carries a signal to the node and has a weight attached to it. The neuron receives the signals and takes their weighted sum
- The weighted sum is then passed into an activation function, which produces an output for the next neuron in the neural network.

Now, let’s talk about activation functions in some detail. The simplest type of activation function (in my opinion) is the **threshold function (also referred to as a TLU, or Threshold Logic Unit)**. This simply assesses whether a value lies above or below a threshold. The output from this activation function will be binary (1 or 0).
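As a rough sketch, a threshold function could be written like this. The default threshold of 0 and the choice to return 1 when the value is exactly on the threshold are assumptions I have made for the example; other conventions exist.

```python
def threshold(z, theta=0.0):
    """Return 1 if z reaches the threshold theta, otherwise 0.

    theta=0.0 and the >= comparison are assumptions for this sketch.
    """
    return 1 if z >= theta else 0


print(threshold(0.3))             # above the default threshold of 0
print(threshold(-0.3))            # below the default threshold of 0
print(threshold(0.3, theta=0.5))  # below a custom threshold of 0.5
```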

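Next, we have the **sigmoid function**, which is the same function we see in a logistic regression problem. It squashes any input into a value between 0 and 1, which can be read as a probability. We use it for classification: comparing the output to a cutoff (commonly 0.5) gives us a binary result (1 or 0).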
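Here is a small sketch of the sigmoid using only the standard library. Strictly, the function itself returns a value between 0 and 1, and it is the comparison against a cutoff that produces the 1 or 0. The `classify` helper and its 0.5 cutoff are my own assumptions for illustration.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), which maps real inputs towards the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an algebraically equivalent form so exp() cannot overflow.
    e = math.exp(z)
    return e / (1.0 + e)


def classify(z, cutoff=0.5):
    """Hypothetical helper: turn the sigmoid output into a 1 or 0 using a cutoff."""
    return 1 if sigmoid(z) >= cutoff else 0


print(sigmoid(0.0))    # 0.5, the midpoint of the curve
print(classify(2.0))   # 1
print(classify(-2.0))  # 0
```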

The **rectifier (ReLU) function** is a very simple function too. It’s a simple piece of logic: if the input is less than 0 it returns 0, otherwise it returns the input value. Mathematically, it is max(0, x), where x is the input value.
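In Python, that logic is a one-liner. This is just a sketch for a single number; deep learning libraries apply the same idea to whole arrays at once.

```python
def relu(x):
    """Rectifier: negative inputs become 0, everything else passes through unchanged."""
    return max(0.0, x)


print(relu(-3.0))  # 0.0
print(relu(2.5))   # 2.5
```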

The **Hyperbolic Tangent Function (tanh)** takes any real-valued input and squashes it into the range of minus 1 to plus 1.
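Python’s standard library already ships this function as `math.tanh`, so a quick way to get a feel for it is to push a few values through it. The sample inputs below are arbitrary values I picked for illustration.

```python
import math

# Arbitrary sample inputs, from strongly negative to strongly positive.
sample_inputs = [-10.0, -1.0, 0.0, 1.0, 10.0]
squashed = [math.tanh(z) for z in sample_inputs]

for z, out in zip(sample_inputs, squashed):
    # Large negative inputs approach -1, large positive inputs approach +1.
    print(z, out)
```

Notice that an input of 0 stays at 0, and the outputs keep the same ordering as the inputs.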

We can use the below neural network to start thinking about how it all works. Here, we have the simplest possible neural network – it has an input layer & an output layer, with no hidden layers. This model will simply take the weighted sum of the input variables (just like a linear regression) and produce an output.

The output will simply be *(Age × w1) + (Years Worked × w2) + (Past Salary × w3) + (Years in Industry × w4)*, where w1 to w4 are the separate weights attached to each input.
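The weighted sum above can be sketched in a few lines of Python. Every number here is made up purely for illustration: the input values describe a hypothetical person, and the weights are hand-picked rather than learned (in a real model they would be tuned during training).

```python
# Hypothetical input values for one person (illustrative only).
inputs = {
    "age": 30.0,
    "years_worked": 8.0,
    "past_salary": 40000.0,
    "years_in_industry": 5.0,
}

# Hypothetical weights, one per input; a real model learns these during training.
weights = {
    "age": 100.0,
    "years_worked": 500.0,
    "past_salary": 1.0,
    "years_in_industry": 250.0,
}

# Multiply each input by its own weight and add the results together.
output = sum(inputs[name] * weights[name] for name in inputs)
print(output)  # 3000 + 4000 + 40000 + 1250 = 48250.0
```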

As I mentioned, this model does not have any hidden layers. The hidden layers are super important: their nodes apply activation functions, which lets the network capture non-linear relationships, and they’re the bits that make neural networks so powerful.