Deep learning for beginners (Part 7): neural network design (layers & neurons)

This is the seventh part in the Deep Learning for Beginners series. The full series is:

  1. Deep learning for beginners (Part 1): neurons & activation functions
  2. Deep learning for beginners (Part 2): some key terminology and concepts of neural networks
  3. Deep learning for beginners (Part 3): implementing our first Multi Layer Perceptron (MLP) model
  4. Deep learning for beginners (Part 4): inspecting our Multi Layer Perceptron (MLP) model
  5. Deep learning for beginners (Part 5): our first foray into Keras
  6. Deep learning for beginners (Part 6): more terminology to optimise our Keras model
  7. Deep learning for beginners (Part 7): neural network design (layers & neurons)
  8. Deep learning for beginners (Part 8): Improving our tuning script & using the Keras tuner

In this article, we’re going to talk about how to design a neural network. From the previous articles, we now know how neural networks function and some of the terminology around them. But you’re likely asking: ‘how do we know the optimal values for the number of epochs, batch size, optimizer, loss function and node count?’ As with many aspects of data science and machine learning, it’s largely down to trial and error.

Below, I have included two scripts: a random search and a full search. The random search trains a handful of randomly chosen parameter combinations (three in this script) and returns a dataframe showing the accuracy and the configuration used for each. The full search trains a model for every combination of parameter values, which is far more computationally intensive.
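To make the difference between the two strategies concrete, here is a minimal sketch using only the Python standard library. The value lists mirror the ones used in the scripts in this article; the helper names `full_search_configs` and `random_search_configs` are made up for illustration and are not part of Keras.

```python
import itertools
import random

# Candidate values for each parameter (illustrative, mirroring the scripts).
search_space = {
    "num_nodes": [1, 5, 10, 20, 25],
    "epochs": [10, 50, 100, 200, 500],
    "batch_size": [500, 1000, 2000],
}

def full_search_configs(space):
    """Return every combination of parameter values (a full search)."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]

def random_search_configs(space, n_trials, seed=None):
    """Return n_trials configurations, drawing each value independently at random."""
    rng = random.Random(seed)
    return [{key: rng.choice(values) for key, values in space.items()}
            for _ in range(n_trials)]

all_configs = full_search_configs(search_space)
sampled_configs = random_search_configs(search_space, n_trials=3, seed=42)

print(len(all_configs))      # 5 * 5 * 3 = 75 models to train
print(len(sampled_configs))  # 3 models to train
```

Every extra value in any list multiplies the cost of the full search, which is why the random search is usually the cheaper first pass.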

Random Search

import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
import random

df = pd.read_csv('/home/Datasets/creditcard.csv')

output = df['Class']
features = df.drop(columns='Class')

# Split the features (not the full dataframe) so the label does not leak into the inputs
train_features, test_features, train_labels, test_labels = train_test_split(features, output, test_size = 0.2, random_state = 42)

train_features = tf.convert_to_tensor(train_features)
test_features = tf.convert_to_tensor(test_features)
train_labels = tf.convert_to_tensor(train_labels)
test_labels = tf.convert_to_tensor(test_labels)

n_features = train_features.shape[1]  # number of input columns

out = []

num_nodes = [1, 5, 10, 20, 25]
act_functions = [tf.nn.relu]
optimizers = ['SGD']
loss_functions = ['binary_crossentropy']  # binary cross-entropy matches the single sigmoid output
epochs_count = [10, 50, 100, 200, 500]
batch_sizes = [500, 1000, 2000]

rounds = 1

while rounds <= 3:
    model = tf.keras.Sequential()
    # Pick one value at random for each parameter
    act = random.choice(act_functions)
    opt = random.choice(optimizers)
    ep = random.choice(epochs_count)
    batch = random.choice(batch_sizes)
    loss_fn = random.choice(loss_functions)
    count = random.choice(num_nodes)

    model.add(tf.keras.layers.Dense(31, activation = act, input_shape=(n_features,)))
    model.add(tf.keras.layers.Dense(count, activation = act))
    model.add(tf.keras.layers.Dense(1, activation = 'sigmoid')) #sigmoid for binary response - returns probability of belonging to class
    model.compile(loss = loss_fn,
             optimizer = opt,
             metrics = ['accuracy'])

    # fit() returns a History object holding the per-epoch metrics
    history = model.fit(train_features, train_labels, epochs=int(ep), batch_size=int(batch))
    acc = history.history['accuracy']
    epoch_loss = history.history['loss']

    out.append([count, act, acc, epoch_loss, ep, opt, batch])
    rounds = rounds + 1
                        
columns = ['num_nodes', 'activation_func', 'accuracy_per_epoch', 'loss', 'epochs', 'opt', 'batch']
df = pd.DataFrame(out, columns=columns)

def split_epochs(row):
    # Best accuracy reached in any epoch of this run
    return max(row['accuracy_per_epoch'])

df['max_epoch_acc'] = df.apply(split_epochs, axis=1)

Full Search

import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

df = pd.read_csv('/home/Datasets/creditcard.csv')

output = df['Class']
features = df.drop(columns='Class')

# Split the features (not the full dataframe) so the label does not leak into the inputs
train_features, test_features, train_labels, test_labels = train_test_split(features, output, test_size = 0.2, random_state = 42)

train_features = tf.convert_to_tensor(train_features)
test_features = tf.convert_to_tensor(test_features)
train_labels = tf.convert_to_tensor(train_labels)
test_labels = tf.convert_to_tensor(test_labels)

n_features = train_features.shape[1]  # number of input columns

out = []

num_nodes = [1, 5, 10, 20, 25]
act_functions = [tf.nn.relu]
optimizers = ['SGD']
loss_functions = ['binary_crossentropy']  # binary cross-entropy matches the single sigmoid output
epochs_count = [10, 50, 100, 200, 500]
batch_sizes = [500, 1000, 2000]

# Train one model for every combination of parameter values
for count in num_nodes:
    for act in act_functions:
        for opt in optimizers:
            for ep in epochs_count:
                for batch in batch_sizes:
                    for loss_fn in loss_functions:
                        model = tf.keras.Sequential()
                        model.add(tf.keras.layers.Dense(31, activation = act, input_shape=(n_features,)))
                        model.add(tf.keras.layers.Dense(count, activation = act))
                        model.add(tf.keras.layers.Dense(1, activation = 'sigmoid')) #sigmoid for binary response - returns probability of belonging to class
                        model.compile(loss = loss_fn,
                                 optimizer = opt,
                                 metrics = ['accuracy'])

                        # fit() returns a History object holding the per-epoch metrics
                        history = model.fit(train_features, train_labels, epochs=int(ep), batch_size=int(batch))
                        acc = history.history['accuracy']
                        epoch_loss = history.history['loss']

                        out.append([count, act, acc, epoch_loss, ep, opt, batch])
                        
columns = ['num_nodes', 'activation_func', 'accuracy_per_epoch', 'loss', 'epochs', 'opt', 'batch']
df = pd.DataFrame(out, columns=columns)

def split_epochs(row):
    # Best accuracy reached in any epoch of this run
    return max(row['accuracy_per_epoch'])

df['max_epoch_acc'] = df.apply(split_epochs, axis=1)
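Once either script has finished, the natural next step is to pick the best configuration. The sketch below shows the idea with plain Python dictionaries so it runs without any training. The rows are made-up placeholders (not real results), and the field names simply mirror the dataframe columns used above.

```python
# Placeholder results: made-up numbers purely for illustration.
results = [
    {"num_nodes": 5, "epochs": 10, "batch": 500, "accuracy_per_epoch": [0.90, 0.95, 0.94]},
    {"num_nodes": 20, "epochs": 50, "batch": 1000, "accuracy_per_epoch": [0.91, 0.97, 0.96]},
    {"num_nodes": 1, "epochs": 10, "batch": 2000, "accuracy_per_epoch": [0.80, 0.85, 0.88]},
]

def best_accuracy(row):
    """Highest accuracy reached in any epoch of a run."""
    return max(row["accuracy_per_epoch"])

# Sort the runs from best to worst by their peak accuracy.
ranked = sorted(results, key=best_accuracy, reverse=True)
best = ranked[0]

print(best["num_nodes"], best["epochs"], best["batch"], best_accuracy(best))
```

With the dataframe built by the scripts, the equivalent is `df.sort_values('max_epoch_acc', ascending=False).head()`.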

These scripts only tune the parameters of a network whose number of layers is already fixed. So the question is: how many layers do we want? One practical approach is to keep adding layers until the accuracy stops improving. Remember that too many layers can lead to overfitting. The simpler you can make your neural network design, the better: it will be easier to maintain and less computationally expensive.
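The "keep adding layers until it stops helping" idea can be sketched as a simple loop. This is only an illustration under some assumptions: `score_for_layers` is a hypothetical callback that would build and train a model with the given number of hidden layers and return its accuracy (ideally measured on data the model has not been trained on), and `min_improvement` is an arbitrary threshold. A stand-in dictionary of made-up scores is used here so the sketch runs without training anything.

```python
def choose_layer_count(score_for_layers, max_layers=10, min_improvement=0.001):
    """Add hidden layers one at a time and stop when the score stops improving.

    score_for_layers(n) is a hypothetical callback: it should train a model
    with n hidden layers and return its accuracy.
    """
    best_layers = 1
    best_score = score_for_layers(1)
    for n_layers in range(2, max_layers + 1):
        score = score_for_layers(n_layers)
        if score - best_score < min_improvement:
            break  # the extra layer did not help enough, so keep the simpler network
        best_layers, best_score = n_layers, score
    return best_layers, best_score

# Stand-in scores (made-up numbers) in place of real training runs.
fake_scores = {1: 0.90, 2: 0.94, 3: 0.95, 4: 0.95, 5: 0.93}
layers_chosen, score = choose_layer_count(lambda n: fake_scores[n], max_layers=5)

print(layers_chosen, score)  # 3 0.95
```

Stopping at the first layer count that fails to improve keeps the network as simple as possible, in line with the advice above.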

What about the neuron count? Well, we can say that:

  • Too few neurons can cause underfitting
  • Too many neurons can cause overfitting

As a rule of thumb, a good starting point for the number of neurons in a hidden layer is somewhere between the size of the input layer and the size of the output layer.
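One way to turn that range into values to try is to spread a few candidates evenly between the two sizes. The helper below is a hypothetical illustration (the name `candidate_neuron_counts` and the choice of five candidates are assumptions), and the example call assumes 30 input features and a single output neuron.

```python
def candidate_neuron_counts(n_inputs, n_outputs, n_candidates=5):
    """Evenly spaced hidden-layer sizes between the output and input sizes (inclusive)."""
    low, high = sorted((n_inputs, n_outputs))
    if n_candidates < 2 or low == high:
        return [high]
    span = high - low
    # Integer arithmetic keeps every candidate a whole number of neurons;
    # the set removes duplicates when the range is narrower than n_candidates.
    return sorted({low + span * i // (n_candidates - 1) for i in range(n_candidates)})

print(candidate_neuron_counts(n_inputs=30, n_outputs=1))  # [1, 8, 15, 22, 30]
```

The resulting list could be used in place of `num_nodes` in either search script.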

Kodey