This is the fifth part in the Deep Learning for Beginners series. The full series is:
- Deep learning for beginners (Part 1): neurons & activation functions
- Deep learning for beginners (Part 2): some key terminology and concepts of neural networks
- Deep learning for beginners (Part 3): implementing our first Multi Layer Perceptron (MLP) model
- Deep learning for beginners (Part 4): inspecting our Multi Layer Perceptron (MLP) model
- Deep learning for beginners (Part 5): our first foray into Keras
- Deep learning for beginners (Part 6): more terminology to optimise our Keras model
- Deep learning for beginners (Part 7): neural network design (layers & neurons)
- Deep learning for beginners (Part 8): Improving our tuning script & using the Keras tuner
In the previous articles, we’ve looked at some of the terminology around neural networks and some simple implementations using the SkLearn library. This time, we’re going to look at Keras, which is a much more comprehensive & widely used neural network library.
The code below will look very familiar if you’ve been following along with the previous articles. Here, we are simply ingesting our data, splitting the features & target into separate dataframes, and doing a train/test split on our data:
import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
df = pd.read_csv('/home/Datasets/creditcard.csv')
output = df['Class']
features = df.drop('Class', axis=1) #drop the target column so it doesn't leak into the features
train_features, test_features, train_labels, test_labels = train_test_split(features, output, test_size = 0.2, random_state = 42)
Here is where we start to diverge slightly. This is where we convert the Pandas dataframes that we usually work with into tensors, which is the format the library works with natively. You have a choice – you can do all your data prep (encoding, scaling, etc…) in Pandas and then convert it to a tensor, or you can use the Keras preprocessing functionality to do it. I personally think keeping your life simpler by using a library you’re familiar with is sensible, so I’ll be sticking with Pandas.
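As an example of that “prep in Pandas first” approach, here’s a minimal sketch (not part of the original script) that standardises the Time & Amount columns with scikit-learn before anything gets converted to a tensor – those column names are the ones in the Kaggle credit card dataset, so adjust if yours differ:
from sklearn.preprocessing import StandardScaler
# Scale the raw-valued columns while they're still dataframes, fitting the scaler on the training split only
scaler = StandardScaler()
train_features.loc[:, ['Time', 'Amount']] = scaler.fit_transform(train_features[['Time', 'Amount']])
test_features.loc[:, ['Time', 'Amount']] = scaler.transform(test_features[['Time', 'Amount']])
Once the prep is done, the conversion to tensors carries on exactly as below.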
#keep the original row index of the test set so we can line predictions back up with df later
test_index = test_features.index
train_features = tf.convert_to_tensor(train_features)
test_features = tf.convert_to_tensor(test_features)
train_labels = tf.convert_to_tensor(train_labels)
test_labels = tf.convert_to_tensor(test_labels)
This is where we get much more flexibility than the library we worked with previously: we can define each layer quite explicitly, which we couldn’t do with SkLearn.
In each of the lines below, we are adding a layer to the Keras model. Each layer is defined as Dense, which means it’s fully connected (every node connects to every node in the adjacent layers). For each layer we need to specify the number of nodes and the activation function (and in the first layer we also define the input shape, i.e. the number of features each incoming row is expected to have).
Note: A sequential model is one that has a single, sequential stack of layers in the neural network. The functional API allows you to branch layers or share layers, which isn’t possible with the sequential model – I’ve included a rough functional-API equivalent of our stack after the code below, just for comparison.
model = tf.keras.Sequential()
#INPUT LAYER (num nodes, activation function, num input columns)
model.add(tf.keras.layers.Dense(train_features.shape[1], activation = tf.nn.relu, input_shape=(train_features.shape[1],)))
#HIDDEN LAYER
model.add(tf.keras.layers.Dense(10, activation = tf.nn.relu))
#OUTPUT LAYER
model.add(tf.keras.layers.Dense(1, activation = 'sigmoid' )) #returns probability of belonging to class
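Just to illustrate the difference mentioned above, this is roughly what the same three-layer stack looks like written with the functional API – a sketch for comparison only, we’ll carry on with the sequential model in this series:
# Functional API version of the same stack - each layer is called on the output of the previous one
inputs = tf.keras.Input(shape=(train_features.shape[1],))
x = tf.keras.layers.Dense(train_features.shape[1], activation=tf.nn.relu)(inputs)
x = tf.keras.layers.Dense(10, activation=tf.nn.relu)(x)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)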
We then compile the model by defining the loss function, the metric to track (accuracy) and the optimizer to use (SGD – Stochastic Gradient Descent). Because our output layer is a single sigmoid node predicting the probability of one class, the appropriate loss here is binary_crossentropy rather than categorical_crossentropy (which expects one output node per class). Machine Learning Mastery has an excellent post around the loss functions here. They describe cross-entropy as “Cross-entropy will calculate a score that summarizes the average difference between the actual and predicted probability distributions for predicting class 1. The score is minimized and a perfect cross-entropy value is 0.”
model.compile(loss = 'binary_crossentropy',
optimizer = 'SGD',
metrics = ['accuracy'])
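One thing worth knowing (not something the snippet above does): if you want control over the learning rate, you can pass an optimizer object instead of the string – for example:
# Same compile step, but with an explicit SGD optimizer object so the learning rate can be tuned
model.compile(loss = 'binary_crossentropy',
              optimizer = tf.keras.optimizers.SGD(learning_rate=0.01),
              metrics = ['accuracy'])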
Now the fun bit! We can fit our model. Here we define the number of epochs and the batch size. A larger batch size usually means faster training, but it can produce a model that doesn’t generalize as well – it takes a bit of trial and error to get this parameter right!
An epoch is made up of those batches: the number of epochs is the number of times the entire training dataset passes through the model & back (forward & back propagation). This usually needs to be reasonably high, so that back propagation has enough passes to keep tuning the weights and reducing the error, although too many epochs can lead to overfitting – another parameter to experiment with.
epochs = 2
batch_size = 10000
model.fit(train_features, train_labels, epochs=epochs, batch_size=batch_size)
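If you want to see whether a particular batch size & epoch count is actually generalising, one option (not part of the original script) is to hold some of the training data back as a validation set during the fit and look at the returned history object:
# Hold back 20% of the training data for validation and watch the validation metrics per epoch
history = model.fit(train_features, train_labels,
                    epochs=epochs, batch_size=batch_size,
                    validation_split=0.2)
print(history.history['val_loss'], history.history['val_accuracy'])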
z = model.predict(test_features)
z = pd.DataFrame(z, columns=['pred'], index=test_index) #re-attach the original test-row index
results = pd.merge(df, z, left_index=True, right_index=True) #join predictions back onto the rows they were made for
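Those predictions are probabilities (that’s what the sigmoid output gives us), so to get hard class labels you need to pick a threshold, and you can also ask Keras to score the model on the held-out test set directly. A quick sketch of both, using the merged results dataframe from above:
# Turn the predicted probabilities into 0/1 class labels with a 0.5 threshold
results['pred_class'] = (results['pred'] > 0.5).astype(int)
# Score the model on the test set (returns loss & accuracy, matching the compile step)
test_loss, test_accuracy = model.evaluate(test_features, test_labels)
print(test_loss, test_accuracy)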
In the next article, I’ll talk a lot more about the different parameters & setting up our neural network optimally.