In this lab you'll once again build a neural network, but this time you will be using Keras to do a lot of the heavy lifting.
You will be able to:
- Build a neural network using Keras
- Evaluate performance of a neural network using Keras
We'll start by importing all of the required packages and classes.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import random
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from sklearn import preprocessing
from keras.preprocessing.text import Tokenizer
from keras import models
from keras import layers
from keras import optimizers
In this lab you will be classifying bank complaints available in the 'Bank_complaints.csv' file.
# Import data
df = None
# Inspect data
print(df.info())
df.head()
As mentioned earlier, your task is to categorize banking complaints into various predefined categories. Preview what these categories are and what percent of the complaints each accounts for.
# Your code here
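As one possible approach, pandas can load the file with read_csv and summarize the class shares with value_counts(normalize=True). The sketch below is illustrative only: it reads a tiny invented CSV from memory instead of 'Bank_complaints.csv', and the rows are made up. Only the column names 'Product' and 'Consumer complaint narrative' come from this lab.

```python
import io
import pandas as pd

# Invented stand-in for the real file; only the column names match the lab
toy_csv = io.StringIO(
    "Product,Consumer complaint narrative\n"
    "Mortgage,Payment was applied late\n"
    "Mortgage,Escrow balance is wrong\n"
    "Credit card,Charged an annual fee twice\n"
    "Student loan,Servicer lost my paperwork\n"
)
toy_df = pd.read_csv(toy_csv)

# Share of complaints per category, expressed as percentages
category_pct = toy_df["Product"].value_counts(normalize=True) * 100
print(category_pct)
```

With the real data, passing the file name to pd.read_csv in place of the in-memory CSV should give the same kind of summary.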
Before we build our neural network, we need to do several preprocessing steps. First, we will create word vector counts (a bag of words type representation) of our complaints text. Next, we will change the category labels to integers. Finally, we will perform our usual train-test split before building and training our neural network using Keras. With that, let's start munging our data!
Our first step again is to transform our textual data into a numerical representation. As we saw in some of our previous lessons on NLP, there are many ways to do this. Here, we'll use the Tokenizer() class from the preprocessing.text sub-module of the Keras package.
As with our previous work using NLTK, this will transform our text complaints into word vectors. (Note that the tokenizer gives us two representations: the integer sequences preserve word order, while the binary matrix we will feed to the network only records which words are present, as in a bag of words representation.) In the code below, we'll only keep the 2,000 most common words and encode each complaint as a binary (one-hot style) vector over them.
# As a quick preliminary, briefly review the docstring for keras.preprocessing.text.Tokenizer
Tokenizer?
# ⏰ This cell may take about thirty seconds to run
# Raw text complaints
complaints = df['Consumer complaint narrative']
# Initialize a tokenizer
tokenizer = Tokenizer(num_words=2000)
# Fit it to the complaints
tokenizer.fit_on_texts(complaints)
# Generate sequences
sequences = tokenizer.texts_to_sequences(complaints)
print('sequences type:', type(sequences))
# Unlike sequences, this returns a NumPy array with one binary column per word index
one_hot_results = tokenizer.texts_to_matrix(complaints, mode='binary')
print('one_hot_results type:', type(one_hot_results))
# Useful if we wish to decode (more explanation below)
word_index = tokenizer.word_index
# Number of unique words across the corpus (word_index is not truncated to num_words)
print('Found %s unique tokens.' % len(word_index))
# Our coded data
print('Dimensions of our coded results:', np.shape(one_hot_results))
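To make the difference between the two tokenizer outputs concrete, here is a pure-Python sketch of the idea. It is a simplified stand-in and not the Keras implementation: the helper names (fit_word_index, to_sequences, to_binary_matrix) are hypothetical, tokenization is a plain lowercase split, and ties in word frequency are broken alphabetically. It does follow the Keras conventions that indices start at 1 and that only words with an index below num_words are kept.

```python
from collections import Counter

def fit_word_index(texts):
    """Rank words by frequency; the most common word gets index 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ranked, start=1)}

def to_sequences(texts, word_index, num_words):
    """Ordered integer codes; words with index >= num_words are dropped."""
    return [
        [word_index[w] for w in text.lower().split()
         if w in word_index and word_index[w] < num_words]
        for text in texts
    ]

def to_binary_matrix(sequences, num_words):
    """One row per text; column j is 1 if word j appears at least once."""
    matrix = [[0] * num_words for _ in sequences]
    for row, seq in zip(matrix, sequences):
        for idx in seq:
            row[idx] = 1
    return matrix

# Two invented complaints that use the same words in a different order
toy_texts = ["the bank charged a fee", "a fee the bank charged"]
toy_index = fit_word_index(toy_texts)
toy_sequences = to_sequences(toy_texts, toy_index, num_words=10)
toy_matrix = to_binary_matrix(toy_sequences, num_words=10)

print(toy_sequences)  # the two sequences differ because order is kept
print(toy_matrix)     # the two rows match because order is discarded
```

The two toy complaints contain the same words in a different order, so their sequences differ while their binary rows are identical.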
As a note, you can also decode these vectorized representations of the complaints. The word_index variable, defined above, stores the mapping from each word to its integer index. Somewhat tediously, we can turn this dictionary inside out and map it back to our word vectors, giving us roughly the original complaint back. (As you'll see, the text won't be identical as we limited ourselves to the top 2,000 words.)
While a bit tangential to our main topic of interest, we need to reverse our current dictionary word_index, which maps words from our corpus to integers. To decode our sequences, we will need a dictionary that maps these integers back to the original words. Below, take the word_index dictionary object and change the orientation so that the values become keys and the keys become values. In other words, you are transforming something of the form {A:1, B:2, C:3} to {1:A, 2:B, 3:C}.
# Your code here
reverse_index = None
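The inversion described above can be sketched with a dictionary comprehension. The dictionary and sequence here are invented toy values (toy_word_index, toy_sequence), not the tokenizer's real output, and the "?" fallback for unknown indices is an assumption added for illustration.

```python
# Toy word -> integer mapping, shaped like tokenizer.word_index
toy_word_index = {"bank": 1, "account": 2, "fee": 3}

# Swap keys and values to get an integer -> word mapping
toy_reverse_index = {index: word for word, index in toy_word_index.items()}

# Decode a toy integer sequence back into text
toy_sequence = [3, 1, 2]
decoded = " ".join(toy_reverse_index.get(i, "?") for i in toy_sequence)
print(decoded)
```

This only works cleanly because the integer values are unique; duplicate values would overwrite one another after the swap.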
comment_idx_to_preview = 19
print('Original complaint text:')
print(complaints[comment_idx_to_preview])
print('\n\n')
# The reverse_index cell block above must be complete in order for this cell block to successfully execute
decoded_review = ' '.join([reverse_index.get(i) for i in sequences[comment_idx_to_preview]])
print('Decoded review from Tokenizer:')
print(decoded_review)
On to step two of our preprocessing: converting our descriptive categories into integers.
product = df['Product']
# Initialize
le = preprocessing.LabelEncoder()
le.fit(product)
print('Original class labels:')
print(list(le.classes_))
print('\n')
product_cat = le.transform(product)
# If you wish to retrieve the original descriptive labels post production
# list(le.inverse_transform([0, 1, 3, 3, 0, 6, 4]))
print('New product labels:')
print(product_cat)
print('\n')
# Each row will be all zeros except for the category for that observation
print('One hot labels; 7 binary columns, one for each of the categories.')
product_onehot = to_categorical(product_cat)
print(product_onehot)
print('\n')
print('One hot labels shape:')
print(np.shape(product_onehot))
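For intuition, the same two steps can be reproduced with NumPy alone. This is a sketch on invented labels, not a replacement for LabelEncoder or to_categorical: np.unique assigns integer codes in sorted label order, and indexing an identity matrix with those codes yields the one-hot rows.

```python
import numpy as np

# Invented product labels for illustration
toy_products = np.array(["Mortgage", "Credit card", "Mortgage", "Student loan"])

# Sorted unique classes and the integer code of each observation
toy_classes, toy_codes = np.unique(toy_products, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for class i
toy_onehot = np.eye(len(toy_classes))[toy_codes]

print(toy_classes)
print(toy_codes)
print(toy_onehot)
```

Taking the argmax along each row of the one-hot matrix recovers the integer codes, which is a handy sanity check.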
Now for our final preprocessing step: the usual train-test split.
random.seed(123)
test_index = random.sample(range(1,10000), 1500)
test = one_hot_results[test_index]
train = np.delete(one_hot_results, test_index, 0)
label_test = product_onehot[test_index]
label_train = np.delete(product_onehot, test_index, 0)
print('Test label shape:', np.shape(label_test))
print('Train label shape:', np.shape(label_train))
print('Test shape:', np.shape(test))
print('Train shape:', np.shape(train))
Let's build a fully connected (Dense) layer network with relu activation in Keras. You can do this using: Dense(16, activation='relu').
In this example, use two hidden layers with 50 units in the first layer and 25 in the second, both with a 'relu' activation function. Because we are dealing with a multiclass problem (classifying the complaints into 7 categories), we use a 'softmax' classifier in order to output 7 class probabilities per case.
# Initialize a sequential model
model = None
# Two layers with relu activation
# One layer with softmax activation
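To see what the softmax output layer does with its 7 raw scores, here is a small NumPy sketch. The scores are invented, and the softmax helper is written out by hand for illustration rather than taken from Keras.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    # Subtracting the row maximum avoids overflow without changing the result
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# One invented case with 7 raw scores, one per complaint category
toy_logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.0, 0.5, 1.5]])
toy_probs = softmax(toy_logits)

print(toy_probs.round(3))
print(toy_probs.sum(axis=1))
```

The largest raw score always receives the largest probability, so the predicted class is the argmax of either the scores or the probabilities.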
Now, compile the model! This time, use 'categorical_crossentropy' as the loss function and stochastic gradient descent, 'SGD', as the optimizer. As in the previous lesson, include the accuracy as a metric.
# Compile the model
When compiling, you pass the optimizer (SGD = stochastic gradient descent), the loss function, and the metrics. Next, train the model for 120 epochs in mini-batches of 256 samples.
Note: ⏰ Your code may take about one to two minutes to run.
# Train the model
history = None
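The build, compile, and train steps fit together roughly as in the sketch below. It is a scaled-down illustration under stated assumptions, not the lab solution: the data are random, the layer sizes, epoch count, and batch size are much smaller than the ones requested above, and all toy_ names are hypothetical.

```python
import numpy as np
from keras import layers, models

# Random toy data: 200 samples, 20 binary features, 3 one-hot classes
rng = np.random.default_rng(0)
toy_X = rng.integers(0, 2, size=(200, 20)).astype("float32")
toy_y = np.eye(3, dtype="float32")[rng.integers(0, 3, size=200)]

# Two hidden relu layers followed by a softmax output layer
# (newer Keras releases may prefer an explicit Input layer over input_shape)
toy_model = models.Sequential()
toy_model.add(layers.Dense(8, activation="relu", input_shape=(20,)))
toy_model.add(layers.Dense(4, activation="relu"))
toy_model.add(layers.Dense(3, activation="softmax"))

# Same loss, optimizer, and metric as described in this lab
toy_model.compile(optimizer="SGD",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])

# fit returns a History object; its .history attribute is a plain dictionary
toy_history = toy_model.fit(toy_X, toy_y, epochs=3, batch_size=32, verbose=0)
print(toy_history.history.keys())
```

Because the toy labels are random, the loss values here say nothing about what to expect on the complaints data.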
Recall that the dictionary history.history has two entries: the loss and the accuracy achieved on the training set.
history_dict = history.history
history_dict.keys()
As you might expect, we'll use matplotlib for graphing. Use the data stored in history_dict above to plot the loss vs. epochs and the accuracy vs. epochs.
# Plot the loss vs the number of epochs
# Plot the training accuracy vs the number of epochs
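One way to lay out the two plots is sketched below. The numbers in toy_history_dict are made up purely to have something to draw and are not results from this lab. The key names 'loss' and 'accuracy' are an assumption as well: some older Keras versions use 'acc', so check history_dict.keys() first.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for history.history; not results from this lab
toy_history_dict = {
    "loss": [1.9, 1.5, 1.2, 1.0],
    "accuracy": [0.25, 0.40, 0.52, 0.60],
}
epochs = range(1, len(toy_history_dict["loss"]) + 1)

# Side-by-side panels: training loss on the left, training accuracy on the right
fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

ax_loss.plot(epochs, toy_history_dict["loss"])
ax_loss.set_title("Training loss")
ax_loss.set_xlabel("Epochs")
ax_loss.set_ylabel("Loss")

ax_acc.plot(epochs, toy_history_dict["accuracy"])
ax_acc.set_title("Training accuracy")
ax_acc.set_xlabel("Epochs")
ax_acc.set_ylabel("Accuracy")

fig.tight_layout()
```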
It seems like we could just keep on going and accuracy would go up!
Next, it's time to make predictions. Use the relevant method discussed in the previous lesson to output (probability) predictions for the test set.
# Output (probability) predictions for the test set
y_hat_test = None
Finally, print the loss and accuracy for both the train and test sets of the final trained model.
# Print the loss and accuracy for the training set
results_train = None
results_train
# Print the loss and accuracy for the test set
results_test = None
results_test
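The prediction and evaluation calls can be sketched on toy data as follows. This assumes the Sequential model methods predict and evaluate, and every array, size, and toy_ name is invented for illustration. With random labels the scores themselves are meaningless; the point is the shape of what each call returns.

```python
import numpy as np
from keras import layers, models

# Random toy data split into 100 training rows and 20 test rows
rng = np.random.default_rng(1)
toy_X = rng.integers(0, 2, size=(120, 10)).astype("float32")
toy_y = np.eye(3, dtype="float32")[rng.integers(0, 3, size=120)]
toy_train, toy_test = toy_X[:100], toy_X[100:]
toy_label_train, toy_label_test = toy_y[:100], toy_y[100:]

# A deliberately tiny network, trained for only two epochs
toy_model = models.Sequential()
toy_model.add(layers.Dense(6, activation="relu", input_shape=(10,)))
toy_model.add(layers.Dense(3, activation="softmax"))
toy_model.compile(optimizer="SGD",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
toy_model.fit(toy_train, toy_label_train, epochs=2, batch_size=20, verbose=0)

# predict returns one row of class probabilities per test sample
toy_y_hat = toy_model.predict(toy_test, verbose=0)
print(toy_y_hat.shape)

# evaluate returns [loss, accuracy], matching the metrics passed to compile
toy_results_train = toy_model.evaluate(toy_train, toy_label_train, verbose=0)
toy_results_test = toy_model.evaluate(toy_test, toy_label_test, verbose=0)
print(toy_results_train, toy_results_test)
```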
We can see that the training set results are really good, and the test set results seem to be even better. In general, this type of result will be rare, as train set results are usually at least a bit better than test set results.
Congratulations! In this lab, you built a neural network thanks to the tools provided by Keras! In upcoming lessons and labs we'll continue to investigate further ideas regarding how to tune and refine these models for increased accuracy and performance.