This is the first of a series of hands-on posts where I'll explain deep learning step by step, with a lot of experimental results. Let's start with a classical but hard-enough problem: recognizing handwritten digits.
How many times have you wondered "is that a 4 or a 9?" when your best mate wrote a number on a piece of paper? Well, if that's hard for humans, how could it possibly be easy for a computer to learn? Welcome to the kingdom of deep learning, where computers can be taught certain tasks and then perform them at a super-human level. And when I say "taught" I mean it. Here, we don't code algorithms for solving problems. No, here we code algorithms for learning how to solve a problem. Then we take a bunch of examples, and the computer learns from them. Kind of cool, no?
So let's start.
First, we need a dataset of handwritten characters, and luckily we have one handy: MNIST (http://yann.lecun.com/exdb/mnist/), which is produced by Yann LeCun, the guru of deep learning, currently at Facebook. He pioneered convolutional networks (ConvNets), which beat previous results in so many different application domains. I think he will get the Turing Award one day. ConvNets are simple and effective, as we will see in follow-up postings.
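To make "ConvNet" a bit more concrete, here is a minimal pure-Python sketch of the operations the network below stacks: a 'valid' (no padding, stride 1) 2D convolution, written as a sliding-window dot product without flipping the kernel, followed by ReLU and non-overlapping max pooling. It is an illustration only, not how Keras computes these layers; the function names are hypothetical, and a real ConvNet learns its kernel values from the training examples instead of having them picked by hand as here.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution: slide kernel over image, no padding, stride 1.

    Computed as a sliding-window dot product (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Rectified linear unit applied element-wise: max(0, x)."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size):
    """Non-overlapping size x size max pooling; leftover rows/columns are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# 6x6 grayscale "image" with a vertical bar in columns 2 and 3
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]
# hand-picked 3x3 vertical-edge kernel (a ConvNet would learn its kernels)
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = relu(conv2d_valid(image, kernel))  # 4x4 feature map
pooled = max_pool(feature_map, 2)                # 2x2 after pooling
print(pooled)  # [[3, 0], [3, 0]]
```

The edge kernel responds with +3 around the left edge of the bar and -3 around its right edge; ReLU keeps only the positive response, and pooling halves each side while keeping the strongest activation in every 2x2 window.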
Second, we need a high-level library for coding deep learning in a simple and effective way. Here we are super-lucky, because in the last year there has been a Cambrian explosion of deep learning libraries, with contributions from all the big players (Google, Facebook, Microsoft) and from the academic world. After testing many (Theano, Google's TensorFlow, Lasagne, Blocks, Neon), I decided to go for Keras because it is clean and minimalist. Plus, it runs on top of Theano and TensorFlow, which are the state of the art today, and you can switch between the two backends transparently. Keras supports both CPU and GPU computation.
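How the backend switch can be done in practice: a minimal sketch, assuming a Keras 1.x-era installation, which reads the KERAS_BACKEND environment variable (taking precedence over the "backend" entry in ~/.keras/keras.json) when Keras is first imported. One caveat worth checking: the script below feeds images as (channels, rows, columns), the Theano-style layout, so with the TensorFlow backend you will probably also need "image_dim_ordering": "th" in ~/.keras/keras.json.

```python
import os

# Assumption: Keras 1.x reads KERAS_BACKEND once, when its backend module is
# first loaded, so this must run BEFORE the first "import keras" in the process.
os.environ["KERAS_BACKEND"] = "theano"  # or "tensorflow"

# import keras  # would now pick up the backend selected above
```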
Third, let's jump straight to some code I wrote, which reaches an accuracy above 98%:
import numpy as np
import matplotlib.pyplot as plt
import time
np.random.seed(1111) # for reproducibility
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.regularizers import l2, activity_l2
from keras.utils.visualize_util import plot
from keras.optimizers import SGD, Adam, RMSprop
from keras.callbacks import EarlyStopping
import inspect
#
# save a plot of the validation accuracy produced by the experiment
#
def print_Graph(
        # training log returned by fit()
        fitlog,
        # elapsed time in seconds
        elapsed,
        # names of the input parameters of the experiment
        args,
        # values of the input parameters (name -> value)
        values):
    # one "name=value" line per parameter, used both as label and file name
    experiment_label = "\n".join(['%s=%s' % (i, values[i]) for i in args])
    experiment_file = experiment_label + "-Time= %02d" % elapsed + "sec"
    experiment_file = experiment_file.replace("\n", "-") + '.png'
    fig = plt.figure(figsize=(6, 3))
    # Keras 1.x stores the per-epoch validation accuracy under "val_acc"
    plt.plot(fitlog.history["val_acc"])
    plt.title('val_accuracy')
    plt.ylabel('val_accuracy')
    plt.xlabel('epoch')
    fig.text(.7, .15, experiment_label, size='6')
    plt.savefig(experiment_file, format="png")
#
# A LeNet-like convnet for classifying MNIST handwritten digits (28x28)
#
def convNet_LeNet(
        VERBOSE=1,
        # normalize pixel values to [0, 1]
        NORMALIZE=True,
        # Network Parameters
        BATCH_SIZE=128,
        NUM_EPOCHS=20,
        # number of convolutional filters
        NUM_FILTERS=32,
        # side length of the max-pooling square
        NUM_POOL=2,
        # side length of the convolution square
        NUM_CONV=3,
        # dropout rate for regularization
        DROPOUT_RATE=0.5,
        # number of neurons in the hidden dense layer
        NUM_HIDDEN=128,
        # fraction of the training data held out for validation
        VALIDATION_SPLIT=0.2,  # 20%
        # optimizer used; None means SGD with the settings below
        OPTIMIZER=None):
    # build a fresh optimizer on every call: an optimizer object used as a
    # default argument would be shared by all calls of the function
    if OPTIMIZER is None:
        OPTIMIZER = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    # output classes, number of MNIST digits
    NUM_CLASSES = 10
    # shape of an MNIST digit image
    SHAPE_X, SHAPE_Y = 28, 28
    # MNIST images are grayscale: one channel
    IMG_CHANNELS = 1
    # LOAD the MNIST data, split in training and test data
    (X_train, Y_train), (X_test, Y_test) = mnist.load_data()
    # reshape to (samples, channels, rows, cols): channels-first ("th") ordering
    X_train = X_train.reshape(X_train.shape[0], IMG_CHANNELS, SHAPE_X, SHAPE_Y)
    X_test = X_test.reshape(X_test.shape[0], IMG_CHANNELS, SHAPE_X, SHAPE_Y)
    # convert to float32 representation for GPU computation
    X_train = X_train.astype("float32")
    X_test = X_test.astype("float32")
    if NORMALIZE:
        # NORMALIZE each pixel by dividing by max_value=255
        X_train /= 255
        X_test /= 255
    print('X_train shape:', X_train.shape)
    print(X_train.shape[0], 'train samples')
    print(X_test.shape[0], 'test samples')
    # Keras needs each output class in one-hot encoding (OHE):
    # e.g. the label 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    Y_train = np_utils.to_categorical(Y_train, NUM_CLASSES)
    Y_test = np_utils.to_categorical(Y_test, NUM_CLASSES)
    nn = Sequential()
    # FIRST BLOCK OF CONVNETS, POOLING, DROPOUT
    # apply NUM_CONV x NUM_CONV convolutions with NUM_FILTERS outputs;
    # the first layer must also define the input shape;
    # the activation function is the rectified linear unit (ReLU)
    nn.add(Convolution2D(NUM_FILTERS, NUM_CONV, NUM_CONV,
                         input_shape=(IMG_CHANNELS, SHAPE_X, SHAPE_Y)))
    nn.add(Activation('relu'))
    nn.add(Convolution2D(NUM_FILTERS, NUM_CONV, NUM_CONV))
    nn.add(Activation('relu'))
    nn.add(MaxPooling2D(pool_size=(NUM_POOL, NUM_POOL)))
    nn.add(Dropout(DROPOUT_RATE))
    # SECOND BLOCK OF CONVNETS, POOLING, DROPOUT
    # apply NUM_CONV x NUM_CONV convolutions with NUM_FILTERS outputs
    nn.add(Convolution2D(NUM_FILTERS, NUM_CONV, NUM_CONV))
    nn.add(Activation('relu'))
    nn.add(Convolution2D(NUM_FILTERS, NUM_CONV, NUM_CONV))
    nn.add(Activation('relu'))
    nn.add(MaxPooling2D(pool_size=(NUM_POOL, NUM_POOL)))
    nn.add(Dropout(DROPOUT_RATE))
    # FLATTEN the shape for dense connections
    nn.add(Flatten())
    # HIDDEN LAYER OF THE DENSE NETWORK
    nn.add(Dense(NUM_HIDDEN))
    nn.add(Activation('relu'))
    nn.add(Dropout(DROPOUT_RATE))
    # OUTPUT LAYER with NUM_CLASSES outputs
    # activation is softmax, weight regularization is L2
    nn.add(Dense(NUM_CLASSES, W_regularizer=l2(0.01)))
    nn.add(Activation('softmax'))
    # print a summary of the layers
    nn.summary()
    # plot the model (needs pydot and Graphviz; saved to model.png by default)
    plot(nn)
    # stop training when val_loss has not improved for 2 epochs
    early_stopping = EarlyStopping(monitor='val_loss', patience=2)
    # COMPILE THE MODEL
    # the loss function is categorical_crossentropy,
    # the optimizer is a parameter
    nn.compile(loss='categorical_crossentropy',
               optimizer=OPTIMIZER, metrics=["accuracy"])
    start = time.time()
    # FIT THE MODEL, holding out VALIDATION_SPLIT of the training data
    fitlog = nn.fit(X_train, Y_train,
                    batch_size=BATCH_SIZE, nb_epoch=NUM_EPOCHS,
                    verbose=VERBOSE, validation_split=VALIDATION_SPLIT,
                    callbacks=[early_stopping])
    elapsed = time.time() - start
    # test the network on the test set
    results = nn.evaluate(X_test, Y_test, verbose=VERBOSE)
    print('accuracy:', results[1])
    # get the list of input parameters and their values
    frame = inspect.currentframe()
    args, _, _, values = inspect.getargvalues(frame)
    # used for printing pretty arguments
    print_Graph(fitlog, elapsed, args, values)
    return fitlog
# 2 epochs
#log = convNet_LeNet(OPTIMIZER = 'Adam', NUM_EPOCHS=2)
#print(log.history)
# 20 epochs
#log = convNet_LeNet(OPTIMIZER = 'Adam', NUM_EPOCHS=20)
#print(log.history)
# default optimizer = SGD
#log = convNet_LeNet(NUM_EPOCHS=20)
#print(log.history)
# RMSprop optimizer with its default parameters
#log = convNet_LeNet(OPTIMIZER=RMSprop(), NUM_EPOCHS=20)
#print(log.history)
# Adam optimizer with its default parameters, no dropout
#log = convNet_LeNet(OPTIMIZER='Adam', DROPOUT_RATE=0)
#print(log.history)
# Adam, dropout rate 0.1
#log = convNet_LeNet(OPTIMIZER='Adam', DROPOUT_RATE=0.1)
#print(log.history)
# Adam, dropout rate 0.2
#log = convNet_LeNet(OPTIMIZER='Adam', DROPOUT_RATE=0.2)
#print(log.history)
# Adam, dropout rate 0.4
#log = convNet_LeNet(OPTIMIZER='Adam', DROPOUT_RATE=0.4)
#print(log.history)
# Adam, varying batch size
#log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=64)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=128)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=256)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=512)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=1024)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=2048)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', BATCH_SIZE=4096)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.8)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.6)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.4)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.2)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.2, NORMALIZE=False)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', VALIDATION_SPLIT=0.2, NUM_FILTERS=64)
#print(log.history)
log = convNet_LeNet(OPTIMIZER='Adam', NUM_FILTERS=128)
print(log.history)
# log = convNet_LeNet(OPTIMIZER='Adam', NUM_FILTERS=256)
# print(log.history)
# x log = convNet_LeNet(OPTIMIZER='Adam', NUM_POOL=4)
# x print(log.history)
# log = convNet_LeNet(OPTIMIZER='Adam', NUM_POOL=8)
# print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', NUM_CONV=4)
#print(log.history)
# x log = convNet_LeNet(OPTIMIZER='Adam', NUM_CONV=8)
# x print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', NUM_HIDDEN=32)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', NUM_HIDDEN=64)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', NUM_HIDDEN=256)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', NUM_HIDDEN=512)
#print(log.history)
#log = convNet_LeNet(OPTIMIZER='Adam', NUM_HIDDEN=1024)
#print(log.history)
# VERBOSE=1,
# # normalize
# NORMALIZE = True,
# # Network Parameters
# BATCH_SIZE = 128,
# NUM_EPOCHS = 100,
# # Number of convolutional filters
# NUM_FILTERS = 32,
# # side length of maxpooling square
# NUM_POOL = 2,
# # side length of convolution square
# NUM_CONV = 3,
# # dropout rate for regularization
# DROPOUT_RATE = 0.5,
# # number of neurons in the hidden dense layer
# NUM_HIDDEN = 128,
# # validation data
# VALIDATION_SPLIT=0.2, # 20%
# # optimizer used
# OPTIMIZER = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
#plt.show()
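Changing NUM_CONV or NUM_POOL also changes how much of the 28x28 image survives the two conv-conv-pool blocks, so some of the settings in the experiments above may leave nothing for the later layers. A quick check before training is to trace the spatial size by hand. Here is a small sketch of that arithmetic, assuming Keras 1.x defaults ('valid' convolutions with stride 1, max pooling whose stride equals the pool size); the helper lenet_feature_shape is hypothetical and not part of Keras.

```python
def lenet_feature_shape(side=28, num_conv=3, num_pool=2, num_filters=32):
    """Hypothetical helper: trace the side length of a square input through
    the script's two conv-conv-pool blocks.

    Assumes Keras 1.x defaults: 'valid' convolutions (no padding, stride 1)
    and max pooling whose stride equals the pool size. Returns
    (rows, cols, filters, flattened_size), or None as soon as a stage
    would have no valid output.
    """
    for _block in range(2):        # FIRST and SECOND block
        for _conv in range(2):     # two convolutions per block
            side = side - num_conv + 1
            if side <= 0:
                return None
        side = (side - num_pool) // num_pool + 1   # pooling step
        if side <= 0:
            return None
    return side, side, num_filters, side * side * num_filters


print(lenet_feature_shape())                 # (4, 4, 32, 512)
print(lenet_feature_shape(num_filters=128))  # (4, 4, 128, 2048)
print(lenet_feature_shape(num_conv=4))       # (2, 2, 32, 128)
print(lenet_feature_shape(num_pool=4))       # None
print(lenet_feature_shape(num_conv=8))       # None
```

Under these assumptions, with the defaults the Flatten layer hands 4 x 4 x 32 = 512 values to the hidden Dense layer, while NUM_POOL=4, NUM_POOL=8 and NUM_CONV=8 leave no valid feature map by the end of the second block.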
The next posting will describe the code. Then you will see dozens of experiments exploring the hyper-parameter space and inferring some rules of thumb for fine-tuning our deep learning nets.
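The one-parameter-at-a-time experiments that are commented out in the script can also be driven by a small loop instead of commenting and uncommenting lines by hand. A minimal sketch; the helper sweep and the stand-in fake_train are hypothetical names, and for real experiments you would pass convNet_LeNet as train_fn.

```python
def sweep(train_fn, grid, **fixed):
    """Vary one hyper-parameter at a time, keeping the others fixed.

    train_fn -- callable taking hyper-parameters as keyword arguments
                (e.g. convNet_LeNet from the script above)
    grid     -- dict: parameter name -> list of values to try
    fixed    -- keyword arguments passed to every run
    Returns a dict mapping (name, value) to whatever train_fn returns.
    """
    results = {}
    for name, values in grid.items():
        for value in values:
            kwargs = dict(fixed)   # copy, so runs do not leak into each other
            kwargs[name] = value   # the grid value overrides a fixed one
            results[(name, value)] = train_fn(**kwargs)
    return results


def fake_train(**params):
    """Hypothetical stand-in for a real training run: echoes its inputs."""
    return params


demo = sweep(fake_train, {"DROPOUT_RATE": [0, 0.1, 0.2]}, OPTIMIZER="Adam")
print(demo[("DROPOUT_RATE", 0.1)])  # {'OPTIMIZER': 'Adam', 'DROPOUT_RATE': 0.1}

# Real usage (slow: each call trains a network):
# logs = sweep(convNet_LeNet, {"BATCH_SIZE": [64, 128, 256]}, OPTIMIZER="Adam")
```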
Stay tuned: over the next months we will see more than 20 deep learning nets in different contexts and show their super-human capabilities.