In this article we will solve one of the easiest and most widely used examples in deep learning: cats vs. dogs classification. The goal is to train a model that takes a picture of a cat or a dog as input and tells you whether it is a cat or a dog. Simple as that. The same approach can then be generalized to any recognition task between two classes, provided you have enough training data, of course!

We will use a convolutional neural network, or “CNN”. CNNs work really well for detecting features in images like:

  • colors
  • textures
  • edges

then using these features to identify objects in the images.

We will use the dataset available here : Kaggle cat vs dog

Here are my environment details :

After downloading the dataset, here is what we have in the cat folder:

And here is what we have in the dog folder :

We have around 25,000 examples, well balanced between cats and dogs.
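A quick way to check that balance yourself is to count the files in each class folder. The sketch below is only an illustration: the helper name count_images_per_class is hypothetical, and the demo builds a throwaway directory with empty placeholder files so that it runs without the real dataset. With the real data you would pass the dataset folder and the two class names instead.

```python
import os
import tempfile


def count_images_per_class(data_dir, categories):
    """Return {category: number of files} for each class sub-folder."""
    counts = {}
    for category in categories:
        folder = os.path.join(data_dir, category)
        counts[category] = sum(
            1 for name in os.listdir(folder)
            if os.path.isfile(os.path.join(folder, name))
        )
    return counts


# Demo on a temporary directory with placeholder files (not real images).
with tempfile.TemporaryDirectory() as demo_dir:
    for category, n_files in (("Dog", 3), ("Cat", 2)):
        os.makedirs(os.path.join(demo_dir, category))
        for i in range(n_files):
            open(os.path.join(demo_dir, category, f"{i}.jpg"), "w").close()
    demo_counts = count_images_per_class(demo_dir, ["Dog", "Cat"])

print(demo_counts)  # {'Dog': 3, 'Cat': 2}
```

If the two counts were very different, one class would dominate training and the accuracy figure would be misleading.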

The first task is to load the data and build the training data that will be fed to our neural network. Here we go :

Building training data

import numpy as np
import os
import cv2
import random
import pickle


#Directory where we have the images of the cat and dogs : 
DATADIR = "G:/Mon Drive/Deep_learning/DeeP Learning/PetImages"
#Categories that we will use to classify the images :
CATEGORIES = ["Dog", "Cat"]

#The number of pixels that we want to use for width and height of the images: 
IMG_SIZE = 50
#Empty list
training_data = []

#Function to create the training data
def create_training_data():
	for category in CATEGORIES:  #categories that we have : Dog and Cat
		path = os.path.join(DATADIR,category)  #for accessing the filesystem see the os module : we create path to dogs and cats with the datadir and the category name 
		class_num=CATEGORIES.index(category) #Instead of "cat" and "dog" we want a number, 0 = Dog, 1 = Cat
		for img in os.listdir(path):  #iterate over each image per dogs and cats
			#print("got one: "+str(class_num)+' '+img) #If you want to see progress, as this function takes quite a while
			try: 
				img_array = cv2.imread(os.path.join(path,img) ,cv2.IMREAD_GRAYSCALE)  #You read the image as an array, directly in grayscale, no need for colors
				new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE)) #Resize the image with the IMG_SIZE that we defined, 50 pixels
				training_data.append([new_array, class_num]) #you append the image to the training data with the classification (0 or 1, dog or cat)
			except Exception as e:
				pass #files that cannot be read (cv2.imread returns None and resize then fails) are simply skipped

#You call the function
create_training_data()
#Print the number of records for good measure
print(len(training_data))

#now we have all dogs and then all cats, therefore we shuffle the 
#data
random.shuffle(training_data)

#Sample and labels (lists)
x=[]
y=[]

#we iterate through the training data and we separate the input 
#(image data) from the output (classification) and append them to the lists
for features, label in training_data:
	x.append(features)
	y.append(label)

 
#We cannot pass lists into a neural network
#x has to be a numpy array

 
#-1 means that we don't know the number of rows and that we 
#let numpy figure it out
#the (-1, 50, 50 ,1) corresponds to the shape of the output array. 
#The output array is a 4 dimensional array with shape : 
#number of images - 50 pixels - 50 pixels - greyscale  
x=np.array(x).reshape(-1, IMG_SIZE,IMG_SIZE,1)

#Now we are going to save the new data in a pickle, used to store 
#objects
#Additionally, pickle stores type information when you store 
#objects, so when you will read it, it will be a numpy array

#Writing to pickle ("with" closes the file for us, even if an error occurs) : 
with open("x.pickle","wb") as pickle_out:
	pickle.dump(x,pickle_out)

with open("y.pickle","wb") as pickle_out:
	pickle.dump(y,pickle_out)

#Reading from pickle just to test it out
with open("x.pickle","rb") as pickle_in:
	x=pickle.load(pickle_in)

I think the comments are enough to understand what is going on here. But basically what we do is :

  1. Open each folder with images
  2. Iterate over the images, convert them to array, resize them to 50*50 pixels so that we have the same format for each of them, take only greyscale (no colors) and append them to our list training_data with their classification
  3. Shuffle the data (so that we do not have all the dogs and then all the cats); our dataset is well balanced, but if it were not we would have to balance it
  4. Separate feature / label, but keep the order (important!)
  5. Convert the list of features into an array and reshape it nicely with number of images / size of image / size of image / greyscale
  6. Save both to pickles
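Steps 3 to 6 can be tried on their own without the dataset. The following is a minimal sketch under stated assumptions: the six "images" are random pixels rather than real pictures, the names fake_training_data, x_demo and restored are made up for the illustration, and the pickle round trip is done in memory instead of writing x.pickle to disk.

```python
import pickle
import random

import numpy as np

IMG_SIZE = 50  # same value as in the script above

# Stand-in for training_data: 6 fake grayscale images, dogs (0) then cats (1).
rng = np.random.default_rng(0)
fake_training_data = [
    [rng.integers(0, 256, size=(IMG_SIZE, IMG_SIZE), dtype=np.uint8), label]
    for label in (0, 0, 0, 1, 1, 1)
]

# Step 3: mix dogs and cats.
random.shuffle(fake_training_data)

# Step 4: split features and labels; both lists keep the shuffled order.
features = [pair[0] for pair in fake_training_data]
labels = [pair[1] for pair in fake_training_data]

# Step 5: -1 lets NumPy infer the number of images, the trailing 1 is the
# single grayscale channel.
x_demo = np.array(features).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
print(x_demo.shape)  # (6, 50, 50, 1)

# Step 6: pickle keeps the type, so we get a NumPy array back.
restored = pickle.loads(pickle.dumps(x_demo))
print(type(restored).__name__, restored.shape)
```

Because features and labels are built from the same shuffled list, the image at position i always goes with the label at position i.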

Example of what we do with the images :

import numpy as np
import os
import cv2
import random
import pickle
import matplotlib.pyplot as plt


#Directory where we have the images of the cat and dogs : 
DATADIR = "G:/Mon Drive/Deep_learning/DeeP Learning/PetImages"
#Categories that we will use to classify the images :
CATEGORIES = ["Dog", "Cat"]

#The number of pixels that we want to use for width and height of the images: 
IMG_SIZE = 50
#Empty list
training_data = []



path = os.path.join(DATADIR,'Dog')  #for accessing the filesystem see the os module : here we only build the path to the dog folder
for img in os.listdir(path):  #iterate over the dog images (we stop after the first one that loads)
	#print("got one: "+img) #If you want to see which file is being processed
	try: 
		img_array = cv2.imread(os.path.join(path,img) ,cv2.IMREAD_GRAYSCALE)  #You convert the image to an array and you convert it to grayscale, no need for colors
		plt.imshow(img_array, cmap='gray')
		plt.show()
		print(img_array.shape)
		new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE)) #Resize the image with the IMG_SIZE that we defined, 50 pixels
		plt.imshow(new_array, cmap='gray')
		plt.show()
		print(new_array.shape)
		break
	except Exception as e:
		pass

Original image :

After first operation (grey)

Shape is (375, 500)

After second operation (resize)

Shape is (50, 50)

I should probably have gone with 100*100, as the 50*50 version is difficult to recognize!

Training the model

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.callbacks import TensorBoard
import numpy as np
import pickle
import time

#For the purpose of saving the models
NAME="Cats-vs-dog-cnn-64x3-{}".format(int(time.time()))

#We create a tensorboard in order to monitor the learning of our neural network
tensorboard=TensorBoard(log_dir='logs/{}'.format(NAME))

#Do not pay attention to this :) 
#gpu_options=tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.333)
#sess=tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(gpu_options=gpu_options))

#We are loading the training data previously saved : features
#(the file was saved as "x.pickle", the name has to match exactly on case-sensitive file systems)
with open("x.pickle","rb") as pickle_in:
	X = pickle.load(pickle_in)


#We are loading the training data previously saved : labels
with open("y.pickle","rb") as pickle_in:
	y = pickle.load(pickle_in)

#Control
print(len(X))
print(len(y))

#Normalizing the pixels (divide every cell by 255 as usual for 
#images)
X=np.array(X/255.0)
#we create an array out of labels
y=np.array(y)

#Control
print(len(X))
print(len(y))


#We create the model
#A Sequential model is appropriate for a plain stack of layers
#where each layer has exactly one input tensor and one output 
#tensor.
model = Sequential()


#We add the input layer, the input which is expected is the 
#shape of each training example (each image), so : X.shape[1:]
#Activation function is rectified linear unit
#Pool size is the size of the window going over the input matrix
model.add(Conv2D(64, (3, 3), input_shape=X.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

#We add 3 more convolutional layers with 64 filters each
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())  #this converts our 3D feature maps to 1D vector
#with 50x50 inputs, the four conv + pooling blocks shrink the image 
#to 1 x 1 x 64, so the vector has 64 values


#final layer of a binary classification CNN
#you either do Dense(2) and activation = 'softmax'
#or you do Dense(1) and activation = 'sigmoid'
model.add(Dense(1))
model.add(Activation('sigmoid'))


#Once the model is created, you can configure it with a loss, 
#an optimizer and metrics with model.compile()
#Binary crossentropy is a loss function that is used in binary 
#classification tasks, such as this one
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

#Finally we train the model with .fit, passing as arguments the 
#samples, the labels, the batch size and the number of epochs
#The batch size defines the number of samples that will be
#propagated through the network before the weights are updated.
#For instance, let's say you have 1050 training samples and you 
#want to set up a batch_size equal to 100. 
#The algorithm takes the first 100 samples (from 1st to 100th) 
#from the training dataset and trains the network.
#Next, it takes the second 100 samples (from 101st to 200th) and 
#trains the network again, and so on until all samples have been used.
#validation_split=0.3 keeps the last 30% of the data aside for validation
model.fit(X, y, batch_size=32, epochs=10, validation_split=0.3, callbacks=[tensorboard])

#We save the model
model.save(NAME)

Again, I tried to put as many comments as possible in the code itself, but basically here is how it goes :

  1. We load the data
  2. We normalize the data by dividing every pixel by 255 (normalizing ensures that every feature has a value in roughly the same range, so here between 0 and 1)
  3. We create the model: it is sequential, with four convolutional layers of 64 filters each, every one followed by max pooling. The activation function is relu for the convolutional layers and sigmoid for the output layer. Softmax would need two outputs, sigmoid only needs one.
  4. We compile the model
  5. We train the model
  6. We save the model
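To see what Flatten actually receives, you can follow the image size through the network by hand. This is a small sketch assuming the Keras defaults used above (no padding and stride 1 for Conv2D, stride equal to the pool size for MaxPooling2D); the helper names conv_output_size and pool_output_size are made up for the illustration.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def pool_output_size(size, pool=2):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool


size = 50      # IMG_SIZE
filters = 64   # filters in the last Conv2D layer
sizes = []
for _ in range(4):  # four Conv2D + MaxPooling2D blocks
    size = pool_output_size(conv_output_size(size))
    sizes.append(size)

flattened = size * size * filters
print(sizes)      # [24, 11, 4, 1]
print(flattened)  # 64
```

So with 50*50 inputs the last block is left with a single 1*1 position and the Dense layer sees 64 values. A larger input size would leave more spatial information at that point.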

Now you should have in your directory :

Let’s use the model now !

Using the model

import cv2
import tensorflow as tf 
import os

#I added some new images to my test folder and just pass here the image
DATADIR = "G:/Mon Drive/Deep_learning/DeeP Learning/Additional images/3.jpg"

CATEGORIES = ["Dog", "Cat"] #will use this to convert prediction num to string value


#We prepare the image so that it is in the same format as the images used to train the model
def prepare(filepath):
  IMG_SIZE=50 #Number of pixels
  img_array=cv2.imread(filepath, cv2.IMREAD_GRAYSCALE) #read in the image, convert to grayscale
  new_array=cv2.resize(img_array, (IMG_SIZE,IMG_SIZE)) #resize image to match model's expected sizing
  new_array=new_array/255.0 #normalize the pixels exactly as we did before training
  return new_array.reshape(-1,IMG_SIZE,IMG_SIZE,1) #return the image with shaping that TF wants.

#We load the model 
model = tf.keras.models.load_model('Cats-vs-dog-cnn-64x3-1601574654')

#predict expects a batch of images, prepare() already returns an array of shape (1, 50, 50, 1)
prediction = model.predict(prepare(DATADIR))

#The sigmoid output is a number between 0 and 1 : below 0.5 means class 0 (Dog), 
#0.5 or above means class 1 (Cat). int() alone would truncate 0.99 down to 0.
print(CATEGORIES[int(prediction[0][0] >= 0.5)])

#if prediction[0][0] < 0.5:
#  print('it is a dog!')
#else:
#   print('it is a cat!') 

As you can see, using the model is quite easy. I added a few random test pictures in a folder :

And then tested the model on each of them. Sadly, my model predicted that cat.jpg was a dog, but it was correct for the rest and for an additional bunch of other pictures I fed through, so I guess it works “OK”.

To summarize: we load the model, we load and prepare the image, and we call .predict on the prepared image to get its classification.
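One detail worth spelling out is how the single sigmoid output becomes a label. With the labels used here (0 = Dog, 1 = Cat), the output can be read as a score between 0 and 1 for "Cat", and the usual choice is to cut at 0.5. The sketch below is an illustration only: the function name label_from_probability is hypothetical and the scores are made-up numbers, not outputs of the trained model.

```python
CATEGORIES = ["Dog", "Cat"]


def label_from_probability(probability, threshold=0.5):
    """Map a sigmoid output in [0, 1] to a class name."""
    index = 1 if probability >= threshold else 0
    return CATEGORIES[index]


for score in (0.03, 0.49, 0.51, 0.97):
    print(score, label_from_probability(score))

# Truncating with int() is not the same thing: int(0.97) is 0, i.e. "Dog".
print(int(0.97))  # 0
```

The threshold is a design choice: 0.5 treats both classes the same, while moving it makes the classifier more cautious about one of them.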

And that’s it for the dog vs cat classifier !

Some additional theory :

The model learns by sliding a small window (the filter) over the full image and adjusting the filter weights so that it responds to specific patterns.

In the first convolutional layers, the model learns small details of the image such as lines, curves and object edges. As we go deeper into the layers, the model learns more complex shapes and parts of images.
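The sliding window can be written out in a few lines of NumPy. This is a minimal sketch, not what Keras does internally: the filter weights are written by hand here (a simple vertical-edge detector), whereas in the real network they are learned during training, and the function name slide_filter is made up for the illustration.

```python
import numpy as np


def slide_filter(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the sum of element-wise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            window = image[row:row + kh, col:col + kw]
            output[row, col] = np.sum(window * kernel)
    return output


# Toy 5x6 image: dark left half (0), bright right half (1).
image = np.array([[0, 0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-written filter that reacts to a dark-to-bright vertical edge.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

response = slide_filter(image, kernel)
print(response)
# Each row is [0. 3. 3. 0.]: the filter only fires where the edge is.
```

Flat regions give 0 and the edge gives a strong response, which is the kind of low-level feature the first layers pick up.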

CNN: Convolutional Neural Network – mc.ai
  • Convolutional: A convolutional layer is a rectangular grid of neurons. It requires that the previous layer also be a rectangular grid of neurons. Each neuron takes inputs from a rectangular section of the previous layer; the weights for this rectangular section are the same for each neuron in the convolutional layer.
  • Max-Pooling: After each convolutional layer, there may be a pooling layer. The pooling layer takes small rectangular blocks from the convolutional layer and subsamples each of them to produce a single output per block. There are several ways to do this pooling, such as taking the average or the maximum, or a learned linear combination of the neurons in the block.
  • Fully-Connected: Finally, after several convolutional and max pooling layers, the high-level reasoning in the neural network is done via fully connected layers. A fully connected layer takes all neurons in the previous layer (be it fully connected, pooling, or convolutional) and connects it to every single neuron it has.
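As an illustration of the pooling step, here is a small NumPy sketch of 2*2 max pooling on a single feature map. The function name max_pool_2x2 and the numbers in the feature map are made up; the reshape trick is just one convenient way to group the pixels into blocks.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """Keep the maximum of every non-overlapping 2x2 block."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2  # drop a trailing odd row or column
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)
# [[4 2]
#  [2 8]]
```

The 4*4 map becomes 2*2: each output keeps only the strongest response of its block, which halves the width and height passed on to the next layer.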