In this article we will solve one of the easiest and most widely used examples in deep learning: cats vs dogs classification. The goal is to train a model that takes a picture of a cat or a dog as input and tells you whether it is a cat or a dog. Simple as that. The same approach can then be generalized to recognition between any two classes, provided you have enough training data, of course!
We will use a convolutional neural network, or “CNN”. CNNs work really well for detecting features in images, such as:
- colors
- textures
- edges
They then use these features to identify objects in the images.
We will use the dataset available here: Kaggle cat vs dog
Here are my environment details:
After downloading the dataset, here is what we have in the cat folder:
And here is what we have in the dog folder:
We have around 25,000 examples, well balanced between cats and dogs.
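If you want to check the balance yourself before building the training data, a small helper can count the files in each class folder. This is a sketch: the `count_images` name is hypothetical, and it assumes the `PetImages/Dog` and `PetImages/Cat` folder layout used in this article. It counts every regular file, so any non-image or unreadable file is included in the count.

```python
import os


def count_images(datadir, categories):
    """Hypothetical helper: map each category folder name to its number of files."""
    counts = {}
    for category in categories:
        path = os.path.join(datadir, category)
        # Count regular files only, ignoring any sub-folders
        counts[category] = sum(
            1 for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
        )
    return counts
```

Usage, with the DATADIR from the code below: `count_images(DATADIR, ["Dog", "Cat"])` returns a dict such as `{"Dog": ..., "Cat": ...}`; roughly equal numbers mean the classes are balanced.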
The first task we have to perform is to load the data and create the training data that will be used by our deep learning neural network. Here we go:
Building training data
import numpy as np
import os
import cv2
import random
import pickle

# Directory where we have the images of the cats and dogs:
DATADIR = "G:/Mon Drive/Deep_learning/DeeP Learning/PetImages"
# Categories that we will use to classify the images:
CATEGORIES = ["Dog", "Cat"]
# The number of pixels that we want to use for the width and height of the images:
IMG_SIZE = 50

# Empty list
training_data = []

# Function to create the training data
def create_training_data():
    for category in CATEGORIES:  # categories that we have: Dog and Cat
        # Build the path to the Dog or Cat folder (see the os module for filesystem access)
        path = os.path.join(DATADIR, category)
        # Instead of "Dog" and "Cat" we want a number: 0 = Dog, 1 = Cat
        class_num = CATEGORIES.index(category)
        for img in os.listdir(path):  # iterate over each image in the folder
            # print("got one: " + str(class_num) + " " + img)  # uncomment for a progress trace, this loop is slow
            try:
                # Read the image as a greyscale array, no need for colors
                img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
                # Resize the image to IMG_SIZE x IMG_SIZE (50 x 50 pixels)
                new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
                # Append the image together with its class (0 = Dog, 1 = Cat)
                training_data.append([new_array, class_num])
            except Exception:
                # Unreadable files: cv2.imread returns None and cv2.resize fails, so we skip them
                pass

# Call the function
create_training_data()
# Print the number of records for good measure
print(len(training_data))

# Now we have all the dogs and then all the cats, therefore we shuffle the data
random.shuffle(training_data)

# Samples and labels (lists)
x = []
y = []
# Iterate through the training data and separate the input (image data)
# from the output (classification), keeping both in the same order
for features, label in training_data:
    x.append(features)
    y.append(label)

# We cannot pass lists into a neural network: x has to be a numpy array.
# The shape (-1, 50, 50, 1) means: number of images (-1 lets numpy figure it out),
# 50 pixels, 50 pixels, 1 greyscale channel
x = np.array(x).reshape(-1, IMG_SIZE, IMG_SIZE, 1)

# Save the data with pickle, which stores objects together with their type,
# so x will come back as a numpy array when we read it.
# The training script reads "X.pickle", so we write that exact name
# (file names are case-sensitive on Linux).
with open("X.pickle", "wb") as pickle_out:
    pickle.dump(x, pickle_out)
with open("y.pickle", "wb") as pickle_out:
    pickle.dump(y, pickle_out)

# Read back from the pickle just to test it out
with open("X.pickle", "rb") as pickle_in:
    x = pickle.load(pickle_in)
print(x.shape)
I think the comments are enough to understand what is going on here, but basically, here is what we do:
- Open each folder of images
- Iterate over the images, read each one as a greyscale array (no colors), resize it to 50x50 pixels so that every image has the same format, and append it to our list training_data together with its classification
- Shuffle the data (so that we do not have all the dogs and then all the cats). Our dataset is well balanced; if it were not, we should balance it
- Separate features and labels, but keep the order (important!)
- Convert the list of features into a numpy array and reshape it to (number of images, image height, image width, 1 greyscale channel)
- Save both to pickle files
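The shuffle, split and reshape steps can be checked on fake data without touching the real dataset. This is a minimal sketch that assumes constant-valued 50x50 arrays in place of real images; the `fake_data` list is hypothetical. Each fake image is filled with its own label, so after shuffling we can verify that every label stayed attached to its image and that the reshape produces the 4D shape the network expects.

```python
import random

import numpy as np

IMG_SIZE = 50

# Hypothetical stand-in for training_data: 3 "dogs" (label 0) followed by 3 "cats" (label 1).
# Each fake image is filled with its label value so the pairing can be checked later.
fake_data = [
    [np.full((IMG_SIZE, IMG_SIZE), label, dtype=np.uint8), label]
    for label in (0, 0, 0, 1, 1, 1)
]

# Shuffle the (image, label) pairs together, never the two lists separately
random.seed(42)
random.shuffle(fake_data)

x = [features for features, _ in fake_data]
y = [label for _, label in fake_data]

# Stack into one array: (number of images, height, width, channels)
x = np.array(x).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
print(x.shape)  # (6, 50, 50, 1)
```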
Example of what we do with the images:
import os
import cv2
import matplotlib.pyplot as plt

# Directory where we have the images of the cats and dogs:
DATADIR = "G:/Mon Drive/Deep_learning/DeeP Learning/PetImages"
# The number of pixels that we want to use for the width and height of the images:
IMG_SIZE = 50

# Path to the Dog folder
path = os.path.join(DATADIR, "Dog")
for img in os.listdir(path):  # iterate over the dog images
    try:
        # Read the image as a greyscale array, no need for colors
        img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
        plt.imshow(img_array, cmap="gray")
        plt.show()
        print(img_array.shape)
        # Resize the image to IMG_SIZE x IMG_SIZE (50 x 50 pixels)
        new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
        plt.imshow(new_array, cmap="gray")
        plt.show()
        print(new_array.shape)
        break  # one image is enough for this demonstration
    except Exception:
        pass  # skip unreadable files
Original image:
After the first operation (greyscale):
Shape is (375, 500)
After the second operation (resize):
Shape is (50, 50)
I should probably have gone with 100x100, as the resized image is difficult to recognize!
Training the model
import time
import pickle

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.callbacks import TensorBoard

# For the purpose of saving the models
NAME = "Cats-vs-dog-cnn-64x3-{}".format(int(time.time()))

# We create a TensorBoard callback in order to monitor the learning of our neural network
tensorboard = TensorBoard(log_dir="logs/{}".format(NAME))

# Do not pay attention to this :) (optional GPU memory limit, TF1-style API)
# gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.333)
# sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(gpu_options=gpu_options))

# Load the training data previously saved: features
with open("X.pickle", "rb") as pickle_in:
    X = pickle.load(pickle_in)
# Load the training data previously saved: labels
with open("y.pickle", "rb") as pickle_in:
    y = pickle.load(pickle_in)

# Control
print(len(X))
print(len(y))

# Normalize the pixels: divide every value by 255, as usual for images
X = np.array(X / 255.0)
# Create an array out of the labels
y = np.array(y)

# Build the model.
# A Sequential model is appropriate for a plain stack of layers
# where each layer has exactly one input tensor and one output tensor.
model = Sequential()

# First convolutional layer: 64 filters of size 3x3. The expected input is the
# shape of one training example (one image): X.shape[1:] = (50, 50, 1).
# The activation function is the rectified linear unit (ReLU).
# pool_size is the size of the window going over the input matrix.
model.add(Conv2D(64, (3, 3), input_shape=X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Three more convolutional layers with 64 filters each (4 in total)
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten converts the 3D feature maps into a 1D vector. Each 3x3 convolution
# (no padding) removes 2 pixels and each 2x2 pooling halves the size:
# 50 -> 48 -> 24 -> 22 -> 11 -> 9 -> 4 -> 2 -> 1, so the vector has 1 * 1 * 64 = 64 values
model.add(Flatten())

# Final layer of a binary classification CNN:
# either Dense(2) with activation 'softmax', or Dense(1) with activation 'sigmoid'
model.add(Dense(1))
model.add(Activation("sigmoid"))

# Once the model is created, configure it with a loss and metrics using model.compile().
# Binary cross-entropy is the loss function used for binary classification tasks such as this one.
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Finally, train the model with .fit, passing the samples and the labels.
# batch_size is the number of samples propagated through the network at each step.
# For instance, with 1050 training samples and batch_size=100, the algorithm trains on
# samples 1 to 100, then 101 to 200, and so on (the last batch holds the remaining 50).
# validation_split=0.3 keeps the last 30% of the samples aside for validation.
model.fit(X, y, batch_size=32, epochs=10, validation_split=0.3, callbacks=[tensorboard])

# Save the model. With TF 2.x before Keras 3 this writes a SavedModel folder named NAME;
# with Keras 3 the path must end in ".keras", e.g. model.save(NAME + ".keras")
model.save(NAME)
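The batch-size example from the training comments can be made concrete in plain Python. This sketch only mimics how samples are cut into consecutive batches, using the 1050 samples and batch size of 100 from the comment; the `batch_ranges` helper is hypothetical and is not how Keras is implemented. In practice `fit` also shuffles the training samples between epochs by default, so real batches are not taken in file order.

```python
def batch_ranges(num_samples, batch_size):
    """Hypothetical helper: (first, last) 1-based sample indices of each consecutive batch."""
    return [
        (start + 1, min(start + batch_size, num_samples))
        for start in range(0, num_samples, batch_size)
    ]


batches = batch_ranges(1050, 100)
print(len(batches))  # 11
print(batches[:2])   # [(1, 100), (101, 200)]
print(batches[-1])   # (1001, 1050)
```

Because `validation_split` takes the validation data from the end of the arrays before any shuffling, the shuffle done while building the training data is what keeps the validation set from being all cats.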
Again, I tried to put as many comments as possible in the code itself, but basically here is how it goes:
- We load the data
- We normalize the data by dividing every pixel by 255 (normalizing ensures that every feature has a value in roughly the same range, here between 0 and 1)
- We create the model: a Sequential model with four convolutional layers of 64 filters each (each followed by max pooling), the ReLU activation function for these layers and sigmoid for the output layer. Softmax would need two outputs; sigmoid only needs one.
- We compile the model
- We train the model
- We save the model
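To see what the Flatten layer actually receives, we can track the feature-map size through the four Conv2D/MaxPooling2D blocks by hand. This sketch assumes the Keras defaults used above (no padding for Conv2D, stride equal to the pool size for MaxPooling2D); the helper names are hypothetical. It also counts trainable parameters with the rule (kernel height x kernel width x input channels + 1 bias) x filters, which you can compare against `model.summary()`.

```python
def conv_valid(size, kernel=3):
    # A convolution without padding loses (kernel - 1) pixels per dimension
    return size - kernel + 1


def max_pool(size, pool=2):
    # Pooling with stride equal to the pool size keeps floor(size / pool)
    return size // pool


def flatten_size(img_size, blocks=4, filters=64):
    size = img_size
    for _ in range(blocks):
        size = max_pool(conv_valid(size))
    return size * size * filters


def count_params(img_size, in_channels=1, blocks=4, filters=64, kernel=3):
    total = 0
    channels = in_channels
    for _ in range(blocks):
        total += (kernel * kernel * channels + 1) * filters  # weights + biases
        channels = filters
    # Dense(1): one weight per flattened value plus one bias
    total += flatten_size(img_size, blocks, filters) + 1
    return total


print(flatten_size(50))   # 64
print(flatten_size(100))  # 1024
print(count_params(50))   # 111489
```

With 100x100 inputs the flattened vector grows to 4 x 4 x 64 = 1024 values, while the convolutional weights stay the same, since they do not depend on the image size.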
Now you should have the following in your directory:
Let’s use the model now !
Using the model
import cv2
import tensorflow as tf

# I added some new images to my test folder; here we just pass the path of one image
IMG_PATH = "G:/Mon Drive/Deep_learning/DeeP Learning/Additional images/3.jpg"
CATEGORIES = ["Dog", "Cat"]  # used to convert the predicted number to a label

# Prepare the image so that it is in the same format as the images used to train the model
def prepare(filepath):
    IMG_SIZE = 50  # number of pixels
    img_array = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)  # read the image in greyscale
    new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))  # resize to the model's expected size
    new_array = new_array / 255.0  # normalize exactly as during training
    return new_array.reshape(-1, IMG_SIZE, IMG_SIZE, 1)  # shape (1, 50, 50, 1): a batch of one image

# Load the model
model = tf.keras.models.load_model("Cats-vs-dog-cnn-64x3-1601574654")

# predict expects a batch of images; prepare() already returns a batch containing one image
prediction = model.predict(prepare(IMG_PATH))

# The sigmoid output is the probability of class 1 (Cat), so we threshold it at 0.5
# (int() alone would truncate e.g. 0.97 to 0 and always answer "Dog")
print(CATEGORIES[int(prediction[0][0] > 0.5)])
As you can see, using the model is quite easy. I added a few random test pictures to a folder:
And then I tested the model on each of them. Sadly, my model predicted that cat.jpg was a dog, but it was correct for the rest and for a bunch of additional pictures I fed through, so I guess it works “OK”.
We load and prepare the image, then load the model, and use .predict to run the model on the image and predict its classification.
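Because the last layer is a single sigmoid unit, `predict` returns a probability between 0 and 1 for class 1 (Cat), and that probability has to be turned into a label with a threshold. This sketch uses made-up probabilities, not outputs of the trained model, and the `label` helper is hypothetical. It contrasts a 0.5 threshold with a bare `int()`, which truncates toward zero and maps every value below 1.0 to 0 (Dog). It also checks the earlier claim that one sigmoid output is equivalent to a two-output softmax whose other logit is fixed at 0.

```python
import math

CATEGORIES = ["Dog", "Cat"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label(probability, threshold=0.5):
    # probability is P(Cat): above the threshold we answer "Cat"
    return CATEGORIES[int(probability > threshold)]


# Made-up values standing in for model.predict(...)[0][0]
for p in (0.03, 0.48, 0.97):
    print(p, label(p), CATEGORIES[int(p)])
# 0.03 Dog Dog
# 0.48 Dog Dog
# 0.97 Cat Dog   <- int() truncates 0.97 to 0

# sigmoid(z) equals the "Cat" entry of softmax([0, z])
z = 1.7
print(abs(sigmoid(z) - softmax([0.0, z])[1]) < 1e-12)  # True
```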
And that’s it for the dog vs cat classifier !
Some additional theory:
The model learns by sliding small windows (the filters) over the whole image; each filter learns to respond to a specific local pattern wherever it appears in the image.
The first convolutional layers learn small details of the image, such as lines, curves and object edges. As the data goes deeper into the network, the layers learn more complex shapes and parts of objects.
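The sliding-window idea can be shown with a tiny NumPy sketch of what a single 3x3 filter does: a "valid" cross-correlation (no padding, stride 1), which is the operation Conv2D computes, here without bias or activation. The vertical-edge kernel is a hand-picked assumption for illustration; in the CNN above, the 64 filters of each layer learn their own weights during training.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# 5x6 image: dark left half (0) and bright right half (1), i.e. one vertical edge
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hand-made vertical-edge detector (a trained CNN would learn its own weights)
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

edges = conv2d_valid(image, kernel)
print(edges)
```

The output is 3x4 (each dimension shrinks by 2, as in the model) and is non-zero only in the columns where the window straddles the edge, which is exactly the "edge" feature mentioned at the start of the article.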
- Convolutional: A convolutional layer is a rectangular grid of neurons. It requires the previous layer to also be a rectangular grid of neurons. Each neuron takes its inputs from a rectangular section of the previous layer, and the weights for this rectangular section are the same for every neuron in the convolutional layer.
- Max-Pooling: After each convolutional layer, there may be a pooling layer. The pooling layer takes small rectangular blocks from the convolutional layer's output and subsamples each block to produce a single output. There are several ways to do this pooling, such as taking the average, the maximum, or a learned linear combination of the neurons in the block.
- Fully-Connected: Finally, after several convolutional and max-pooling layers, the high-level reasoning in the neural network is done via fully connected layers. A fully connected layer takes all neurons in the previous layer (be it fully connected, pooling, or convolutional) and connects them to every one of its own neurons.