Here are my environment details:

So... the MNIST dataset is usually one of the first ones people use to learn about neural networks, or machine learning as a whole. I remember solving this problem in MATLAB in the course taught by Andrew Ng on Coursera. It is more fun and easier to do in Python, for sure!

So what is the principle of this exercise? Basically, we get a list of images, each one a 28x28-pixel picture of a handwritten digit. The goal is to train a model that takes this data as input and classifies it with a label: "the pixels forming this image are an image of the number 5", for instance.

Image Classification in 10 Minutes with MNIST Dataset – mc.ai

Each number is an array of pixels:

It is maybe clearer if we single out one number:

We can clearly see the pixels, each with a different shade of gray. This is exactly what we will pass to the model: for each pixel in the image, its grayscale value. And from this data the model should be able to tell us which number it is.
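To make this concrete, here is a quick sketch (my own check, using the same standard Keras loader as the full code below) that just prints the shape and value range of the raw data, so you can see it really is a stack of 28x28 grids of grayscale values between 0 and 255:

import tensorflow as tf

#Load the raw MNIST data and inspect its shape and value range
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print(x_train.shape)                 # (60000, 28, 28) -> 60000 images of 28x28 pixels
print(y_train.shape)                 # (60000,)        -> one label (a digit from 0 to 9) per image
print(x_train.min(), x_train.max())  # 0 255           -> grayscale intensity of each pixel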

Let’s jump right in !

import tensorflow.keras as keras
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np 

print(tf.__version__)

#Downloading the dataset containing the digits (bundled with Keras)
mnist=tf.keras.datasets.mnist



#Unpacking the dataset into train and test data
(x_train,y_train),(x_test,y_test)=mnist.load_data()

#x_train = the pixel values
#y_train = the actual labels (it is a 4, a 3, a 5...)

#x_test and y_test are validation data

#A few commands to picture what we are manipulating
print(x_train[0])
plt.imshow(x_train[0],cmap=plt.cm.binary)
plt.show()
print(y_train[0])

#Normalize/scale the data. Here we use the built-in normalize function, which applies L2 normalization along one axis of each image;
#we could also simply divide by 255 to get a value between 0 and 1 for each pixel.
x_train=tf.keras.utils.normalize(x_train, axis=1)
x_test=tf.keras.utils.normalize(x_test, axis=1)
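#A minimal alternative, if you prefer plain min-max scaling over tf.keras.utils.normalize:
#dividing by 255.0 maps every pixel to the 0-1 range (left commented out so the data is not scaled twice)
#x_train = x_train / 255.0
#x_test = x_test / 255.0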

#BUILDING THE MODEL
model = tf.keras.models.Sequential()
#Input Layer : Now, we'll pop in layers. Recall our neural network image? Was the input layer flat, or was it multi-dimensional? It was flat. 
#So, we need to take this 28x28 image, and make it a flat 1x784. There are many ways for us to do this, but keras has a Flatten layer built just for us, so we'll use that.
model.add(tf.keras.layers.Flatten())
#This will serve as our input layer. It's going to take the data we throw at it, and just flatten it for us. 

#Next, we want our hidden layers.
#We're going to go with the simplest neural network layer, which is just a Dense layer. This refers to the fact that it's a densely-connected layer, 
#meaning it's "fully connected," where each node connects to each prior and subsequent node. Just like our image.

#This layer has 128 units. The activation function is relu, short for rectified linear. Currently, 
#relu is the activation function you should just default to. There are many more to test for sure, but, if you don't know what to use, use relu to start.
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu)) 
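#For a sense of scale (just arithmetic, assuming the 28x28 inputs above): the first Dense layer connects
#all 784 flattened pixels to 128 nodes, so it holds 784*128 + 128 = 100,480 weights and biases,
#and the second holds 128*128 + 128 = 16,512. Once the model has been built (e.g. after training),
#model.summary() will print these counts for you.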

#This is our final layer. It has 10 nodes, one node per possible digit. In this case, our activation function is softmax,
#since what we really want is something like a probability distribution over the possible classes for the sample we pass in. Great, our model is done.
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax)) 

#Parameters for the model
#Remember why we picked relu as an activation function? Same thing is true for the Adam optimizer. It's just a great default to start with.
#Next, we have our loss metric. Loss is a calculation of error. A neural network doesn't actually attempt to maximize accuracy. 
#It attempts to minimize loss. Again, there are many choices, but some form of categorical crossentropy is a good start for a classification task like this.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=['accuracy'])
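#Note: sparse_categorical_crossentropy expects integer labels (0-9), which is exactly what mnist.load_data() gives us.
#If you one-hot encoded the labels instead (e.g. with tf.keras.utils.to_categorical(y_train)),
#you would use "categorical_crossentropy" here; both are common choices for this kind of classification task.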

#Train the model
model.fit(x_train,y_train, epochs=3)

#Evaluate
val_loss, val_acc = model.evaluate(x_test, y_test)
print(val_loss, val_acc)

#In case you want to save the model
model.save('your_first_model.model')
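#Depending on your TensorFlow/Keras version, save() may insist on a '.keras' or '.h5' extension
#(newer Keras releases are stricter about the save format), so adjust the filename if it complains.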

#In case you want to load the model
new_model = tf.keras.models.load_model('your_first_model.model')

#Check predictions
predictions = new_model.predict(x_test)
print(predictions)

#Check prediction for first sample
print(np.argmax(predictions[0]))
#Actually show the first sample
plt.imshow(x_test[0])
plt.show()

#It is a seven! :) 
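#A small extra check: instead of eyeballing one sample, we can compare the argmax of every prediction
#with the true labels and recompute the test accuracy ourselves (it should match model.evaluate above)
predicted_digits = np.argmax(predictions, axis=1)
print("Manual test accuracy:", np.mean(predicted_digits == y_test))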

Our model achieves about 97% accuracy on the test set, which is pretty good:

And it correctly identifies a random number we throw at it:

I included most of the explanation in the code itself, so feel free to check it out. Basically, we:

  1. Load the data and separate it into train and test sets
  2. Normalize the features (all between 0 and 1)
  3. Build the model as a Sequential model with one input layer that flattens the data thrown at it (from 28*28 to 1*784), then two hidden layers of 128 nodes with relu as the activation function, and finally an output layer of 10 nodes (one per possible class, the digits 0 to 9) with a softmax. Softmax basically turns the raw outputs into probabilities for the different classes, based on the input we pass to the model (see the small sketch after this list)
  4. Compile the model with Adam and a loss function
  5. Train the model with the training data
  6. Evaluate the model and then save it
  7. Load it back, because why not
  8. Test it on our x_test data set
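To make point 3 a bit more tangible, here is a tiny standalone sketch (plain NumPy, with made-up scores, not taken from the model above) of what softmax does: it turns an arbitrary vector of 10 scores into 10 positive numbers that sum to 1, which we then read as class probabilities:

import numpy as np

def softmax(scores):
    #Subtracting the max is a standard numerical-stability trick; it does not change the result
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

#Made-up scores for the 10 digit classes, just for illustration
scores = np.array([0.5, 1.2, -0.3, 0.1, 0.0, 2.8, 0.4, -1.0, 0.9, 0.2])
probs = softmax(scores)
print(probs)             #10 values between 0 and 1
print(probs.sum())       #~1.0
print(np.argmax(probs))  #5 -> the highest score wins, so the prediction here would be the digit 5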