LeNet and MNIST handwritten digit recognition

Khuyen Le
4 min read · Mar 8, 2021


LeNet (or LeNet-5) is a convolutional neural network architecture developed by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Its original purpose was to recognize handwritten digits in images, and an early version from 1989 was successfully applied to identify handwritten zip code digits provided by the US Postal Service [1].


In this article, we will explore the architecture of this network as well as its application to MNIST handwritten digit images.

I. Architecture

LeNet consists of two parts:

  • The first part contains two convolutional layers and two pooling layers, arranged alternately.
  • The second part consists of three fully connected layers.

The architecture of LeNet is described by the following figure:

LeNet architecture

In the figure above, Cx, Sx, and Fx denote a convolutional layer, a sub-sampling layer (a.k.a. pooling layer), and a fully connected layer, respectively, where x is the layer index.

  • The input is a grayscale image of size 28 × 28.
  • C1 is the first convolutional layer, with 6 convolution kernels of size 5 × 5.
  • S2 is a pooling layer that outputs 6 channels of size 14 × 14. The pooling window in this case is a 2 × 2 square.
  • C3 is a convolutional layer with 16 convolution kernels of size 5 × 5, so its output is 16 feature maps of size 10 × 10.
  • S4 is a pooling layer with a 2 × 2 pooling window. It halves the spatial dimensions again, producing 16 feature maps of size 5 × 5.
  • C5 is a convolutional layer with 120 convolution kernels of size 5 × 5. Since its input has the same size as the kernel, the spatial size of its output is 1 × 1, and the number of output channels equals the number of kernels. Hence this layer outputs 120 feature maps of size 1 × 1.
  • F6 is a fully connected layer with 84 neurons, each connected to all outputs of C5.
  • The output layer consists of 10 neurons, one per class (the digits 0 to 9).

II. Application of LeNet for recognizing MNIST data

In this section, we apply LeNet to recognize MNIST handwritten digit images. The network is built with Keras:

1. Loading the MNIST dataset

We load the data and visualize a few randomly chosen images from the training set.
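The full notebook code is linked at the end of the article; below is a minimal sketch of this step, assuming the standard tensorflow.keras.datasets.mnist loader and matplotlib for plotting:

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist

# Load the MNIST training and test sets (60,000 / 10,000 grayscale 28 x 28 images)
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Plot a few randomly chosen training images with their labels
indices = np.random.choice(len(X_train), size=8, replace=False)
fig, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, i in zip(axes, indices):
    ax.imshow(X_train[i], cmap='gray')
    ax.set_title(y_train[i])
    ax.axis('off')
plt.show()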


2. Preprocessing the data

This task includes the following steps (sketched in code after the list):

  • Reshape the images into the input shape required by Keras
  • Convert integer values into float values
  • Normalize the data
  • One-hot encode the labels
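
A minimal sketch of these steps, assuming pixel values are scaled to [0, 1] (the original notebook may normalize differently):

from tensorflow.keras.utils import to_categorical

# Reshape to (N, 28, 28, 1): Conv2D expects an explicit channel dimension
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

# Convert to float and scale pixel values to [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0

# One-hot encode the labels (10 classes, digits 0-9)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)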

3. Building the LeNet model
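
The LeNet() builder itself is defined in the linked notebook; here is a sketch that reproduces the layer shapes and parameter counts of the summary printed below (the activation functions are an assumption, and this version follows the summary in using a single 120-unit dense layer before the output):

from tensorflow.keras import layers, models

def LeNet(input_shape=(28, 28, 1), num_classes=10):
    model = models.Sequential([
        # C1: 6 kernels of size 5 x 5; 'same' padding keeps the 28 x 28 spatial size
        layers.Conv2D(6, (5, 5), padding='same', activation='relu',
                      input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),                   # S2: 28 x 28 -> 14 x 14
        layers.Conv2D(16, (5, 5), activation='relu'),  # C3: -> 10 x 10 x 16
        layers.MaxPooling2D((2, 2)),                   # S4: -> 5 x 5 x 16
        layers.Flatten(),                              # 400 features
        layers.Dense(120, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])
    return model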

LeNet_model = LeNet()
LeNet_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 28, 28, 6) 156
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 6) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 10, 10, 16) 2416
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 16) 0
_________________________________________________________________
flatten (Flatten) (None, 400) 0
_________________________________________________________________
dense (Dense) (None, 120) 48120
_________________________________________________________________
dense_1 (Dense) (None, 10) 1210
=================================================================
Total params: 51,902
Trainable params: 51,902
Non-trainable params: 0
_________________________________________________________________

4. Training the model
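
The train_model() helper is likewise defined in the notebook; a plausible sketch is given below, assuming the Adam optimizer, a categorical cross-entropy loss (the labels are one-hot encoded), and a batch size of 128 (only the 50 epochs are confirmed by the training log):

def train_model(model, X_train, y_train, X_test, y_test,
                epochs=50, batch_size=128):
    # Compile with an assumed optimizer and loss; the accuracy metric matches the log
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # Fit on the training set, using the test set for validation
    history = model.fit(X_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(X_test, y_test))
    # Final evaluation on the test set (last line of the log)
    model.evaluate(X_test, y_test)
    return history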

train_model(LeNet_model, X_train, y_train, X_test, y_test)

Epoch 1/50
468/468 [==============================] - 35s 5ms/step - loss: 1.5654 - accuracy: 0.5390 - val_loss: 36.5179 - val_accuracy: 0.9097
Epoch 2/50
468/468 [==============================] - 2s 4ms/step - loss: 0.3126 - accuracy: 0.9072 - val_loss: 26.5710 - val_accuracy: 0.9378
...................
...................
...................
Epoch 49/50
468/468 [==============================] - 2s 4ms/step - loss: 0.0249 - accuracy: 0.9927 - val_loss: 6.1983 - val_accuracy: 0.9875
Epoch 50/50
468/468 [==============================] - 2s 4ms/step - loss: 0.0262 - accuracy: 0.9922 - val_loss: 6.0475 - val_accuracy: 0.9869
313/313 [==============================] - 1s 2ms/step - loss: 6.0378 - accuracy: 0.9869
Accuracy on the training and test sets

5. Prediction

Compute the confusion matrix on the test set:
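
A sketch of how the confusion matrix can be obtained, assuming scikit-learn's confusion_matrix (the notebook may display it as a plot instead):

from sklearn.metrics import confusion_matrix

# Predicted class = index of the largest softmax output
y_pred = np.argmax(LeNet_model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)   # undo the one-hot encoding

print(confusion_matrix(y_true, y_pred))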


Visualize a few randomly chosen test images together with their predicted labels:
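
A minimal sketch, reusing y_pred from the previous step:

indices = np.random.choice(len(X_test), size=8, replace=False)
fig, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, i in zip(axes, indices):
    ax.imshow(X_test[i].reshape(28, 28), cmap='gray')
    ax.set_title(f'pred: {y_pred[i]}')
    ax.axis('off')
plt.show()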


III. Conclusion

We have explored the architecture of the LeNet model and how to implement it in Keras. The model classifies MNIST handwritten digit images with an accuracy of 98.69% on the test set. In the next articles, we will look at more modern convolutional models and their applications to more complex problems.

I hope this article is helpful for you.

Thanks for reading!

Github code for this article: https://github.com/KhuyenLE-maths/LeNet_model_with_MNIST_recognition/blob/main/LeNet_with_MNIST_recognition.ipynb

My blog page: https://lekhuyen.medium.com/

________________________________________________________________

Reference:

[1] Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989, January). Handwritten digit recognition with a back-propagation network. In Proceedings of the 2nd International Conference on Neural Information Processing Systems (pp. 396–404).
