LeNet (or LeNet-5) is a convolutional neural network structure proposed by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner in 1998. The original purpose of this network was to recognize handwritten digits in images. An earlier version of the network had already been successfully applied in 1989 to identify handwritten zip code numbers provided by the US Postal Service [1].
In this article, we are going to discover the architecture of this network as well as its application to MNIST handwritten digit images.
I. Architecture
LeNet consists of 2 parts:
- The first part includes two convolutional layers and two pooling layers, placed alternately.
- The second part consists of three fully connected layers.
The architecture of LeNet is described by the following figure:
In the figure above, Cx, Sx, and Fx correspond to a convolutional layer, a sub-sampling layer (a.k.a. pooling layer), and a fully connected layer, respectively, where x denotes the layer index.
- The input is a grayscale image of size 28 × 28.
- C1 is the first convolutional layer, with 6 convolution kernels of size 5 × 5. The input is zero-padded so that this layer outputs 6 feature maps of size 28 × 28.
- S2 is a pooling layer that outputs 6 channels of 14 × 14 feature maps. The pooling window in this case is of size 2 × 2.
- C3 is a convolutional layer with 16 convolution kernels of size 5 × 5. Hence, the output of this layer is 16 feature maps of size 10 × 10.
- S4 is a pooling layer with a pooling window of size 2 × 2. The spatial size of the feature maps is therefore halved; it outputs 16 feature maps of size 5 × 5.
- C5 is a convolutional layer with 120 convolution kernels of size 5 × 5. Since the input to this layer has the same spatial size as the kernel, the spatial size of the output is 1 × 1, and the number of output channels equals the number of kernels. Hence this layer outputs 120 feature maps of size 1 × 1.
- F6 is a fully connected layer with 84 neurons, all connected to the output of C5.
- The output layer consists of 10 neurons, corresponding to the number of classes (the digits 0 to 9).
II. Application of LeNet for recognizing MNIST data
In this section, we apply LeNet to recognize MNIST handwritten digit images. The network is constructed with the Keras framework:
1. Loading MNIST dataset
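The dataset can be loaded directly through the Keras datasets API. A minimal sketch (the variable names are assumptions; the original code is in the linked notebook):

```python
# Load MNIST: 60,000 training and 10,000 test grayscale images of size 28x28
from tensorflow.keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape, X_test.shape)  # (60000, 28, 28) (10000, 28, 28)
```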
Visualizing some randomly chosen images in the training set:
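One way to do this with Matplotlib (a sketch; the helper name and figure layout are assumptions):

```python
import matplotlib.pyplot as plt
import numpy as np

def show_random_images(images, labels, n=10):
    """Display n randomly chosen images together with their labels."""
    idx = np.random.choice(len(images), n, replace=False)
    fig, axes = plt.subplots(1, n, figsize=(1.5 * n, 2))
    for ax, i in zip(axes, idx):
        ax.imshow(images[i], cmap='gray')
        ax.set_title(str(labels[i]))
        ax.axis('off')
    plt.show()

# show_random_images(X_train, y_train)
```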
2. Preprocessing data
This task includes the following steps:
- Reshape the images into the input shape Keras expects (28 × 28 × 1)
- Convert the integer pixel values into float values
- Normalize the data to the range [0, 1]
- One-hot encode the labels
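The steps above can be sketched as follows (a minimal version; the function name is an assumption):

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

def preprocess(X, y, num_classes=10):
    """Reshape to (n, 28, 28, 1), cast to float32, scale pixels to [0, 1],
    and one-hot encode the labels."""
    X = X.reshape(-1, 28, 28, 1).astype('float32') / 255.0
    y = to_categorical(y, num_classes)
    return X, y

# X_train, y_train = preprocess(X_train, y_train)
# X_test, y_test = preprocess(X_test, y_test)
```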
3. Build LeNet model
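One way to define LeNet() so that it matches the summary below. The activation functions and optimizer are assumptions; note that, unlike classic LeNet-5, the model in the summary goes directly from the 120-unit layer to the 10-unit output, without the 84-unit F6 layer:

```python
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def LeNet(input_shape=(28, 28, 1), num_classes=10):
    model = Sequential([
        Input(shape=input_shape),
        # C1: 6 kernels of size 5x5, zero-padded so the 28x28 size is kept
        Conv2D(6, kernel_size=5, padding='same', activation='relu'),
        MaxPooling2D(pool_size=2),                     # S2: 14x14x6
        # C3: 16 kernels of size 5x5, no padding -> 10x10x16
        Conv2D(16, kernel_size=5, activation='relu'),
        MaxPooling2D(pool_size=2),                     # S4: 5x5x16
        Flatten(),                                     # 400 features
        Dense(120, activation='relu'),                 # C5 as a dense layer
        Dense(num_classes, activation='softmax'),      # output: 10 classes
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```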
LeNet_model = LeNet()
LeNet_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 28, 28, 6) 156
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 6) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 10, 10, 16) 2416
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 16) 0
_________________________________________________________________
flatten (Flatten) (None, 400) 0
_________________________________________________________________
dense (Dense) (None, 120) 48120
_________________________________________________________________
dense_1 (Dense) (None, 10) 1210
=================================================================
Total params: 51,902
Trainable params: 51,902
Non-trainable params: 0
_________________________________________________________________
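The parameter counts in the summary above can be verified by hand: a convolutional layer with k × k kernels, c_in input channels, and c_out output channels has (k·k·c_in + 1)·c_out parameters (the +1 is the bias), and a dense layer with n_in inputs and n_out units has (n_in + 1)·n_out:

```python
# Parameter counts from the summary, computed from first principles
conv = lambda k, c_in, c_out: (k * k * c_in + 1) * c_out
dense = lambda n_in, n_out: (n_in + 1) * n_out

assert conv(5, 1, 6) == 156       # conv2d (C1)
assert conv(5, 6, 16) == 2416     # conv2d_1 (C3)
assert dense(400, 120) == 48120   # dense
assert dense(120, 10) == 1210     # dense_1
assert 156 + 2416 + 48120 + 1210 == 51902  # total params
```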
4. Training model
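A possible definition of the train_model helper used below (the function itself is not shown in the article; the batch size and the use of the test set for validation are assumptions inferred from the training log):

```python
def train_model(model, X_train, y_train, X_test, y_test,
                epochs=50, batch_size=128):
    # Fit on the training set, monitoring the test set after each epoch
    model.fit(X_train, y_train,
              epochs=epochs, batch_size=batch_size,
              validation_data=(X_test, y_test))
    # Final evaluation on the test set: returns [loss, accuracy]
    return model.evaluate(X_test, y_test)
```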
train_model(LeNet_model, X_train, y_train, X_test, y_test)

Epoch 1/50
468/468 [==============================] - 35s 5ms/step - loss: 1.5654 - accuracy: 0.5390 - val_loss: 36.5179 - val_accuracy: 0.9097
Epoch 2/50
468/468 [==============================] - 2s 4ms/step - loss: 0.3126 - accuracy: 0.9072 - val_loss: 26.5710 - val_accuracy: 0.9378
...................
...................
...................
Epoch 49/50
468/468 [==============================] - 2s 4ms/step - loss: 0.0249 - accuracy: 0.9927 - val_loss: 6.1983 - val_accuracy: 0.9875
Epoch 50/50
468/468 [==============================] - 2s 4ms/step - loss: 0.0262 - accuracy: 0.9922 - val_loss: 6.0475 - val_accuracy: 0.9869
313/313 [==============================] - 1s 2ms/step - loss: 6.0378 - accuracy: 0.9869
5. Prediction
Determine the confusion matrix:
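A sketch of this step, assuming scikit-learn is available (the helper name is an assumption; y_test is one-hot encoded, so the class indices are recovered with argmax):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def get_confusion_matrix(model, X, y_onehot):
    """Return the 10x10 confusion matrix for one-hot encoded labels."""
    y_pred = np.argmax(model.predict(X), axis=1)  # predicted class per image
    y_true = np.argmax(y_onehot, axis=1)          # true class per image
    return confusion_matrix(y_true, y_pred)

# cm = get_confusion_matrix(LeNet_model, X_test, y_test)
```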
Visualizing some randomly chosen images in the test set along with their predicted labels:
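A possible implementation (a sketch; the helper name and figure layout are assumptions):

```python
import matplotlib.pyplot as plt
import numpy as np

def show_random_predictions(model, X, y_onehot, n=10):
    """Plot n randomly chosen images with their true and predicted labels."""
    idx = np.random.choice(len(X), n, replace=False)
    preds = np.argmax(model.predict(X[idx]), axis=1)
    trues = np.argmax(y_onehot[idx], axis=1)
    fig, axes = plt.subplots(1, n, figsize=(1.5 * n, 2))
    for ax, img, t, p in zip(axes, X[idx], trues, preds):
        ax.imshow(img.squeeze(), cmap='gray')
        ax.set_title(f'true: {t}\npred: {p}')
        ax.axis('off')
    plt.show()

# show_random_predictions(LeNet_model, X_test, y_test)
```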
III. Conclusion
We have discovered the architecture of the LeNet model and how to implement it in Keras. The model was successfully applied to classify MNIST handwritten digit images, reaching a test accuracy of 98.69%. In the next articles, we are going to discover some modern convolutional models as well as their applications to more complicated problems.
I hope this article is helpful for you.
Thanks for reading!
Github code for this article: https://github.com/KhuyenLE-maths/LeNet_model_with_MNIST_recognition/blob/main/LeNet_with_MNIST_recognition.ipynb
My blog page: https://lekhuyen.medium.com/
________________________________________________________________
Reference:
[1] Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989, January). Handwritten digit recognition with a back-propagation network. In Proceedings of the 2nd International Conference on Neural Information Processing Systems (pp. 396–404).