AlexNet and image classification

Khuyen Le
Mar 10, 2021

AlexNet is a convolutional neural network designed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton. For the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2010, the network was trained to classify 1.2 million high-resolution images into 1000 different classes. It achieved top-1 and top-5 error rates of 37.5% and 17%, outperforming the state-of-the-art methods of the time.

The designs of AlexNet and LeNet are very similar, but AlexNet is much deeper and has more filters per layer. It consists of eight layers: five convolutional layers (some of them followed by max-pooling layers), two fully connected hidden layers, and one fully connected output layer. Notably, the network can also be trained on multiple GPUs. In ILSVRC 2012, a variant of this model, trained with techniques such as data augmentation and dropout to avoid overfitting, won the competition with a top-5 test error rate of 15.3%.

In this article, we explore the architecture of this network and its implementation in Keras. We then apply the network to the problem of classifying dog and cat images.

I. Architecture

The architecture of AlexNet is depicted in the following figure [1]:

Figure 1: The architecture of the AlexNet model. Source: [1]

The first layer uses a convolutional window of size 11 × 11: because the input images are large, a large kernel is needed to capture the objects in them. The convolutional window shrinks gradually in the following layers, to 5 × 5 and then 3 × 3, while the number of filters increases. The first two and the last convolutional layers are followed by max-pooling layers with a 3 × 3 pooling window and a stride of 2, so the spatial dimensions are roughly halved after each pooling layer.

In this model, the ReLU activation function is applied [1]; see my previous post for more detail about this function. The authors also used techniques to reduce overfitting, such as data augmentation and dropout. Specifically, dropout with a rate of 50% is applied after the first two fully connected layers.

II. Implementation of the AlexNet model in Keras
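
A minimal Keras sketch of the network. The 227 × 227 × 3 input shape is an assumption chosen so that the first convolution produces the 55 × 55 × 96 output shown in the summary below (the paper reports 224 × 224):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def Alexnet(input_shape=(227, 227, 3), num_classes=1000):
    model = Sequential([
        # Conv 1: large 11x11 kernels with stride 4 to capture objects in the large input
        Conv2D(96, (11, 11), strides=4, activation='relu', input_shape=input_shape),
        MaxPooling2D(pool_size=(3, 3), strides=2),
        # Conv 2: 5x5 kernels; 'same' padding keeps the 27x27 spatial size
        Conv2D(256, (5, 5), padding='same', activation='relu'),
        MaxPooling2D(pool_size=(3, 3), strides=2),
        # Conv 3-5: three 3x3 convolutional layers
        Conv2D(384, (3, 3), padding='same', activation='relu'),
        Conv2D(384, (3, 3), padding='same', activation='relu'),
        Conv2D(256, (3, 3), padding='same', activation='relu'),
        MaxPooling2D(pool_size=(3, 3), strides=2),
        # Classifier: two fully connected layers with 50% dropout, then softmax
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax'),
    ])
    return model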

Summarize the model:

Alexnet_model = Alexnet()
Alexnet_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 55, 55, 96) 34944
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 27, 27, 96) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 27, 27, 256) 614656
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 13, 13, 384) 885120
_________________________________________________________________
conv2d_3 (Conv2D) (None, 13, 13, 384) 1327488
_________________________________________________________________
conv2d_4 (Conv2D) (None, 13, 13, 256) 884992
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 6, 6, 256) 0
_________________________________________________________________
flatten (Flatten) (None, 9216) 0
_________________________________________________________________
dense (Dense) (None, 4096) 37752832
_________________________________________________________________
dropout (Dropout) (None, 4096) 0
_________________________________________________________________
dense_1 (Dense) (None, 4096) 16781312
_________________________________________________________________
dropout_1 (Dropout) (None, 4096) 0
_________________________________________________________________
dense_2 (Dense) (None, 1000) 4097000
=================================================================
Total params: 62,378,344
Trainable params: 62,378,344
Non-trainable params: 0
_________________________________________________________________

This model has more than 62 million parameters.

III. Application of the AlexNet model to classifying dog and cat images

The dataset is collected from Kaggle and consists of:

  • A training set that includes 4006 dog images and 4001 cat images.
  • A testing set that includes 1013 dog images and 1012 cat images.

1. Data loading and exploration

a. Display the number of dog and cat images in both training and testing sets:
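
A sketch, assuming the Kaggle archive is unpacked into training_set/dogs, training_set/cats, test_set/dogs, and test_set/cats (the folder names are assumptions):

import os

train_dogs = os.listdir('training_set/dogs')
train_cats = os.listdir('training_set/cats')
test_dogs = os.listdir('test_set/dogs')
test_cats = os.listdir('test_set/cats')

print('The training set consists of ', len(train_dogs), ' dog images and ', len(train_cats), ' cat images.')
print('The test set consists of ', len(test_dogs), ' dog images and ', len(test_cats), ' cat images.')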

>>> The training set consists of  4006  dog images and  4001  cat images.
>>> The test set consists of  1013  dog images and  1012  cat images.

b. Load images and labels

Load training and testing sets:
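
A sketch using PIL, assuming the folder layout above, resizing every image to the 227 × 227 input size of the model, and labeling dogs as 1 and cats as 0 (an arbitrary choice):

import os
import numpy as np
from PIL import Image

def load_folder(folder, label, size=(227, 227)):
    # Read every image in the folder, resize it, and attach the class label
    images, labels = [], []
    for fname in sorted(os.listdir(folder)):
        img = Image.open(os.path.join(folder, fname)).convert('RGB').resize(size)
        images.append(np.array(img))
        labels.append(label)
    return images, labels

def load_set(dog_folder, cat_folder):
    dog_imgs, dog_labs = load_folder(dog_folder, 1)
    cat_imgs, cat_labs = load_folder(cat_folder, 0)
    return np.array(dog_imgs + cat_imgs), np.array(dog_labs + cat_labs)

X_train, y_train = load_set('training_set/dogs', 'training_set/cats')
X_test, y_test = load_set('test_set/dogs', 'test_set/cats')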

Because this loading takes a lot of time, you should save the results for later use.

Save X_train, y_train, X_test, y_test into a dictionary, namely data_dict:
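
A sketch with pickle (the file name data_dict.pickle is an assumption):

import pickle

data_dict = {'X_train': X_train, 'y_train': y_train,
             'X_test': X_test, 'y_test': y_test}
with open('data_dict.pickle', 'wb') as f:
    pickle.dump(data_dict, f)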

Load X_train, y_train, X_test, y_test:
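
Reloading from the same file:

import pickle

with open('data_dict.pickle', 'rb') as f:
    data_dict = pickle.load(f)
X_train, y_train = data_dict['X_train'], data_dict['y_train']
X_test, y_test = data_dict['X_test'], data_dict['y_test']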

Display randomly some images of the training set:
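
A sketch with matplotlib, drawing nine random training images with their labels:

import numpy as np
import matplotlib.pyplot as plt

class_names = {0: 'cat', 1: 'dog'}
idx = np.random.choice(len(X_train), size=9, replace=False)
plt.figure(figsize=(8, 8))
for i, j in enumerate(idx):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_train[j])
    plt.title(class_names[y_train[j]])
    plt.axis('off')
plt.show()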


2. Data preprocessing

This task consists of the following steps, sketched after the list:

  • Convert integer values into floats
  • Normalization
  • One-hot encoding the labels
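
A minimal sketch of these three steps:

from tensorflow.keras.utils import to_categorical

# Convert to floats and scale pixel values to [0, 1]
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
# One-hot encode the two classes
y_train = to_categorical(y_train, num_classes=2)
y_test = to_categorical(y_test, num_classes=2)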

Visualize randomly some images of the training set after preprocessing:


3. Define Alexnet model

Note that our problem has only two categories, dog and cat, so the output of the AlexNet model needs to be adjusted to fit the problem: the output layer in this case has two neurons.

Libraries:
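
A minimal set, assuming TensorFlow's Keras API:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout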

Create the model:
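
The architecture is identical to Section II; only the output layer changes. Redefining Alexnet with a default of two classes keeps the later calls unchanged (a sketch):

def Alexnet(input_shape=(227, 227, 3), num_classes=2):
    model = Sequential([
        Conv2D(96, (11, 11), strides=4, activation='relu', input_shape=input_shape),
        MaxPooling2D(pool_size=(3, 3), strides=2),
        Conv2D(256, (5, 5), padding='same', activation='relu'),
        MaxPooling2D(pool_size=(3, 3), strides=2),
        Conv2D(384, (3, 3), padding='same', activation='relu'),
        Conv2D(384, (3, 3), padding='same', activation='relu'),
        Conv2D(256, (3, 3), padding='same', activation='relu'),
        MaxPooling2D(pool_size=(3, 3), strides=2),
        Flatten(),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        Dense(4096, activation='relu'),
        Dropout(0.5),
        # Two-unit softmax output for the dog/cat problem
        Dense(num_classes, activation='softmax'),
    ])
    return model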

Summary of the new model:

Alexnet_model = Alexnet()
Alexnet_model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_5 (Conv2D) (None, 55, 55, 96) 34944
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 27, 27, 96) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 27, 27, 256) 614656
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 13, 13, 256) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 13, 13, 384) 885120
_________________________________________________________________
conv2d_8 (Conv2D) (None, 13, 13, 384) 1327488
_________________________________________________________________
conv2d_9 (Conv2D) (None, 13, 13, 256) 884992
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 6, 6, 256) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 9216) 0
_________________________________________________________________
dense_3 (Dense) (None, 4096) 37752832
_________________________________________________________________
dropout_2 (Dropout) (None, 4096) 0
_________________________________________________________________
dense_4 (Dense) (None, 4096) 16781312
_________________________________________________________________
dropout_3 (Dropout) (None, 4096) 0
_________________________________________________________________
dense_5 (Dense) (None, 2) 8194
=================================================================
Total params: 58,289,538
Trainable params: 58,289,538
Non-trainable params: 0
_________________________________________________________________

4. Train the model

We apply data augmentation to the training set to reduce overfitting: image rotation of up to 5 degrees, width and height shifts of up to 10%, and horizontal flips. These transformations can be performed with the ImageDataGenerator class from the keras.preprocessing.image module.
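
A sketch of the generator with these settings:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=5,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)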

Define the training function:
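
A sketch of the training loop. The Adam optimizer and the batch size of 128 are assumptions; the original settings are in the notebook linked at the end of the article:

def train_model(model, X_train, y_train, X_test, y_test, epochs=100, batch_size=128):
    # Compile with categorical cross-entropy for the one-hot labels
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # Fit on augmented batches (datagen defined above), validating on the untouched test set
    history = model.fit(datagen.flow(X_train, y_train, batch_size=batch_size),
                        epochs=epochs,
                        validation_data=(X_test, y_test))
    return history

history = train_model(Alexnet_model, X_train, y_train, X_test, y_test)
loss, accuracy = Alexnet_model.evaluate(X_test, y_test)
print(round(accuracy * 100, 3))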

Epoch 1/100
62/62 [==============================] - 117s 2s/step - loss: 0.6932 - accuracy: 0.5027 - val_loss: 0.6914 - val_accuracy: 0.5230
Epoch 2/100
62/62 [==============================] - 116s 2s/step - loss: 0.6911 - accuracy: 0.5344 - val_loss: 0.6877 - val_accuracy: 0.5245
................
................
................
Epoch 99/100
62/62 [==============================] - 116s 2s/step - loss: 0.0286 - accuracy: 0.9894 - val_loss: 0.3615 - val_accuracy: 0.9086
Epoch 100/100
62/62 [==============================] - 116s 2s/step - loss: 0.0291 - accuracy: 0.9885 - val_loss: 0.3721 - val_accuracy: 0.9095
64/64 [==============================] - 5s 77ms/step - loss: 0.3721 - accuracy: 0.9095
90.954

Save the training history and the accuracy:
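
For example, with pickle (the file name is an assumption):

import pickle

with open('history.pickle', 'wb') as f:
    pickle.dump({'history': history.history, 'accuracy': accuracy}, f)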

Save the trained model:
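
Keras can serialize the whole model to a single file:

Alexnet_model.save('Alexnet_model.h5')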

Visualize the accuracies on both training and testing sets during training the model:
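
A sketch using the recorded history:

import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='training set')
plt.plot(history.history['val_accuracy'], label='testing set')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()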


We see that the accuracies on both sets tend to increase as the number of epochs grows, reaching 98.85% on the training set and 90.95% on the testing set. These values could still improve with a higher number of epochs.

5. Prediction

Determine the confusion matrix:
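
A sketch with scikit-learn, converting the one-hot vectors back to class indices:

import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(Alexnet_model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)
cm = confusion_matrix(y_true, y_pred)
print(cm)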


This matrix gives the number of images that are correctly or incorrectly classified in each class. Based on this matrix, we see that:

  • 900 dog images and 940 cat images are classified correctly.
  • 112 dog images are misclassified as cats.
  • 71 cat images are misclassified as dogs.

Visualize some images and their predicted classes:
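
A sketch reusing class_names and y_pred from above:

idx = np.random.choice(len(X_test), size=9, replace=False)
plt.figure(figsize=(8, 8))
for i, j in enumerate(idx):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_test[j])
    plt.title('predicted: ' + class_names[y_pred[j]])
    plt.axis('off')
plt.show()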


IV. Conclusion

We have explored the architecture of the AlexNet model and its implementation in Keras. Applying this model to classifying dog and cat images achieves an accuracy of 90.95% on the testing set. This performance can still be improved by getting more training data, training for more epochs, tuning the hyperparameters, and so on. There are also other techniques for improving the model in each particular case; they will be introduced in detail in the next article.

I hope this article is helpful for you. Don't hesitate to find me on Medium to discover similar content.

Thanks for reading!

Github code: https://github.com/KhuyenLE-maths/Alexnet_model_with_image_classification/blob/main/Alexnet_and_image_classification.ipynb

My blog page: https://lekhuyen.medium.com/

________________________________________________________________

References:

[1] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.


Khuyen Le

Postdoctoral Researcher at 3IA Côte d'Azur - Interdisciplinary Institute for Artificial Intelligence