A Deep Dive into Building and Analyzing CNN Models for Image Classification
This project aims to build and evaluate an image recognition system using the CIFAR-10 dataset. CIFAR-10 is a widely-used benchmark dataset containing 60,000 labeled images across 10 classes (e.g., airplanes, cars, birds, cats). We leverage advanced techniques like Convolutional Neural Networks (CNNs), transfer learning, and hyperparameter optimization to achieve accurate image classification.
The dataset is loaded and preprocessed in the dataset.py and preprocessing.py scripts. Preprocessing is essential for enhancing model performance and ensuring a clean pipeline. Key steps include:
- Data augmentation via random horizontal flips and padded random crops to improve generalization
- Conversion of images to tensors
- Normalization of each channel to the range [-1, 1] using a mean and standard deviation of 0.5
Example snippet for data preprocessing:
from torchvision import transforms
from torchvision.datasets import CIFAR10

# Training transform: augmentation, tensor conversion, and per-channel
# normalization to [-1, 1].
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_dataset = CIFAR10(root='data', train=True, transform=transform, download=True)
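For evaluation, augmentation is typically left out; below is a minimal sketch of a matching test-time transform (an assumption here, since the exact contents of preprocessing.py are not shown):

# Test-time preprocessing: no augmentation, only tensor conversion and
# normalization with the same statistics used for training. This is an
# assumed setup; the project's preprocessing.py may differ.
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

test_dataset = CIFAR10(root='data', train=False, transform=test_transform, download=True)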
The project employs two primary approaches for building models: custom CNN architectures designed from scratch, and transfer learning from pre-trained networks.
In the model.py file, custom CNNs are designed with multiple convolutional layers, max-pooling, and fully connected layers. The goal is to extract meaningful features from images through hierarchical convolutions.
Example architecture of a CNN:
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Two 3x3 convolutions that grow the channel count from 3 to 64.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # After two 2x2 pooling steps, 32x32 inputs become 8x8 feature maps.
        self.fc1 = nn.Linear(64 * 8 * 8, 256)
        self.fc2 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 8 * 8)  # flatten for the fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
This model uses two convolutional layers with ReLU activations and max-pooling for spatial downsampling, followed by fully connected layers (regularized with dropout) that produce the final class predictions.
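As a quick sanity check (not part of the original code), the model can be probed with a random batch to confirm the output shape:

import torch

# A batch of 4 RGB 32x32 images should map to 4 rows of 10 class logits.
model = CNN()
dummy = torch.randn(4, 3, 32, 32)
out = model(dummy)
print(out.shape)  # expected: torch.Size([4, 10])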
To enhance performance, transfer learning is implemented in the transfer_learning_model.py file. Pre-trained models such as ResNet and VGG are fine-tuned on the CIFAR-10 dataset to speed up training and improve accuracy.
Example for using ResNet:
from torchvision import models
import torch.optim as optim

# Load ImageNet-pretrained ResNet-18 (newer torchvision versions use the
# `weights=` argument instead of the deprecated `pretrained=True`).
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 10)  # replace the 1000-class head with 10 classes
optimizer = optim.Adam(model.parameters(), lr=0.001)
By modifying the fully connected layer, the model adapts to CIFAR-10's 10 classes.
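A common variant, not shown in the original file, is to freeze the pre-trained backbone and train only the new head, which is cheaper and less prone to overfitting on small datasets. A minimal sketch under that assumption:

# Hypothetical variant: freeze the pre-trained backbone so only the new
# classification head is trained.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # fresh head; requires_grad=True by default
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)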
The trainer.py script employs hyperparameter tuning to identify the best model configuration. Results for each candidate setting are logged and visualized to determine the optimal training configuration, as sketched below.
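The exact grid searched by trainer.py isn't reproduced here; as an illustration, a minimal sweep over two assumed knobs (learning rate and dropout probability) might look like this, reusing the project's train and evaluate helpers:

from itertools import product

# Hypothetical sweep; the actual grid in trainer.py may differ.
criterion = nn.CrossEntropyLoss()  # cross-entropy loss, as described below
learning_rates = [1e-2, 1e-3, 1e-4]
dropout_probs = [0.3, 0.5]

best_acc, best_cfg = 0.0, None
for lr, p in product(learning_rates, dropout_probs):
    model = CNN()
    model.dropout.p = p  # adjust the dropout probability in place
    optimizer = optim.Adam(model.parameters(), lr=lr)
    train(model, train_loader, optimizer, criterion)     # helpers sketched below
    _, val_acc = evaluate(model, val_loader, criterion)
    if val_acc > best_acc:
        best_acc, best_cfg = val_acc, (lr, p)
print(f"Best: lr={best_cfg[0]}, dropout={best_cfg[1]}, val acc={best_acc:.3f}")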
The training pipeline involves iterating over the dataset for multiple epochs and minimizing the cross-entropy loss. Performance metrics include training and validation loss, validation accuracy, and a per-class confusion matrix.
Training is executed as follows:
# `train` and `evaluate` are the project's helper functions (sketched below).
for epoch in range(epochs):
    train_loss = train(model, train_loader, optimizer, criterion)
    val_loss, val_acc = evaluate(model, val_loader, criterion)
    print(f"Epoch {epoch}: Train Loss={train_loss:.4f}, Val Acc={val_acc:.4f}")
Training History: The training_history.png plot shows how accuracy improves over epochs, while the validation_confusion_matrix.png highlights classification performance across the 10 classes.
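As an illustration, such a confusion matrix could be generated with scikit-learn and matplotlib (an assumption; the project's own plotting code may differ):

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Collect predictions and labels over the validation set, then plot.
all_preds, all_labels = [], []
model.eval()
with torch.no_grad():
    for images, labels in val_loader:
        all_preds.extend(model(images).argmax(dim=1).tolist())
        all_labels.extend(labels.tolist())

ConfusionMatrixDisplay.from_predictions(all_labels, all_preds)
plt.savefig('validation_confusion_matrix.png')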
This project demonstrates the power of CNNs and transfer learning for image classification tasks. By combining data augmentation, advanced model architectures, and rigorous evaluation, we achieve robust results on the CIFAR-10 dataset. This approach can be extended to other image recognition problems with similar techniques.
Access the complete codebase, datasets, and results here: GitHub Repository
Built with a passion for deep learning and computer vision. Explore more projects on GitHub.