A Deep Dive into Building and Analyzing CNN Models for Image Classification
This project aims to build and evaluate an image recognition system using the CIFAR-10 dataset. CIFAR-10 is a widely-used benchmark dataset containing 60,000 labeled images across 10 classes (e.g., airplanes, cars, birds, cats). We leverage advanced techniques like Convolutional Neural Networks (CNNs), transfer learning, and hyperparameter optimization to achieve accurate image classification.
The dataset is loaded and preprocessed in the dataset.py and preprocessing.py scripts. Preprocessing is essential for enhancing model performance and ensuring a clean pipeline. Key steps include:
- Data augmentation via random horizontal flips and padded random crops to improve generalization
- Conversion of images to tensors
- Normalization of each channel to the range [-1, 1] using a mean and standard deviation of 0.5
Example snippet for data preprocessing:
from torchvision import transforms
from torchvision.datasets import CIFAR10

# Training transform: augmentation, tensor conversion, and per-channel
# normalization to [-1, 1].
transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

train_dataset = CIFAR10(root='data', train=True, transform=transform, download=True)
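For evaluation, augmentation is typically left out; below is a minimal sketch of a matching test-time transform (an assumption here, since the exact contents of preprocessing.py are not shown):

# Test-time preprocessing: no augmentation, only tensor conversion and
# normalization with the same statistics used for training. This is an
# assumed setup; the project's preprocessing.py may differ.
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

test_dataset = CIFAR10(root='data', train=False, transform=test_transform, download=True)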
The project employs two primary approaches for building models: custom CNN architectures designed from scratch, and transfer learning from pre-trained networks.
In the model.py file, custom CNNs are designed with multiple convolutional layers, max-pooling, and fully connected layers. The goal is to extract meaningful features from images through hierarchical convolutions.
Example architecture of a CNN:
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Two 3x3 convolutions that grow the channel count from 3 to 64.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # After two 2x2 pooling steps, 32x32 inputs become 8x8 feature maps.
        self.fc1 = nn.Linear(64 * 8 * 8, 256)
        self.fc2 = nn.Linear(256, 10)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 8 * 8)  # flatten for the fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x
This model uses two convolutional layers with ReLU activations and max-pooling for spatial downsampling, followed by fully connected layers (regularized with dropout) that produce the final class predictions.
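As a quick sanity check (not part of the original code), the model can be probed with a random batch to confirm the output shape:

import torch

# A batch of 4 RGB 32x32 images should map to 4 rows of 10 class logits.
model = CNN()
dummy = torch.randn(4, 3, 32, 32)
out = model(dummy)
print(out.shape)  # expected: torch.Size([4, 10])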
To enhance performance, transfer learning is implemented in the transfer_learning_model.py file. Pre-trained models such as ResNet and VGG are fine-tuned on the CIFAR-10 dataset to speed up training and improve accuracy.
Example for using ResNet:
from torchvision import models
import torch.optim as optim

# Load ImageNet-pretrained ResNet-18 (newer torchvision versions use the
# `weights=` argument instead of the deprecated `pretrained=True`).
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 10)  # replace the 1000-class head with 10 classes
optimizer = optim.Adam(model.parameters(), lr=0.001)
By modifying the fully connected layer, the model adapts to CIFAR-10's 10 classes.
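A common variant, not shown in the original file, is to freeze the pre-trained backbone and train only the new head, which is cheaper and less prone to overfitting on small datasets. A minimal sketch under that assumption:

# Hypothetical variant: freeze the pre-trained backbone so only the new
# classification head is trained.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # fresh head; requires_grad=True by default
optimizer = optim.Adam(model.fc.parameters(), lr=0.001)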
The trainer.py script employs hyperparameter tuning to identify the best model configuration. Results for each candidate setting are logged and visualized to determine the optimal training configuration, as sketched below.
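The exact grid searched by trainer.py isn't reproduced here; as an illustration, a minimal sweep over two assumed knobs (learning rate and dropout probability) might look like this, reusing the project's train and evaluate helpers:

from itertools import product

# Hypothetical sweep; the actual grid in trainer.py may differ.
criterion = nn.CrossEntropyLoss()  # cross-entropy loss, as described below
learning_rates = [1e-2, 1e-3, 1e-4]
dropout_probs = [0.3, 0.5]

best_acc, best_cfg = 0.0, None
for lr, p in product(learning_rates, dropout_probs):
    model = CNN()
    model.dropout.p = p  # adjust the dropout probability in place
    optimizer = optim.Adam(model.parameters(), lr=lr)
    train(model, train_loader, optimizer, criterion)     # helpers sketched below
    _, val_acc = evaluate(model, val_loader, criterion)
    if val_acc > best_acc:
        best_acc, best_cfg = val_acc, (lr, p)
print(f"Best: lr={best_cfg[0]}, dropout={best_cfg[1]}, val acc={best_acc:.3f}")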
The training pipeline involves iterating over the dataset for multiple epochs and minimizing the cross-entropy loss. Performance metrics include training and validation loss, validation accuracy, and a per-class confusion matrix.
Training is executed as follows:
# `train` and `evaluate` are the project's helper functions (sketched below).
for epoch in range(epochs):
    train_loss = train(model, train_loader, optimizer, criterion)
    val_loss, val_acc = evaluate(model, val_loader, criterion)
    print(f"Epoch {epoch}: Train Loss={train_loss:.4f}, Val Acc={val_acc:.4f}")
Training History: The training_history.png plot shows how accuracy improves over epochs, while the validation_confusion_matrix.png highlights classification performance across the 10 classes.
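As an illustration, such a confusion matrix could be generated with scikit-learn and matplotlib (an assumption; the project's own plotting code may differ):

import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Collect predictions and labels over the validation set, then plot.
all_preds, all_labels = [], []
model.eval()
with torch.no_grad():
    for images, labels in val_loader:
        all_preds.extend(model(images).argmax(dim=1).tolist())
        all_labels.extend(labels.tolist())

ConfusionMatrixDisplay.from_predictions(all_labels, all_preds)
plt.savefig('validation_confusion_matrix.png')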
This project demonstrates the power of CNNs and transfer learning for image classification tasks. By combining data augmentation, advanced model architectures, and rigorous evaluation, we achieve robust results on the CIFAR-10 dataset. This approach can be extended to other image recognition problems with similar techniques.
Access the complete codebase, datasets, and results here: GitHub Repository
Built with a passion for deep learning and computer vision. Explore more projects on GitHub.