Sarvesh-369/SimCLR
Reimplementation and Analysis of SimCLR

This repository contains a PyTorch reimplementation of SimCLR ("A Simple Framework for Contrastive Learning of Visual Representations", Chen et al., 2020). The project focuses on replicating the core framework and conducting a series of experiments to analyze how different components (backbone architecture, data augmentation, model size, and batch size) affect the quality of the learned representations.

Project Overview

The primary goal of this project is to gain a deep, hands-on understanding of self-supervised contrastive learning. We explore the SimCLR framework's effectiveness and sensitivity to various hyperparameters and architectural choices. The experiments are primarily conducted on the CIFAR-10 and STL-10 datasets.

Key Experiments and Features

This project explores several key research questions through targeted experiments:

  1. Baseline Reimplementation:

    • Successfully replicated the SimCLR framework using a ResNet-50 backbone.
    • Validated performance on CIFAR-10 and STL-10, achieving results competitive with the original paper's findings under similar conditions.
  2. Architectural Ablation Studies:

    • Backbone Scaling: Investigated the impact of model capacity by training with different ResNet variants (ResNet-18, ResNet-34, ResNet-50). This analysis shows the trade-off between model size and representation quality.
    • Vision Transformer (ViT) Backbone: Replaced the CNN-based backbone with a Vision Transformer to compare the effectiveness of transformer architectures in a contrastive learning setting.
  3. Hyperparameter Sensitivity Analysis:

    • Batch Size: Conducted experiments with a range of batch sizes to observe the critical role batch size plays in contrastive learning, since larger batches supply more negative examples per update.
  4. Advanced Data Augmentation:

    • Mixup Augmentation: Integrated the Mixup technique into the SimCLR augmentation pipeline to study its effect on model robustness and generalization on downstream tasks.
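The batch-size sensitivity studied above follows directly from the NT-Xent objective, in which every other image in the batch serves as a negative. A minimal PyTorch sketch of the loss (the repository's own implementation may differ in details such as masking):

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (N, D) projections of the two augmented views of the same N images.
    Each sample has 1 positive (its other view) and 2N - 2 negatives, so larger
    batches directly increase the number of negatives.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # (2N, 2N) scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # mask each sample's self-similarity
    # The positive for row i is its other view: i + N for the first half,
    # i - N for the second half.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```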

Results and Key Findings

  • Baseline Performance: Our ResNet-50 baseline achieved a Top-1 accuracy of 92.5% on the downstream classification task for CIFAR-10, confirming a successful reimplementation.
  • Backbone Comparison: The Vision Transformer (ViT) backbone showed a 4% improvement in downstream accuracy compared to the ResNet-50 backbone, highlighting the strong potential of transformer-based architectures for self-supervised learning.
  • Effect of Model Size: As expected, larger ResNet models yielded better representations, though with diminishing returns and increased computational cost.
  • Impact of Mixup: The inclusion of Mixup as a data augmentation strategy led to a 2.5% increase in model robustness, demonstrating its utility in creating more challenging and informative positive pairs.
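How Mixup is wired into the contrastive pipeline is not detailed above; one common formulation convexly combines each image with a randomly paired image from the batch before view generation. A minimal sketch under that assumption (the function name and `alpha` default are illustrative):

```python
import torch


def mixup_batch(x: torch.Tensor, alpha: float = 0.2):
    """Input Mixup: blend each image with a randomly paired image in the batch.

    Returns the mixed batch plus the permutation and mixing coefficient, so a
    contrastive loss can account for the soft pairing if desired.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    mixed = lam * x + (1.0 - lam) * x[perm]
    return mixed, perm, lam
```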

Install the required dependencies:

```shell
pip install -r requirements.txt
```
