A production-ready, PyTorch-based two-tower recommendation system designed to train on Kaggle's free GPU resources. This project uses a subset of the Yelp dataset and relies on mixed precision training, gradient checkpointing, and memory-efficient embeddings to keep training and inference efficient.
This project implements a two-tower neural network architecture tailored for large-scale retrieval tasks. The model independently learns representations for users and items, designed to run efficiently on Kaggle's P100 GPU environment.
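Because the two towers are independent, item embeddings can be computed once and reused for every user query. The sketch below illustrates that retrieval step in plain Python to stay dependency-free. The toy embeddings, the business IDs, and the `retrieve_top_k` helper are all hypothetical, and dot-product scoring is an assumption of this sketch rather than something the project states.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    user_vec: Sequence[float],
    item_index: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k items with the highest dot-product score for one user."""
    scored = ((item_id, dot(user_vec, vec)) for item_id, vec in item_index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical precomputed item-tower outputs (toy 3-dimensional embeddings).
item_index = {
    "biz_a": [0.9, 0.1, 0.0],
    "biz_b": [0.0, 1.0, 0.0],
    "biz_c": [0.5, 0.5, 0.5],
}
# Hypothetical user-tower output for a single user.
user_vec = [1.0, 0.0, 0.2]

top_items = retrieve_top_k(user_vec, item_index, k=2)
print(top_items)
```

At larger scale the exhaustive scan would typically be replaced by an approximate nearest-neighbor index, but the interface stays the same: one user vector in, a ranked list of item IDs out.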
src/
├── data/
│   ├── raw/                 # Raw Yelp dataset
│   ├── processed/           # Preprocessed data files
│   └── data_loader.py       # Data loading and processing utilities
├── models/
│   ├── towers/
│   │   ├── user_tower.py    # User tower architecture
│   │   └── item_tower.py    # Item tower architecture
│   ├── layers/
│   │   ├── attention.py     # Attention mechanisms
│   │   └── pooling.py       # Pooling operations
│   └── two_tower.py         # Main two-tower model
├── trainers/
│   ├── base_trainer.py      # Base trainer class
│   └── two_tower_trainer.py # Two-tower model trainer
├── utils/
│   ├── metrics.py           # Evaluation metrics
│   ├── losses.py            # Loss functions
│   └── config.py            # Configuration utilities
└── notebooks/
    └── train_kaggle.ipynb   # Kaggle training notebook
- Efficient Implementation: Mixed precision training (FP16), gradient checkpointing, memory-efficient embeddings, and optimized data loading.
- Flexible Model Architecture: Multi-head self-attention for user behavior, feature interaction layers, configurable tower structures.
- Production Ready: Modular design, comprehensive logging, model checkpointing, and robust configuration management.
The project uses a subset of the Yelp dataset:
- Rich feature set including user and business attributes, review text, and ratings
- Includes user demographics and item (business) characteristics
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.0+
- Kaggle environment with P100 GPU (16GB)
git clone https://github.com/username/two-tower-rec.git
cd two-tower-rec
pip install -r requirements.txt
python src/data/data_loader.py --data_dir data/raw --output_dir data/processed
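For orientation, the following is a rough sketch of the kind of work a preprocessing step like this might do. It is not the contents of `data_loader.py`. It assumes the review data arrives as JSON lines with `user_id`, `business_id`, and `stars` fields, as in the public Yelp Open Dataset. The `build_interactions` helper and the `min_stars` threshold for treating a review as a positive interaction are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Tuple


def build_interactions(
    lines: Iterable[str], min_stars: float = 4.0
) -> Tuple[List[Tuple[int, int]], Dict[str, int], Dict[str, int]]:
    """Map raw string IDs to contiguous integer indices and keep positive reviews.

    min_stars is an assumed cutoff for what counts as a positive interaction.
    """
    user_to_idx: Dict[str, int] = {}
    item_to_idx: Dict[str, int] = {}
    interactions: List[Tuple[int, int]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        review = json.loads(line)
        if review["stars"] < min_stars:
            continue  # low ratings are not treated as positives in this sketch
        # setdefault assigns the next free index the first time an ID is seen.
        user_idx = user_to_idx.setdefault(review["user_id"], len(user_to_idx))
        item_idx = item_to_idx.setdefault(review["business_id"], len(item_to_idx))
        interactions.append((user_idx, item_idx))
    return interactions, user_to_idx, item_to_idx


# Toy records standing in for lines of a review file.
sample_lines = [
    '{"user_id": "u1", "business_id": "b1", "stars": 5.0}',
    '{"user_id": "u2", "business_id": "b1", "stars": 2.0}',
    '{"user_id": "u1", "business_id": "b2", "stars": 4.0}',
]
interactions, user_to_idx, item_to_idx = build_interactions(sample_lines)
print(interactions, user_to_idx, item_to_idx)
```

An open file handle can be passed in place of `sample_lines`, since the function only iterates over lines.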
- Upload the project to Kaggle.
- Open notebooks/train_kaggle.ipynb.
- Select GPU as the accelerator.
- Run the notebook cells to train the model.
- User tower input features: user ID embedding, demographic features, historical behavior sequence, user context features.
- User tower architecture: feature embedding layers, multi-head self-attention, feature interaction layer, MLP layers.
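One way to picture the self-attention step over the historical behavior sequence is the sketch below. It is an illustration under assumptions, not the project's `user_tower.py`. The `BehaviorEncoder` class, the use of index 0 for padding, and masked mean pooling are all choices made for this example. The `embedding_dim=64` and `num_heads=4` defaults mirror the configuration values in this README.

```python
import torch
from torch import nn


class BehaviorEncoder(nn.Module):
    """Self-attention over a padded sequence of item IDs, then masked mean pooling."""

    def __init__(self, num_items: int, embedding_dim: int = 64,
                 num_heads: int = 4, dropout: float = 0.1):
        super().__init__()
        # Index 0 is reserved for padding (an assumption of this sketch).
        self.item_embedding = nn.Embedding(num_items + 1, embedding_dim, padding_idx=0)
        self.attention = nn.MultiheadAttention(
            embedding_dim, num_heads, dropout=dropout, batch_first=True
        )

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq_len) item indices; every row needs at least
        # one non-padding entry, otherwise attention has nothing to attend to.
        padding_mask = history.eq(0)                       # True where padded
        x = self.item_embedding(history)                   # (batch, seq_len, dim)
        attended, _ = self.attention(x, x, x, key_padding_mask=padding_mask)
        keep = (~padding_mask).unsqueeze(-1).to(attended.dtype)
        summed = (attended * keep).sum(dim=1)              # ignore padded steps
        counts = keep.sum(dim=1).clamp(min=1.0)
        return summed / counts                             # (batch, dim)


encoder = BehaviorEncoder(num_items=10)
encoder.eval()  # disable dropout for a deterministic forward pass
history = torch.tensor([[1, 2, 3, 0], [4, 0, 0, 0]])
with torch.no_grad():
    user_history_vec = encoder(history)
print(user_history_vec.shape)
```

The pooled vector would then be concatenated with the other user features before the MLP layers. The `pooling.py` module in the project may use a different pooling strategy.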
- Item tower input features: business ID embedding, category features, business attributes, business context features.
- Item tower architecture: feature embedding layers, feature interaction layer, MLP layers.
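A minimal sketch of how embedding layers and an MLP could be combined into an item tower follows. It assumes one category ID per business and a dense vector of numeric attributes, and it omits the feature interaction layer. The `ItemTowerSketch` and `build_mlp` names, the `num_attributes` parameter, and the final L2 normalization are hypothetical choices for this example. The `hidden_dims` default follows the `[256, 128]` setting in the configuration.

```python
import torch
from torch import nn
import torch.nn.functional as F


def build_mlp(input_dim: int, hidden_dims, dropout: float) -> nn.Sequential:
    """Stack Linear layers; ReLU and Dropout follow every layer except the last."""
    layers = []
    prev = input_dim
    for i, dim in enumerate(hidden_dims):
        layers.append(nn.Linear(prev, dim))
        if i < len(hidden_dims) - 1:
            layers += [nn.ReLU(), nn.Dropout(dropout)]
        prev = dim
    return nn.Sequential(*layers)


class ItemTowerSketch(nn.Module):
    def __init__(self, num_businesses: int, num_categories: int, num_attributes: int,
                 embedding_dim: int = 64, hidden_dims=(256, 128), dropout: float = 0.1):
        super().__init__()
        self.business_embedding = nn.Embedding(num_businesses, embedding_dim)
        self.category_embedding = nn.Embedding(num_categories, embedding_dim)
        self.mlp = build_mlp(2 * embedding_dim + num_attributes, hidden_dims, dropout)

    def forward(self, business_ids, category_ids, attributes):
        features = torch.cat(
            [
                self.business_embedding(business_ids),
                self.category_embedding(category_ids),
                attributes,
            ],
            dim=-1,
        )
        # Unit-length outputs make dot products equal to cosine similarity.
        return F.normalize(self.mlp(features), dim=-1)


tower = ItemTowerSketch(num_businesses=100, num_categories=20, num_attributes=8)
tower.eval()
business_ids = torch.tensor([3, 17, 42])
category_ids = torch.tensor([1, 5, 5])
attributes = torch.rand(3, 8)
with torch.no_grad():
    item_vecs = tower(business_ids, category_ids, attributes)
print(item_vecs.shape)
```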
- Batch size: 512
- Mixed precision training (FP16)
- Gradient checkpointing
- Early stopping
- Learning rate: 1e-3
- Loss: InfoNCE loss
- Optimizer: AdamW
- Negative sampling ratio: 1:4
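To make the loss concrete, here is a small plain-Python sketch of InfoNCE for a single positive scored against sampled negatives. It reads the 1:4 ratio as one positive to four negatives, which is an assumption. The `info_nce` helper and the `temperature` value are hypothetical, and the project's `losses.py` may differ.

```python
import math
from typing import Sequence


def info_nce(pos_score: float, neg_scores: Sequence[float],
             temperature: float = 0.1) -> float:
    """Cross-entropy of picking the positive among positive + negatives.

    Scores are user-item similarities; temperature is an assumed hyperparameter.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]


# One positive and four negatives, matching the assumed reading of 1:4.
pos_score = 0.8
neg_scores = [0.1, -0.2, 0.3, 0.0]
loss = info_nce(pos_score, neg_scores)
print(loss)
```

When all five scores are equal, the loss is log(5), the value for a uniform guess over five candidates. It decreases as the positive score pulls ahead of the negatives.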
model:
  user_tower:
    embedding_dim: 64
    hidden_dims: [256, 128]
    num_heads: 4
    dropout: 0.1
  item_tower:
    embedding_dim: 64
    hidden_dims: [256, 128]
    dropout: 0.1
training:
  batch_size: 512
  learning_rate: 0.001
  num_epochs: 30
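A configuration like this could be loaded into typed objects along the following lines. This is a sketch, not the project's `config.py`. It assumes `user_tower` and `item_tower` nest under `model` while `training` is a top-level key. The dataclass names and `config_from_dict` are hypothetical. The dictionary stands in for the result of parsing the YAML file with a YAML library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TowerConfig:
    embedding_dim: int = 64
    hidden_dims: List[int] = field(default_factory=lambda: [256, 128])
    dropout: float = 0.1
    num_heads: Optional[int] = None  # only the user tower sets this

    def __post_init__(self) -> None:
        # Multi-head attention splits the embedding evenly across heads.
        if self.num_heads is not None and self.embedding_dim % self.num_heads != 0:
            raise ValueError("embedding_dim must be divisible by num_heads")


@dataclass
class TrainingConfig:
    batch_size: int = 512
    learning_rate: float = 1e-3
    num_epochs: int = 30


@dataclass
class Config:
    user_tower: TowerConfig
    item_tower: TowerConfig
    training: TrainingConfig


def config_from_dict(raw: Dict[str, Any]) -> Config:
    """Build a Config from a parsed mapping, falling back to defaults."""
    model = raw.get("model", {})
    return Config(
        user_tower=TowerConfig(**model.get("user_tower", {})),
        item_tower=TowerConfig(**model.get("item_tower", {})),
        training=TrainingConfig(**raw.get("training", {})),
    )


# Stand-in for the parsed YAML shown above.
raw = {
    "model": {
        "user_tower": {"embedding_dim": 64, "hidden_dims": [256, 128],
                       "num_heads": 4, "dropout": 0.1},
        "item_tower": {"embedding_dim": 64, "hidden_dims": [256, 128],
                       "dropout": 0.1},
    },
    "training": {"batch_size": 512, "learning_rate": 0.001, "num_epochs": 30},
}
config = config_from_dict(raw)
print(config.training)
```

Validating at load time means a mismatch such as an embedding size that does not divide by the head count fails before any GPU time is spent.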
- Mixed precision training
- Gradient checkpointing
- Efficient data loading
- Batch size optimization
- Memory-efficient embeddings
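The first two items can be combined in a single training step, roughly as sketched below. The tiny model, the MSE loss, and the `train_step` helper are placeholders chosen for this example and are not the project's trainer or its InfoNCE loss. FP16 autocast is enabled only when a CUDA device is present, so the sketch also runs unchanged on CPU.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # FP16 autocast is only switched on for GPU runs
amp_dtype = torch.float16 if use_amp else torch.bfloat16

# Placeholder network: `block` is recomputed during backward via checkpointing.
block = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 128)).to(device)
head = nn.Linear(128, 1).to(device)
params = list(block.parameters()) + list(head.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-3)
# Loss scaling guards FP16 gradients against underflow; a no-op when disabled.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(features: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, dtype=amp_dtype, enabled=use_amp):
        # Activations inside `block` are not stored; they are recomputed
        # in the backward pass, trading compute for memory.
        hidden = checkpoint(block, features, use_reentrant=False)
        loss = nn.functional.mse_loss(head(hidden).squeeze(-1), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


features = torch.randn(32, 64, device=device)
targets = torch.randn(32, device=device)
first_loss = train_step(features, targets)
print(first_loss)
```

Checkpointing pays off mainly on the deepest or widest parts of a tower. Wrapping very small layers adds recomputation cost without saving much memory.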
Contributions are welcome! Please submit a Pull Request if you'd like to help improve the project.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this code, please cite it as follows:
@misc{two-tower-rec,
  author = {Your Name},
  title = {Two-Tower Recommendation System},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/username/two-tower-rec}
}