This project provides comprehensive documentation and examples for training various YOLO model versions.

## Features
- Multi-Model Support: YOLOv8, YOLOv5, and YOLO11 training
- Smart Configuration: Automatic dataset preparation and configuration
- Interactive Training: Step-by-step guided setup for beginners
- GPU Optimization: Automatic GPU memory management with corrected estimation formulas (Sept 2025)
- Comprehensive Monitoring: TensorBoard integration with real-time metrics
- Checkpoint Management: Automatic saving and resuming of training sessions
- Export Pipeline: Convert trained models to multiple formats (ONNX, TorchScript, etc.)
- Validation Tools: Comprehensive model evaluation and testing
- Memory Management: Automatic GPU memory cleanup and optimization

**Zero dataset preparation required.** The automated dataset system handles any dataset format and structure automatically.
```bash
# 1. Place ANY dataset in dataset/ folder (any structure/format)
# 2. Run training - everything happens automatically!

# If using virtual environment:
.venv/bin/python train.py              # Linux/Mac
# .venv\Scripts\python.exe train.py    # Windows

# Or with system Python:
python train.py
```
The system automatically:
- Detects dataset structure (flat, nested, mixed)
- Converts any format (YOLO, COCO, XML, custom)
- Reorganizes to YOLO standard
- Generates `data.yaml` configuration
- Starts training immediately
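
The detection and configuration steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `detect_structure`, `build_data_yaml`, the list of image extensions, and the `images/train` / `images/val` layout are assumptions chosen to match the common Ultralytics `data.yaml` convention.

```python
from pathlib import Path
from typing import Sequence

# Assumed set of image extensions; the real system may recognise more.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def _is_image(path: Path) -> bool:
    return path.is_file() and path.suffix.lower() in IMAGE_EXTS


def detect_structure(root: Path) -> str:
    """Classify a dataset folder as 'flat', 'nested', 'mixed', or 'empty'."""
    has_top_level = any(_is_image(p) for p in root.iterdir())
    has_nested = any(
        _is_image(p)
        for sub in root.iterdir() if sub.is_dir()
        for p in sub.rglob("*")
    )
    if has_top_level and has_nested:
        return "mixed"
    if has_top_level:
        return "flat"
    if has_nested:
        return "nested"
    return "empty"


def build_data_yaml(root: Path, class_names: Sequence[str]) -> str:
    """Render a YOLO-style data.yaml as text (hypothetical folder layout)."""
    lines = [
        f"path: {root.as_posix()}",
        "train: images/train",
        "val: images/val",
        "names:",
    ]
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"
```

Calling `build_data_yaml(Path("dataset"), ["cat", "dog"])` returns the text of a small configuration file that maps class index 0 to `cat` and 1 to `dog`.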
Supported Sources:
- Roboflow exports (any format)
- Kaggle datasets
- Custom annotations
- Mixed sources
- Any organization structure
NEW: QUICK_START_GUIDE.md - Get training in minutes with your dataset!
```bash
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate    # Linux/Mac
# or
.venv\Scripts\activate       # Windows

# Install dependencies
pip install -r requirements.txt

# Set your Roboflow API key (only needed for Roboflow downloads)
export ROBOFLOW_API_KEY="your_api_key_here"
```
```bash
# Option A: Automatic (Recommended)
# Just place your dataset in dataset/ folder and run training!

# Option B: Manual preparation (if needed)
python utils/prepare_dataset.py dataset/ --format yolov8
```

```python
# Option C: Roboflow export
# Replace the placeholders with your own API key, workspace, project ID, and version.
from roboflow import Roboflow

rf = Roboflow(api_key="your_api_key")
project = rf.workspace("workspace").project("project_id")
dataset = project.version("version_number").download("yolov8")
```
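
Rather than hard-coding the key as in Option C, the `ROBOFLOW_API_KEY` environment variable set during setup can be read at runtime. The helper below is a small sketch under that assumption; `get_roboflow_api_key` is a hypothetical name, not part of this project or of the `roboflow` package.

```python
import os


def get_roboflow_api_key(env=os.environ) -> str:
    """Return the Roboflow API key from the environment or raise a clear error."""
    key = env.get("ROBOFLOW_API_KEY", "").strip()
    # Treat the placeholder from the setup instructions as "not set".
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            "ROBOFLOW_API_KEY is not set; export it before running the download."
        )
    return key


# Usage (requires the roboflow package and a real key):
# from roboflow import Roboflow
# rf = Roboflow(api_key=get_roboflow_api_key())
```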
Note: Replace `python` with `.venv/bin/python` (Linux/Mac) or `.venv\Scripts\python.exe` (Windows) if using a virtual environment.
```bash
python train.py
```
The system will guide you through:
- YOLO version selection (YOLO11, YOLOv8, YOLOv5)
- Model size selection (nano to xlarge)
- Training parameters (epochs, batch size, image size)
- Advanced options and results folder naming
- Automatic TensorBoard launch for real-time monitoring
```bash
python train.py --model-type yolov8
```
Uses the specified YOLO version and prompts for the other parameters.
```bash
python train.py --model-type yolov8 --non-interactive --results-folder my_experiment
```
Uses all defaults and creates an organized results folder.
```bash
python train.py \
  --model-type yolov8 \
  --epochs 100 \
  --batch-size 16 \
  --image-size 640 \
  --results-folder custom_run
```
Important: Replace `python` with your virtual environment path if using one:
- Linux/Mac: `.venv/bin/python`
- Windows: `.venv\Scripts\python.exe`
--image-size 640
--device cuda
--results-folder production_run
--non-interactive
### 5. Monitor Training with TensorBoard
**Automatic Monitoring** (Recommended):
- TensorBoard launches automatically during training
- Opens in your browser with real-time metrics
- Continues running after training for result analysis
**Manual TensorBoard Management**:
```bash
# Check TensorBoard status and open in browser
python -m utils.tensorboard_manager status

# List all experiments with TensorBoard data
python -m utils.tensorboard_manager list

# Launch TensorBoard for specific experiment
python -m utils.tensorboard_manager launch experiment_name

# Stop TensorBoard when done
python -m utils.tensorboard_manager stop
```

```bash
# Test the automated dataset system
python tests/test_auto_dataset.py

# Run comprehensive YOLO testing
python tests/test_comprehensive_yolo.py

# Run standard tests
python -m pytest tests/ -v
```
- Structure Detection: Automatically identifies flat, nested, or mixed dataset structures
- Format Conversion: Handles YOLO, COCO, XML, and custom annotation formats
- Class Detection: Automatically detects classes from labels, annotations, or mapping files
- Split Management: Creates optimal train/validation/test splits automatically
- Automatic Organization: Converts any structure to YOLO standard
- Smart Validation: Detects and reports dataset issues
- Error Recovery: Handles corrupted files and missing labels gracefully
- YOLO Compatibility: Works with YOLOv8, YOLOv5, and YOLO11
- Comprehensive Testing: 100% test coverage for all YOLO versions
- Error Handling: Robust error handling and recovery
- Performance Optimized: Fast dataset preparation and validation
- Integration Ready: Seamlessly integrates with training pipeline
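
As an illustration of the split management mentioned in the list above, a deterministic train/validation/test split can be sketched as follows. The 80/10/10 ratios, the fixed seed, and the name `split_dataset` are assumptions for the example; the source does not state which ratios the system actually uses.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    items: Sequence[str],
    train: float = 0.8,
    val: float = 0.1,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle deterministically, then cut into train/val/test lists."""
    if not 0 < train < 1 or not 0 <= val < 1 or train + val > 1:
        raise ValueError("train and val must be fractions with train + val <= 1")
    shuffled = list(items)
    # A seeded Random instance keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains becomes the test set
    )
```

Every item lands in exactly one of the three lists, and running the function again with the same seed yields the same split.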
The system includes intelligent GPU memory management to prevent CUDA out-of-memory errors:
- Automatic memory estimation before training starts
- Safety warnings for risky configurations
- Memory cleanup after training completion
- Emergency recovery from out-of-memory errors
```bash
# Check memory status
python gpu_memory_cli.py status

# Test if configuration will fit before training
python gpu_memory_cli.py check --model l --image-size 1280 --batch-size 4 --version yolov8

# Clear GPU memory if needed
python gpu_memory_cli.py clear

# Monitor memory during training
python gpu_memory_cli.py monitor
```
If you get CUDA out-of-memory errors:
- Reduce batch size: Try `--batch-size 4` or `--batch-size 2`
- Reduce image size: Try `--image-size 640` instead of `--image-size 1280`
- Use smaller model: Try YOLOv8n or YOLOv8s instead of YOLOv8l
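
The first tip, reducing the batch size, can also be automated with a simple retry loop. The sketch below is a generic illustration and not how this project's emergency recovery is implemented: `train_with_backoff` and `train_fn` are hypothetical names, and the built-in `MemoryError` stands in for the real out-of-memory exception (with PyTorch one would typically pass `torch.cuda.OutOfMemoryError` as `oom_error`).

```python
from typing import Callable


def train_with_backoff(
    train_fn: Callable[[int], str],
    batch_size: int,
    min_batch_size: int = 1,
    oom_error: type = MemoryError,
) -> str:
    """Call train_fn(batch_size), halving the batch size after each OOM error."""
    while True:
        try:
            return train_fn(batch_size)
        except oom_error:
            if batch_size <= min_batch_size:
                raise  # nothing left to reduce, surface the original error
            batch_size = max(min_batch_size, batch_size // 2)
```

Starting from a batch size of 16, the loop tries 16, 8, 4, 2, and 1 in turn and re-raises the error only if the smallest size still does not fit.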
- QUICK_START_GUIDE.md - NEW! Get training in minutes
- docs/README.md - Main documentation hub
- docs/workflow/README.md - Comprehensive workflow documentation
- Start here: docs/README.md - Main documentation hub
- System overview: docs/workflow/01-system-overview/01-system-overview.md
- Training workflows: docs/workflow/04-integration-workflows/01-training-workflows.md
The documentation covers ALL files in ALL repository directories:
- System Overview - What the system does and why it matters
- Core Components - Main training script, configuration, utilities, dataset system
- Supporting Systems - Examples, testing, export, environment setup
- Integration Workflows - Complete training processes, data flow, error handling
- Validation & Testing - Quality assurance and maintenance procedures
- examples/export_dataset.py - Practical export script
Version | Status | Export Format | Training Method |
---|---|---|---|
YOLO11 | New, latest | `yolo11` | Repository/Ultralytics |
YOLOv8 | Recommended | `yolov8` | Ultralytics |
YOLOv5 | Stable | `yolo` | Repository/Ultralytics |
YOLOv6 | Limited | `yolo` | Repository |
YOLOv7 | Limited | `yolo` | Repository |
YOLOv9 | Experimental | `yolo` | Repository |
```bash
# Full interactive experience - selects YOLO version and all parameters
python train.py

# Train YOLOv8 with interactive configuration (recommended for beginners)
python train.py --model-type yolov8

# Train YOLOv8 with custom parameters (no prompts)
python train.py --model-type yolov8 --epochs 200 --batch-size 16 --image-size 640

# Train with custom results folder (no folder prompt)
python train.py --model-type yolov8 --results-folder experiment_2024

# Skip all interactive prompts (use defaults)
python train.py --model-type yolov8 --non-interactive

# Resume training from checkpoint
python train.py --model-type yolov8 --resume logs/previous_run/weights/last.pt

# Validate only (no training)
python train.py --model-type yolov8 --validate-only

# Export model after training
python train.py --model-type yolov8 --export
```
- `--model-type`: Choose between `yolo11`, `yolov8`, `yolov5`
- `--epochs`: Number of training epochs
- `--batch-size`: Training batch size
- `--image-size`: Input image size
- `--device`: Training device (`cpu`, `cuda`, `auto`)
- `--results-folder`: Custom folder name for results (skips interactive prompt)
- `--non-interactive`: Skip all interactive configuration prompts (use defaults)
- `--resume`: Path to checkpoint for resuming training
- `--validate-only`: Only validate, don't train
- `--export`: Export model after training
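
For readers who want to see how such options fit together, here is a minimal `argparse` sketch of the flags listed above. It is an assumption-based illustration, not the parser used in `train.py`: the numeric defaults are taken from the interactive prompt defaults documented in this README, and the `auto` default for `--device` is a guess.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="YOLO training (illustrative sketch)")
    parser.add_argument("--model-type", choices=["yolo11", "yolov8", "yolov5"])
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--batch-size", type=int, default=8)
    parser.add_argument("--image-size", type=int, default=1024)
    # Default device is an assumption; the README only lists the allowed values.
    parser.add_argument("--device", choices=["cpu", "cuda", "auto"], default="auto")
    parser.add_argument("--results-folder", help="custom folder name for results")
    parser.add_argument("--non-interactive", action="store_true")
    parser.add_argument("--resume", help="path to a checkpoint to resume from")
    parser.add_argument("--validate-only", action="store_true")
    parser.add_argument("--export", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

`argparse` converts dashes to underscores, so `--batch-size` is read back as `args.batch_size`.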
When you run training without `--non-interactive`, the system will prompt you for:
- YOLO Version: Choose between YOLO11, YOLOv8, YOLOv5 (if not specified)
- Model Size: Choose between n (nano), s (small), m (medium), l (large), x (xlarge)
- Training Duration: Number of epochs (default: 100)
- Batch Size: Training batch size (default: 8)
- Image Size: Input resolution (default: 1024)
- Learning Rate: Training learning rate (default: 0.01)
- Device: GPU or CPU training (default: cuda if available)
- Advanced Options: Early stopping patience, augmentation, validation frequency
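
The "press Enter to accept the default" behaviour of these prompts can be sketched with a small helper. `prompt_with_default` is a hypothetical function written for this example; the project's own prompt code may differ. The `input_fn` parameter exists only so the helper can be exercised without a terminal.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def prompt_with_default(
    label: str,
    default: T,
    cast: Callable[[str], T],
    input_fn: Callable[[str], str] = input,
) -> T:
    """Ask for a value; an empty answer (just Enter) keeps the default."""
    while True:
        raw = input_fn(f"{label} [{default}]: ").strip()
        if not raw:
            return default
        try:
            return cast(raw)
        except ValueError:
            # Re-prompt instead of crashing on malformed input.
            print(f"Invalid value {raw!r}, please try again.")


# Example with the documented defaults:
# epochs = prompt_with_default("Epochs", 100, int)
# learning_rate = prompt_with_default("Learning rate", 0.01, float)
```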
Pro Tips:
- Press Enter to accept default values
- Use `--non-interactive` for automated scripts
- Combine with `--results-folder` to skip the folder naming prompt
The system includes automatic TensorBoard integration that provides real-time training visualization:
- Auto-launch: TensorBoard opens automatically in your browser
- Real-time metrics: Live loss curves, accuracy plots, and training progress
- Model visualization: Network architecture and computational graphs
- Persistent access: TensorBoard remains running after training completes
- Result analysis: Continue viewing training metrics and model performance
- Experiment comparison: Compare different training runs and experiments
- Easy management: Simple commands to control TensorBoard sessions
```bash
# Check if TensorBoard is running and open in browser
python -m utils.tensorboard_manager status

# List all experiments with their TensorBoard data status
python -m utils.tensorboard_manager list

# Launch TensorBoard for a specific experiment
python -m utils.tensorboard_manager launch experiment_name

# Launch on custom port
python -m utils.tensorboard_manager launch experiment_name --port 6007

# Stop all TensorBoard processes
python -m utils.tensorboard_manager stop
```
- Training Metrics: Loss curves (box, classification, DFL losses)
- Validation Metrics: mAP50, mAP50-95, precision, recall
- Model Architecture: Visual representation of YOLO network structure
- Training Images: Sample batches with augmentations and predictions
- Hyperparameters: Complete training configuration tracking
- System Metrics: GPU utilization, memory usage, training speed
If you need to manually access TensorBoard for any experiment:
```bash
# For the current training run, open http://localhost:6006 in your browser

# View experiment logs directly
tensorboard --logdir logs/experiment_name/experiment_name
```
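
If you prefer to start TensorBoard from Python rather than the shell, the same command can be assembled and launched with `subprocess`. This is a sketch only: `tensorboard_command` and `launch_tensorboard` are hypothetical helpers, not functions from `utils.tensorboard_manager`, and the nested log path simply mirrors the manual command shown above.

```python
import subprocess
from pathlib import Path
from typing import List


def tensorboard_command(
    experiment: str, port: int = 6006, logs_root: str = "logs"
) -> List[str]:
    """Build the tensorboard CLI invocation for one experiment."""
    # Mirrors logs/experiment_name/experiment_name from the manual command.
    logdir = Path(logs_root) / experiment / experiment
    return ["tensorboard", "--logdir", logdir.as_posix(), "--port", str(port)]


def launch_tensorboard(experiment: str, port: int = 6006) -> subprocess.Popen:
    """Start TensorBoard in the background (requires tensorboard on PATH)."""
    return subprocess.Popen(tensorboard_command(experiment, port))
```

The returned `Popen` handle can later be stopped with `.terminate()`.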
- When you run training, the system prompts for a custom folder name
- Results are organized in `logs/your_custom_name/` instead of the default `logs/yolo_training/`
- Each training run gets its own organized folder with weights, plots, and logs
- Folder names are automatically cleaned of invalid characters
- Existing folders can be reused or new names can be chosen
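
The folder name cleaning described above might look like the following sketch. The exact set of allowed characters is an assumption (letters, digits, `_`, and `-`), and `clean_folder_name` and `results_dir` are hypothetical names; only the `logs/` root and the `yolo_training` default come from this README.

```python
import re
from pathlib import Path


def clean_folder_name(name: str, default: str = "yolo_training") -> str:
    """Replace runs of unsupported characters with '_' and fall back to the default."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip()).strip("_")
    return cleaned or default


def results_dir(name: str, logs_root: str = "logs") -> Path:
    """Return the results folder for a (possibly messy) user-supplied name."""
    return Path(logs_root) / clean_folder_name(name)
```

For example, `clean_folder_name("my exp/2024?")` gives `my_exp_2024`, and a name made up only of invalid characters falls back to `yolo_training`.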
- Beginner-Friendly: Step-by-step prompts for all major training parameters
- Smart Defaults: Press Enter to accept recommended values
- Model Selection: Choose from nano (n) to xlarge (x) model sizes
- Parameter Guidance: Helpful explanations for each setting
- Validation: Input validation with helpful error messages
- Flexible: Use `--non-interactive` to skip prompts for automation
The system is built with enterprise-grade reliability:
- Configuration Management: Centralized config with validation and environment overrides
- Data Pipeline: Robust dataset handling with automatic validation and preprocessing
- Training Engine: Comprehensive training loop with checkpoint management and monitoring
- Evaluation System: Multi-metric evaluation with visualization and reporting
- Export Utilities: Multi-format model export (ONNX, TorchScript, CoreML, TensorRT)
- Comprehensive Testing: 98 tests covering all major components
- Error Handling: Graceful failure handling with detailed logging
- Data Validation: Automatic dataset structure and format validation
- Checkpoint Management: Robust save/load with automatic cleanup
- Real-time Monitoring: Automatic TensorBoard integration with persistent access
- Training Visualization: Live metrics, loss curves, and model performance tracking
- Python 3.8+ with pip
- Roboflow account with annotated dataset
- API key from Roboflow
- GPU (recommended for training)
```bash
pip install -r requirements.txt
pip install ultralytics roboflow torch torchvision

# Install PyTorch with CUDA support
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install ultralytics roboflow
```
- Read docs/README.md - Main documentation hub
- Try docs/workflow/01-system-overview/01-system-overview.md - System overview
- Follow docs/workflow/04-integration-workflows/01-training-workflows.md - Training workflows
- Run the example script
- Train your first model with `python train.py`
- Explore other YOLO versions
- Customize training parameters
- Experiment with different architectures
- Optimize for your use case
- Research latest YOLO versions
- Contribute to the community
- Deploy models to production
- Optimize for edge devices
- Format not found: Use `yolo` format as fallback
- API key errors: Verify that the `ROBOFLOW_API_KEY` environment variable is set
- Permission denied: Check Roboflow project access
- Memory errors: Reduce batch size and image size
- CUDA issues: Verify PyTorch CUDA installation
- Path errors: Check `data.yaml` file paths
- Slow training: Use smaller model variants
- Low accuracy: Increase dataset size and quality
- Overfitting: Add more augmentation and regularization
```bash
# Run all tests
python -m pytest tests/ -v

# Run specific test categories
python -m pytest tests/test_config.py -v
python -m pytest tests/test_data_loader.py -v
python -m pytest tests/test_training.py -v
```

```bash
python demo_interactive_training.py
```
Shows all available training modes and options.
```bash
# Interactive training with YOLO version selection
python train.py

# Non-interactive training with defaults
python train.py --model-type yolov8 --non-interactive --results-folder quick_test

# Custom configuration
python train.py --model-type yolov8 --epochs 50 --batch-size 4 --image-size 640
```
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: Use GitHub Issues for bug reports
- Discussions: Use GitHub Discussions for questions
- Documentation: Check the docs/ folder for detailed guides
- Examples: See the examples/ folder for practical usage
- Ultralytics team for YOLOv8 implementation
- Roboflow for dataset management tools
- PyTorch community for deep learning framework
- Contributors and users of this project