Collection of papers and other resources for object detection and tracking using deep learning

Static Detection

Region Proposal
- Scalable Object Detection Using Deep Neural Networks (cvpr14) (pdf, notes)
- Selective Search for Object Recognition (ijcv2013) (pdf, notes)
RCNN
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (tpami17) (pdf, notes)
- RFCN - Object Detection via Region-based Fully Convolutional Networks (nips16) (pdf, notes) [Microsoft Research]
- Mask R-CNN (iccv17) (pdf, notes, arxiv, code (keras), code (tensorflow)) [Facebook AI Research]
YOLO
- You Only Look Once: Unified, Real-Time Object Detection (ax1605) (pdf, notes)
- YOLO9000: Better, Faster, Stronger (ax1612) (pdf, notes)
- YOLOv3: An Incremental Improvement (ax1804) (pdf, notes)
SSD
- SSD: Single Shot MultiBox Detector (ax1612/eccv16) (pdf, notes)
- DSSD: Deconvolutional Single Shot Detector (ax1701) (pdf, notes)
RetinaNet
- Feature Pyramid Networks for Object Detection (ax1704) (pdf, notes)
- Focal Loss for Dense Object Detection (ax180207/iccv17) (pdf, notes)
Misc
- OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks (ax1402/iclr14) (pdf, notes)
- LSDA: Large Scale Detection through Adaptation (ax1411/nips14) (pdf, notes)

Video Detection

Tubelet
- Object Detection from Video Tubelets with Convolutional Neural Networks (cvpr16) (pdf, notes)
- Object Detection in Videos with Tubelet Proposal Networks (ax1704/cvpr17) (pdf, notes)
FGFA
- Deep Feature Flow for Video Recognition (cvpr17) (pdf, arxiv, code) [Microsoft Research]
- Flow-Guided Feature Aggregation for Video Object Detection (ax1708/iccv17) (pdf, notes)
- Towards High Performance Video Object Detection (ax1711) (Microsoft) (pdf, notes)
RNN
- Online Video Object Detection using Association LSTM (iccv17) (pdf, notes)
- Context Matters: Refining Object Detection in Video with Recurrent Neural Networks (bmvc16) (pdf, notes)

Multi Object Tracking

Deep Learning
- Tracking The Untrackable: Learning To Track Multiple Cues with Long-Term Dependencies (ax1704/iccv17) (Stanford) (pdf, arxiv, project page, notes)
Reinforcement Learning
- Learning to Track: Online Multi-object Tracking by Decision Making (iccv15) (Stanford) (pdf, code (Matlab), project page, notes)
Network Flow
- Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor (iccv15) (NEC Labs) (pdf, author page, notes)
- Deep Network Flow for Multi-Object Tracking (cvpr17) (NEC Labs) (pdf, supplementary, notes)
Graph Optimization
- A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects (arxiv July 2016) (highest MT on MOT2015) (University of Freiburg, Germany) (pdf, arxiv, author page, notes)
Baseline
- Simple Online and Realtime Tracking (icip16) (pdf, notes, code)
- High-Speed Tracking-by-Detection Without Using Image Information (avss17) (pdf, notes, code)

Single Object Tracking

Reinforcement Learning
- Deep Reinforcement Learning for Visual Object Tracking in Videos (arxiv April 2017) (UC Santa Barbara, Samsung Research) (pdf, arxiv, author page, notes)
- Visual Tracking by Reinforced Decision Making (arxiv February 2017) (Seoul National University, Chung-Ang University) (pdf, arxiv, author page, notes)
- Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning (cvpr17) (Seoul National University) (pdf, supplementary, project page, notes)
- End-to-end Active Object Tracking via Reinforcement Learning (arxiv 30 May 2017) (Peking University, Tencent AI Lab) (pdf, arxiv)
Siamese
- High Performance Visual Tracking with Siamese Region Proposal Network [cvpr18] [pdf] [author] [notes]

Deep Learning

Do Deep Nets Really Need to be Deep? (NIPS 2014) (pdf, notes)
Synthetic Gradients
- Decoupled Neural Interfaces using Synthetic Gradients (arxiv August 2016) (pdf, notes)
- Understanding Synthetic Gradients and Decoupled Neural Interfaces (arxiv March 2017) (pdf, notes)

Unsupervised Learning

Learning Features by Watching Objects Move (cvpr17) (pdf, notes)

Interpolation

Video Frame Interpolation via Adaptive Convolution (cvpr17 / iccv17) (pdf (cvpr17), pdf (iccv17), ppt)

Autoencoder

Variational
- beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework (iclr17) (pdf, notes)
- Disentangling by Factorising (ax1806) (pdf, notes)

Datasets

Multi Object Tracking
- IDOT
- UA-DETRAC Benchmark Suite
- GRAM Road-Traffic Monitoring
- Stanford Drone Dataset
- Ko-PER Intersection Dataset
- TRANCOS
- Urban Tracker
- DARPA VIVID / PETS 2005 (Non stationary camera)
- KIT-AKS (No ground truth)
- CBCL StreetScenes Challenge Framework (No top down viewpoint)
- MOT 2015 (mostly street level camera viewpoint)
- MOT 2016 (mostly street level camera viewpoint)
- MOT 2017 (mostly street level camera viewpoint)
- PETS 2009 (No vehicles)
- PETS 2017 (Low density; mostly pedestrians)
- KITTI Tracking Dataset (No top down viewpoint; non stationary camera)
- The WILDTRACK Seven-Camera HD Dataset (pedestrian detection and tracking)
- 3D Traffic Scene Understanding from Movable Platforms (intersection traffic/stereo setup/moving camera)
Video Understanding / Activity Recognition
Video Detection
- YouTube-BB
Static Detection
- Object Detection-based annotations for some frames of the VIRAT dataset
Static Segmentation
- COCO - Common Objects in Context
- UC Berkeley Computer Vision Group - Contour Detection and Image Segmentation
Video Segmentation
- DAVIS: Densely Annotated VIdeo Segmentation
Classification
Optical Flow
- Middlebury
- MPI Sintel

Collections

Datasets
- List of traffic surveillance datasets
Single Object Tracking
Multi Object Tracking
- List of multi object tracking papers
- A collection of Multiple Object Tracking (MOT) papers in recent years, with notes
Deep Compressed Sensing
- Reproducible Deep Compressive Sensing
Misc
- List of Matlab frameworks, libraries and software
- Face Recognition

Tutorials

Static Detection
Video Detection
- How Microsoft Does Video Object Detection - Unifying the Best Techniques in Video Object Detection Architectures in a Single Model
Deep RL
- Deep Reinforcement Learning: Pong from Pixels
- Demystifying Deep Reinforcement Learning
Autoencoders

Code

Multi Object Tracking
- Globally-optimal greedy algorithms for tracking a variable number of objects [cvpr11] [matlab] [author]
- Continuous Energy Minimization for Multitarget Tracking [cvpr11 / iccv11 / tpami 2014] [matlab]
- Discrete-Continuous Energy Minimization for Multi-Target Tracking [cvpr12] [matlab] [project]
- The way they move: Tracking multiple targets with similar appearance [iccv13] [matlab]
- 3D Traffic Scene Understanding from Movable Platforms [2d_tracking] [pami14/kit13/iccv13/nips11] [C++/matlab]
- Multiple target tracking based on undirected hierarchical relation hypergraph [cvpr14] [C++] [author]
- Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning [cvpr14] [matlab] (project)
- Learning to Track: Online Multi-Object Tracking by Decision Making [iccv15] [matlab]
- Joint Tracking and Segmentation of Multiple Targets [cvpr15] [matlab]
- Multiple Hypothesis Tracking Revisited [iccv15] [highest MT on MOT2015 among open source trackers] [matlab]
- Simple Online and Realtime Tracking [icip 2016] [python]
- Deep SORT: Simple Online and Realtime Tracking with a Deep Association Metric [icip 2017] [python]
- Combined Image- and World-Space Tracking in Traffic Scenes [icra 2017] [c++]
- High-Speed Tracking-by-Detection Without Using Image Information [avss 2017] [python]
Single Object Tracking
- A collection of common tracking algorithms (2003-2012) [c++/matlab]
- SenseTime Research platform for single object tracking, implementing algorithms like SiamRPN and SiamMask [pytorch]
- In Defense of Color-based Model-free Tracking [cvpr15] [c++]
- Hierarchical Convolutional Features for Visual Tracking [iccv15] [matlab]
- Visual Tracking with Fully Convolutional Networks [iccv15] [matlab]
- DeepTracking: Seeing Beyond Seeing Using Recurrent Neural Networks [aaai 2016] [torch 7]
- Learning Multi-Domain Convolutional Neural Networks for Visual Tracking [cvpr16] [vot2015 winner] [matlab/matconvnet]
- Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking [eccv 2016] [matlab]
- Fully-Convolutional Siamese Networks for Object Tracking [eccvw 2016] [matlab/matconvnet] [project]
- DCFNet: Discriminant Correlation Filters Network for Visual Tracking [arxiv1704] [matlab/matconvnet] [pytorch]
- End-to-end representation learning for Correlation Filter based tracking [cvpr17] [matlab/matconvnet] [tensorflow/inference_only] [project]
- RATM: Recurrent Attentive Tracking Model [cvprw17] [python]
- ROLO: Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking [iscas 2017] [tensorflow]
- ECO: Efficient Convolution Operators for Tracking [cvpr17] [matlab]
- Detect to Track and Track to Detect [iccv17] [matlab]
- High Performance Visual Tracking with Siamese Region Proposal Network [cvpr18] [pytorch] [pytorch/reimplementation]
- Distractor-aware Siamese Networks for Visual Object Tracking [eccv18] [vot18 winner] [pytorch]
- Fast Online Object Tracking and Segmentation: A Unifying Approach [cvpr19] [pytorch] [project]
Video Detection
- Flow-Guided Feature Aggregation for Video Object Detection [nips 2016 / iccv17] [python/cuda]
- T-CNN: Tubelets with Convolutional Neural Networks [cvpr16] [python]
- TPN: Tubelet Proposal Network [cvpr17] [python]
- Mobile Video Object Detection with Temporally-Aware Feature Maps [cvpr18] [Google] [tensorflow]
Static Detection and Matching
- Frameworks
- Region Proposal
  - MCG: Multiscale Combinatorial Grouping - Object Proposals and Segmentation (project) [tpami16/cvpr14] [python]
  - COB: Convolutional Oriented Boundaries (project) [tpami18/eccv16] [matlab/caffe]
- FPN
  - Feature Pyramid Networks for Object Detection [caffe/python]
- RCNN
  - RFCN (author) [caffe/matlab]
  - RFCN-tensorflow [tensorflow]
  - PVANet: Lightweight Deep Neural Networks for Real-time Object Detection
  - Mask R-CNN - TensorFlow, Keras
  - Light-head R-CNN [cvpr18] [TensorFlow]
  - Evolving Boxes for Fast Vehicle Detection [icme18] [Caffe/Python]
  - Cascade R-CNN (cvpr18) - Detectron, Caffe
- SSD
  - SSD-Tensorflow [tensorflow]
  - SSD-Tensorflow (tf.estimator) [tensorflow]
  - SSD-Tensorflow (tf.slim) [tensorflow]
  - SSD-Keras [keras]
  - SSD-Pytorch [pytorch]
  - Enhanced SSD with Feature Fusion and Visual Reasoning [NCA18] [TensorFlow]
  - RefineDet - Single-Shot Refinement Neural Network for Object Detection [cvpr18] [caffe]
- YOLO
  - Darknet: Convolutional Neural Networks [c/python]
  - YOLO9000: Better, Faster, Stronger - Real-Time Object Detection. 9000 classes! [c/python]
  - Darkflow [tensorflow]
  - Pytorch Yolov2 [pytorch]
  - Yolo-v3 and Yolo-v2 for Windows and Linux [c/python]
  - YOLOv3 in PyTorch [pytorch]
  - pytorch-yolo-v3 [pytorch] [no training] [tutorial]
  - YOLOv3_TensorFlow [tensorflow]
  - tensorflow-yolo-v3 [tensorflow slim]
  - tensorflow-yolov3 [tensorflow slim]
  - keras-yolov3 [keras]
- Relation Networks for Object Detection [cvpr18] [MXNet]
- DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling [iccv17(poster)] [theano]
- SNIPER: Efficient Multi-Scale Training [cvpr18 / nips18] [mxnet]
- Multi-scale Location-aware Kernel Representation for Object Detection [cvpr18] [caffe/python]
- Matching
  - Matchnet
  - Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
- Boundary Detection
  - Holistically-Nested Edge Detection (HED) (iccv15) [caffe]
  - Edge-Detection-using-Deep-Learning (HED) [tensorflow]
  - Crisp Boundary Detection Using Pointwise Mutual Information (eccv14) [matlab]
Optical Flow
- FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks (cvpr17) - caffe, pytorch/nvidia
- SPyNet: Spatial Pyramid Network for Optical Flow (cvpr17) - lua, pytorch
- Guided Optical Flow Learning (cvprw17) - caffe, tensorflow
- Fast Optical Flow using Dense Inverse Search (DIS) [eccv16] [C++]
- A Filter Formulation for Computing Real Time Optical Flow [ral16] [c++/cuda - matlab,python wrappers]
- PatchBatch - a Batch Augmented Loss for Optical Flow [cvpr16] [python/theano]
- Piecewise Rigid Scene Flow [iccv13/eccv14/ijcv15] [c++/matlab]
- DeepFlow v2 (iccv13) - c++/python/matlab, project
- An Evaluation of Data Costs for Optical Flow [gcpr13] [matlab]
Instance Segmentation
- Fully Convolutional Instance-aware Semantic Segmentation [cvpr17] [coco16 winner] [mxnet]
Autoencoders
- β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework [iclr 2017] [deepmind] [tensorflow] [tensorflow] [pytorch]
- Disentangling by Factorising [arxiv 2018/06] [pytorch]
Classification
- Learning Efficient Convolutional Networks Through Network Slimming [iccv17] [pytorch]
Deep RL
- Asynchronous Methods for Deep Reinforcement Learning
Misc
- Deformable Convolutional Networks
- RNNexp
