Action Recognition

This repository hosts implementations of several action recognition models: 3D CNN, CNN+LSTM, and CNN+LSTM+Attention. Its primary focus is the CNN+LSTM+Attention model, which performs activity recognition on data obtained from pose estimation.

Data Preparation

The training data are prepared with the technique described in the paper "Revisiting Skeleton-based Action Recognition". For each frame, a heat map is generated for every joint and every limb, and these maps are concatenated into a single multi-channel input array. Across a sequence, the stacked maps encode both the spatial layout of the skeleton and how it moves during each activity.
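The NumPy sketch below illustrates joint and limb heat maps in the style of that paper: a joint map is a Gaussian centred on the keypoint and scaled by its detection confidence, and a limb map is a Gaussian of each pixel's distance to the segment between two joints, scaled by the lower of the two confidences. The function names, the `sigma` value, and the limb list are hypothetical and not taken from this repository; how the encoder's 27 input channels are split between joints and limbs is also not stated here.

```python
import numpy as np

def joint_heatmap(h, w, x, y, conf, sigma=2.0):
    """Gaussian map centred on keypoint (x, y), scaled by its confidence."""
    ys, xs = np.mgrid[0:h, 0:w]
    return conf * np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

def limb_heatmap(h, w, p1, p2, c1, c2, sigma=2.0):
    """Gaussian of each pixel's distance to segment p1-p2, scaled by min confidence."""
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys], axis=-1).astype(np.float64)  # (H, W, 2) as (x, y)
    a = np.asarray(p1, dtype=np.float64)
    b = np.asarray(p2, dtype=np.float64)
    ab = b - a
    denom = float(ab @ ab)
    # Project every pixel onto the segment, clamping to its end points.
    t = np.zeros((h, w)) if denom == 0.0 else np.clip(((pts - a) @ ab) / denom, 0.0, 1.0)
    closest = a + t[..., None] * ab
    d2 = ((pts - closest) ** 2).sum(axis=-1)
    return min(c1, c2) * np.exp(-d2 / (2 * sigma ** 2))

def frame_heatmaps(keypoints, scores, limbs, h, w, sigma=2.0):
    """Concatenate K joint maps and L limb maps into a (K + L, H, W) array."""
    joints = [joint_heatmap(h, w, x, y, c, sigma) for (x, y), c in zip(keypoints, scores)]
    limb_maps = [
        limb_heatmap(h, w, keypoints[i], keypoints[j], scores[i], scores[j], sigma)
        for i, j in limbs
    ]
    return np.stack(joints + limb_maps, axis=0)

def clip_heatmaps(clip_keypoints, clip_scores, limbs, h, w, sigma=2.0):
    """Stack per-frame arrays into a (T, K + L, H, W) clip."""
    return np.stack(
        [frame_heatmaps(k, s, limbs, h, w, sigma) for k, s in zip(clip_keypoints, clip_scores)],
        axis=0,
    )
```

For example, two keypoints joined by one limb on a 16x16 grid give a `(3, 16, 16)` array per frame.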

Example Heat Maps Illustration:

Heat maps showing detailed skeletal movements for two different activities.

Model Overview: CNN+LSTM+Attention

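The CNN+LSTM+Attention model combines a convolutional neural network (CNN) encoder with a long short-term memory (LSTM) network and an attention mechanism. The CNN extracts spatial features from each frame, the LSTM models how those features change over time, and the attention mechanism weights frames by their relevance before classification.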

Encoder

The encoder is a ResNet-152 modified to accept multi-channel input. The standard network expects 3 RGB channels, whereas our encoder takes 27 channels per frame and extracts spatial features from each frame of the sequence.

LSTM Layer

The LSTM layer, with a hidden size of 1024, captures temporal dependencies between frames, which are essential for following how an action progresses through the sequence. Because it carries a hidden state from frame to frame, each output reflects all frames seen so far rather than a single frame in isolation.
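A minimal PyTorch sketch of this layer, assuming the 2048-dimensional per-frame features of a ResNet-152 encoder, a single LSTM layer, and batch-first tensors (the class name `TemporalLSTM` is hypothetical). It returns the hidden state at every time step so that a later attention layer can weight the frames.

```python
import torch
import torch.nn as nn

class TemporalLSTM(nn.Module):
    """Runs an LSTM over per-frame features and returns every time step's output."""

    def __init__(self, feat_dim: int = 2048, hidden_dim: int = 1024, num_layers: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=num_layers, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, feat_dim) -> outputs: (B, T, hidden_dim)
        outputs, (h_n, c_n) = self.lstm(feats)
        return outputs
```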

Attention Mechanism

An attention layer assigns a weight to each frame so that the most informative frames contribute more to the final representation. This is important in complex scenes, where only some movements are characteristic of the action being performed.
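The README does not specify the form of attention. The sketch below uses additive (Bahdanau-style) soft attention pooling over time as one plausible choice: each LSTM output h_t gets a score w^T tanh(W h_t + b), a softmax over time turns the scores into weights that sum to 1, and the weighted sum of the outputs becomes a single context vector. `TemporalAttention` and `attn_dim` are hypothetical.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Soft attention pooling over the time axis of a (B, T, D) sequence."""

    def __init__(self, hidden_dim: int = 1024, attn_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, h: torch.Tensor):
        e = self.score(torch.tanh(self.proj(h))).squeeze(-1)  # (B, T) frame scores
        alpha = torch.softmax(e, dim=1)                       # weights sum to 1 over T
        context = (alpha.unsqueeze(-1) * h).sum(dim=1)        # (B, D) weighted sum
        return context, alpha
```

Returning `alpha` alongside the context makes it possible to inspect which frames the model relied on.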

Output Layers

The LSTM features, weighted by the attention mechanism, pass through several linear layers ending in a softmax output, which assigns the sequence to one of the predefined action classes.
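A sketch of such a head, assuming ReLU activations and dropout between layers; the number and widths of the hidden layers, the dropout rate, and the name `ClassifierHead` are hypothetical. The forward pass returns logits, and the softmax is applied only for prediction.

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Stack of linear layers mapping the attention context to class logits."""

    def __init__(self, in_dim: int, num_classes: int, hidden=(512, 256), dropout: float = 0.3):
        super().__init__()
        layers, d = [], in_dim
        for width in hidden:
            layers += [nn.Linear(d, width), nn.ReLU(), nn.Dropout(dropout)]
            d = width
        layers.append(nn.Linear(d, num_classes))
        self.mlp = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)  # raw logits, one per class

    @torch.no_grad()
    def predict_proba(self, x: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.forward(x), dim=-1)  # class probabilities
```

Keeping the softmax out of `forward` matters when training with PyTorch's `nn.CrossEntropyLoss`, which applies log-softmax internally and therefore expects raw logits.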

Simplified Model Architecture Diagram:

Diagram of the CNN+LSTM+Attention model architecture.

Training Results

The model was trained for 80 epochs using the AdamW optimizer with a learning rate of 1e-5 and a cross-entropy loss. Training accuracy was high, which may partly reflect the limited diversity of the dataset, so comparable accuracy should not be expected on more varied data without further evaluation.
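A minimal training loop with the stated settings (AdamW, learning rate 1e-5, cross-entropy loss, 80 epochs by default). The function name `train_model`, the data loader, the device, and the use of PyTorch's default AdamW weight decay are assumptions; the loop records the mean loss and accuracy per epoch, the quantities shown in the training curves.

```python
import torch
import torch.nn as nn

def train_model(model, loader, epochs=80, lr=1e-5, device="cpu"):
    """Train with AdamW + cross-entropy; return [(mean_loss, accuracy), ...] per epoch."""
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)  # default weight decay
    criterion = nn.CrossEntropyLoss()  # expects raw logits and integer labels
    history = []
    for _ in range(epochs):
        model.train()
        total_loss, correct, seen = 0.0, 0, 0
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            logits = model(inputs)
            loss = criterion(logits, labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
        history.append((total_loss / seen, correct / seen))
    return history
```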

Training Curves:

Mean accuracy and loss curves during training.

Thank you for exploring the Action Recognition repository.

saqib736/Action_Recognition

Folders and files

Latest commit

History

Repository files navigation

Action Recognition

Data Preparation

Model Overview: CNN+LSTM+Attention

Encoder

LSTM Layer

Attention Mechanism

Output Layers

Training Results

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages