
Roku Hand Pose Controller

This project allows a user to control a Roku Stick through computer vision and machine learning. It uses MediaPipe to detect hand landmarks, which are normalized and passed into a pre-trained Logistic Regression model that classifies the current hand pose. Based on the predicted pose and its confidence score, a corresponding Roku remote command (e.g., volume, navigation, select) is triggered using HTTP requests.

Description

This system captures live video input through OpenCV and uses MediaPipe's hand tracking module to extract 3D keypoint data from a user's hand. The landmarks are normalized by translating them so the wrist sits at the origin, flattened into a vector, and then fed to a trained scikit-learn model that predicts the current pose.
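A minimal sketch of that normalization step, assuming MediaPipe's 21-landmark hand output (landmark 0 is the wrist); the function and variable names here are illustrative, not the actual code in main.py:

```python
import numpy as np

def normalize_landmarks(hand_landmarks):
    """Turn a MediaPipe hand into a wrist-relative feature vector."""
    # 21 landmarks, each with normalized x, y, z coordinates.
    points = np.array([[lm.x, lm.y, lm.z] for lm in hand_landmarks.landmark])
    points -= points[0]      # translate so the wrist sits at the origin
    return points.flatten()  # 63-element input vector for the classifier

# The trained scikit-learn model can then score the pose, e.g.:
# probabilities = model.predict_proba([normalize_landmarks(hand)])[0]
```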

Each recognized pose corresponds to a Roku remote command (e.g., volume, navigation, select). To avoid accidental actions, a neutral pose must be shown before any command is triggered.
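A rough sketch of that gating behavior; the pose name "neutral", the pose_stream() generator, and the 0.9 confidence threshold are assumptions for illustration, not taken from main.py:

```python
ready = False
for pose, confidence in pose_stream():    # hypothetical per-frame (pose, confidence) source
    if pose == "neutral":
        ready = True                      # arm the controller
    elif ready and confidence > 0.9:      # threshold value is an assumption
        make_action(pose)                 # map the pose to its Roku command (see utils.py)
        ready = False                     # require neutral again before the next command
```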

Dependencies

  • Python 3.x
  • OpenCV
  • MediaPipe
  • NumPy
  • scikit-learn
  • curl (for Roku control via HTTP)

Instructions for Setup

  • Change the model_path variable in main.py to the path of the model on your computer.
  • Change the ip_address variable inside make_action() in utils.py to your Roku's IP address (e.g., '100.100.100.100').
  • In your Roku Stick's settings, go to Settings > Network Access > Permissive (the controller will not work if network access is not allowed).
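Under the hood, Roku's External Control Protocol accepts keypress commands as empty HTTP POSTs on port 8060, which is what the curl dependency is for. A minimal sketch of such a call (the real make_action() in utils.py may be structured differently):

```python
import subprocess

ROKU_IP = "100.100.100.100"  # replace with your Roku's IP address

def send_keypress(key):
    """Send a Roku ECP keypress such as 'VolumeUp', 'VolumeDown', or 'Select'."""
    subprocess.run(
        ["curl", "-d", "", f"http://{ROKU_IP}:8060/keypress/{key}"],
        check=True,
    )

send_keypress("VolumeUp")
```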

Data Creation/Collection/Cleanup Process

  • Recorded 7 different videos of myself performing 7 different poses, 50 times with each hand (100 repetitions per pose in total):
    • Open Palm
    • Thumbs Up
    • Thumbs Down
    • Point Up
    • Point Down
    • Okay Symbol
    • Swipe
  • Created a data collection program that saves a labeled sample on a keypress.
  • Manually watched every video and pressed the data collection button twice at the end of each pose to reduce bias (roughly 200 labeled samples per class).
  • Normalized each keypoint (x, y, z) by subtracting the sample's wrist keypoint from every keypoint.
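The collection script itself is not shown in this README; the sketch below illustrates the keypress-driven approach described above (press 's' to save the current sample, 'q' to quit), reusing the hypothetical normalize_landmarks() helper from the Description section:

```python
import csv
import cv2
import mediapipe as mp

def collect(label, csv_path):
    cap = cv2.VideoCapture(0)
    hands = mp.solutions.hands.Hands(max_num_hands=1)
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            cv2.imshow("collect", frame)
            key = cv2.waitKey(1) & 0xFF
            if key == ord("s") and results.multi_hand_landmarks:
                features = normalize_landmarks(results.multi_hand_landmarks[0])
                writer.writerow([label, *features])  # one labeled sample per keypress
            elif key == ord("q"):
                break
    cap.release()
    cv2.destroyAllWindows()
```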

Possible Improvements

(If you would like to improve something, please do!)

  • Add a feature that automatically discovers the user's Roku IP address (e.g., via SSDP; see the sketch after this list)
  • Model struggles to detect hands from about 3 feet away.
    (Needs more training data for all poses.)
  • Model is mostly consistent but has trouble differentiating between swipe and select.
    (Currently, the user must hold up 4 fingers and tuck the thumb to the palm before swiping — otherwise, it gets classified as "select.")
    • Possible fix: add more training data or tweak the parameters in the main if/else logic.
  • Add more gestures to support additional Roku commands
    (e.g., power on/off, home, select specific channels, etc.)
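For the first improvement, Roku devices answer SSDP discovery requests with the search target roku:ecp, so the IP address could be found automatically with something like the following (an illustrative sketch, not part of the project):

```python
import re
import socket

def find_roku(timeout=3.0):
    """Return the IP of the first Roku that answers an SSDP M-SEARCH, or None."""
    msearch = (
        "M-SEARCH * HTTP/1.1\r\n"
        "HOST: 239.255.255.250:1900\r\n"
        'MAN: "ssdp:discover"\r\n'
        "ST: roku:ecp\r\n"
        "MX: 2\r\n\r\n"
    )
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.sendto(msearch.encode(), ("239.255.255.250", 1900))
    try:
        data, addr = sock.recvfrom(1024)
        match = re.search(r"LOCATION:\s*http://([\d.]+):8060", data.decode(), re.I)
        return match.group(1) if match else addr[0]
    except socket.timeout:
        return None
```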

Closing Thoughts

This project was created to get hands-on experience with creating and cleaning keypoint data gathered from pose estimation models like MediaPipe, and with training a low-cost AI model. It is something of a stepping stone, since it uses components I will reuse in a future project. It was a fun experience, and I plan to come back to it and further improve the model's accuracy so it can be a serious alternative to using a remote.
