This project allows a user to control a Roku Stick through computer vision and machine learning. It uses MediaPipe to detect hand landmarks, which are then normalized and passed into a pre-trained Logistic Regression model to classify the current hand pose. Based on the predicted pose and its confidence score, a corresponding Roku remote command (e.g., volume, navigation, select) is triggered using HTTP requests.
This system captures live video through OpenCV and uses MediaPipe's hand-tracking module to extract 3D keypoint data from the user's hand. The landmarks are normalized by translating them so the wrist sits at the origin, flattened into a vector, and then fed into a trained scikit-learn model that predicts the current pose.
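For reference, the capture-and-landmark step with OpenCV and MediaPipe looks roughly like this (a minimal sketch, not the exact code in main.py; parameter values are illustrative):

```python
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB frames; OpenCV captures BGR
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        hand = results.multi_hand_landmarks[0]
        # 21 landmarks, each with normalized x, y and relative z
        keypoints = [(lm.x, lm.y, lm.z) for lm in hand.landmark]
    cv2.imshow("hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```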
Each recognized pose corresponds to a Roku remote command (e.g., volume, navigation, select). To avoid accidental actions, a neutral pose is required before any command is triggered.
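A minimal sketch of how that neutral-pose gating can work (the threshold value and pose names below are illustrative, not necessarily the ones used in main.py):

```python
from typing import Optional

CONFIDENCE_THRESHOLD = 0.8  # illustrative value; the real threshold lives in main.py
ready = False               # becomes True once a neutral pose has been seen

def maybe_trigger(pose: str, confidence: float) -> Optional[str]:
    """Return the command to send, or None, enforcing the neutral-pose rule."""
    global ready
    if confidence < CONFIDENCE_THRESHOLD:
        return None               # ignore low-confidence predictions entirely
    if pose == "neutral":
        ready = True              # arm the system; the next confident pose may fire
        return None
    if ready:
        ready = False             # disarm until neutral is seen again
        return pose               # e.g. "volume_up", "swipe", "select"
    return None
```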
- Python 3.x
- OpenCV
- MediaPipe
- NumPy
- scikit-learn
- curl (for Roku control via HTTP)
- Change the model_path variable in main.py to the path to the model on your computer
- Change the ip_address variable inside make_action() in utils.py to your Roku's IP address (e.g., '100.100.100.100'); see the sketch below
- In your Roku Stick's settings, go to Settings > Network Access > Permissive (commands will not work if the device cannot be accessed over the network)
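For context, Roku's External Control Protocol accepts keypresses as HTTP POST requests on port 8060, so make_action() can be as simple as the sketch below (the actual implementation in utils.py may differ; key names like 'VolumeUp' and 'Select' are standard ECP keys):

```python
import subprocess

def make_action(key: str, ip_address: str = "100.100.100.100") -> None:
    """Send a single keypress (e.g. 'VolumeUp', 'VolumeDown', 'Up', 'Select') to the Roku."""
    url = f"http://{ip_address}:8060/keypress/{key}"
    # Roku's External Control Protocol expects an empty-bodied POST
    subprocess.run(["curl", "-X", "POST", url], check=False)
```

Using curl keeps the dependency list as-is; swapping in an HTTP library such as requests would work just as well.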
- Recorded 7 different videos of myself performing 7 different poses, 50 times with each hand (100 repetitions per pose in total).
- Created a data collection program that records a sample on keypress.
- Manually watched every video and pressed the data collection button twice at the end of every pose to try to eliminate bias. (This came out to about 200 labeled samples per class.)
- Normalized each keypoint (x, y, z) by subtracting that sample's wrist keypoint from every keypoint.
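A short sketch of that normalization step (in MediaPipe's hand model, landmark 0 is the wrist; the function name here is just illustrative):

```python
import numpy as np

def normalize_landmarks(keypoints):
    """keypoints: 21 (x, y, z) tuples from MediaPipe hand tracking.

    Returns a flat 63-element feature vector with the wrist at the origin.
    """
    pts = np.array(keypoints, dtype=np.float32)  # shape (21, 3)
    pts -= pts[0]                                # subtract the wrist keypoint from every keypoint
    return pts.flatten()                         # shape (63,), ready for the classifier
```

At inference time the same vector is what goes into the scikit-learn model, e.g. something like model.predict_proba([features]) to get the confidence score.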
(If someone wants to improve this, please do!)
- Add a feature that automatically discovers the user's Roku IP address (see the sketch after this list)
- Model struggles to detect hands from about 3 feet away. (Needs more training data for all poses.)
- Model is mostly consistent but has trouble differentiating between swipe and select. (Currently, the user must hold up 4 fingers and tuck the thumb to the palm before swiping; otherwise, it gets classified as "select.")
  - Possible fix: add more training data or tweak parameters in the main if logic.
- Add more gestures to support additional Roku commands (e.g., power on/off, home, selecting specific channels, etc.)
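For the automatic IP discovery idea above: Roku devices answer SSDP M-SEARCH queries with the search target roku:ecp, so a discovery helper could look something like this (a sketch under that assumption; the function name is just a suggestion and it is untested on every network setup):

```python
import socket

def discover_roku(timeout: float = 3.0):
    """Broadcast an SSDP M-SEARCH and return the first Roku IP found, or None."""
    msearch = (
        "M-SEARCH * HTTP/1.1\r\n"
        "HOST: 239.255.255.250:1900\r\n"
        'MAN: "ssdp:discover"\r\n'
        "ST: roku:ecp\r\n"
        "MX: 2\r\n\r\n"
    ).encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.settimeout(timeout)
    sock.sendto(msearch, ("239.255.255.250", 1900))
    try:
        while True:
            data, addr = sock.recvfrom(1024)
            if b"roku:ecp" in data.lower():
                return addr[0]  # the responder's address is the Roku's IP
    except socket.timeout:
        return None
    finally:
        sock.close()
```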
This project was created to get hands-on experience with creating and cleaning keypoint data gathered from pose estimation models like MediaPipe, and with training a low-cost AI model. It is something of a stepping stone, since it uses components I will need for a future project. It was a fun experience, and I plan to come back to it and further improve the model's accuracy so it can become a serious alternative to using a remote.