Replies: 3 comments
-
Hey @Rat228420, I worked on a similar project. For your goals, I recommend using object detection (like YOLOv8) for elements such as traffic lights, and segmentation (like DeepLabV3) for detecting pedestrian paths or stairs. You can also combine both with a multi-task model. Useful datasets:
To combine camera and GPS, look into basic sensor fusion, or simply use GPS for route guidance and the visual data to validate the surroundings. Start with basic features like detecting traffic lights, then expand. Tools like Detectron2 and Label Studio are also very helpful.
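For the detection part, here is a minimal sketch with the ultralytics package, assuming the pretrained COCO weights (which already include a "traffic light" class) and a placeholder image path:
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # pretrained COCO weights as a starting point
results = model("street_frame.jpg")   # run inference on one camera frame (placeholder path)

for box in results[0].boxes:
    label = model.names[int(box.cls)]             # class index -> class name
    conf = float(box.conf)
    if label == "traffic light" and conf > 0.5:
        print(f"traffic light at {box.xyxy.tolist()} (confidence {conf:.2f})")
You can later fine-tune the same model on your own classes (crossings, stairs, etc.). Good luck.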
-
Hi @fawern, right now I'm having trouble understanding the model's output. It has the shape (1, 84, 8400), and each bounding box is predicting something different. Also, the raw logits at indices 4 to 83 of each prediction are strangely high. Have you worked on a similar project, or do you know how to fix this? Any advice would be greatly appreciated! Thanks again!
-
Hello @Rat228420 again, I've worked with YOLOv8 before.
In the (1, 84, 8400) output, each of the 8400 columns is one candidate box: indices 0–3 are the box coordinates (cx, cy, w, h, in pixels) and indices 4–83 are the scores for the 80 COCO classes; unlike YOLOv5, YOLOv8 has no separate objectness score. The box coordinates can be used as they are. The class scores may be raw logits or already probabilities depending on how you run or export the model, so check the value range: if they are not in [0, 1], pass them through a sigmoid. Here is a basic Python script:
import torch

output = model_output.squeeze(0)                   # (84, 8400)
boxes = output[:4, :].transpose(0, 1)              # (8400, 4) -> cx, cy, w, h in pixels
class_scores = output[4:, :].transpose(0, 1)       # (8400, 80) class scores
class_scores = torch.sigmoid(class_scores)         # only needed if these are raw logits
confidences, class_ids = class_scores.max(dim=1)   # best class and its score per candidate
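From there, you would typically keep only confident predictions and run non-maximum suppression. Here is a quick sketch building on the variables above, assuming torchvision is available (class-agnostic NMS to keep it simple; the thresholds are just example values):
from torchvision.ops import nms

conf_thresh, iou_thresh = 0.25, 0.45
keep_mask = confidences > conf_thresh                  # drop low-confidence candidates
boxes_kept = boxes[keep_mask]
scores_kept = confidences[keep_mask]
classes_kept = class_ids[keep_mask]

xyxy = torch.empty_like(boxes_kept)                    # convert cx, cy, w, h -> x1, y1, x2, y2
xyxy[:, 0] = boxes_kept[:, 0] - boxes_kept[:, 2] / 2
xyxy[:, 1] = boxes_kept[:, 1] - boxes_kept[:, 3] / 2
xyxy[:, 2] = boxes_kept[:, 0] + boxes_kept[:, 2] / 2
xyxy[:, 3] = boxes_kept[:, 1] + boxes_kept[:, 3] / 2

keep = nms(xyxy, scores_kept, iou_thresh)              # indices that survive NMS
final_boxes, final_scores, final_classes = xyxy[keep], scores_kept[keep], classes_kept[keep]
If you need any more help, feel free to ask.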
-
Hey everyone!
I'm currently working on a project for my bachelor's degree, and I'd love to get some feedback and guidance.
The idea is to develop an iOS navigation app for blind users, which uses the phone’s camera and machine learning to analyze the environment in real time, and provide audio guidance to help users navigate safely.
How it works (in theory):
The app will use the phone camera to detect important objects and features in the environment — things like:
Pedestrian paths
Traffic lights
Dangerous areas (e.g., stairs, escalators)
Braille signs/text (if possible)
It will also combine this visual input with the user's GPS location to help guide them toward their destination.
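To make that combination concrete, here is a toy Python sketch of the decision logic I have in mind, with made-up labels and thresholds (none of this is implemented yet):
HAZARDS = {"stairs", "escalator"}

def choose_announcement(detections, route_instruction):
    """detections: list of (label, confidence) pairs; returns the next audio message."""
    for label, conf in detections:
        if label in HAZARDS and conf > 0.5:
            return f"Caution: {label} ahead"    # safety warnings take priority
    return route_instruction                    # otherwise keep giving route directions

print(choose_announcement([("stairs", 0.8)], "Turn left in 20 meters"))
# -> "Caution: stairs ahead"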
I’m still defining what the ML model should be able to recognize and how to structure everything. Right now, I’m trying to figure out:
Should I use object detection, image segmentation, or a combination of both?
How can I find or build datasets for training?
Is this realistic as a single project, or should I drop some features for now?
If anyone has experience with similar projects, or knows good resources, tutorials, or datasets — I’d be super grateful for any tips or advice!
Thanks a lot in advance!