
[Question] Tracks with AI agents #9054

Open
PMazarovich opened this issue Feb 5, 2025 · 5 comments
@PMazarovich
Contributor

Hello!
Recently we came across this: https://www.cvat.ai/blog/announcing-cvat-ai-agents#what-is-a-cvat-ai-agent-
As far as I understand the process, it is currently impossible to output tracks from any model mounted to an agent, because of the frame-by-frame approach: we feed the model images one frame at a time and get annotations back for each separate frame, so no tracks are possible.
Am I right?
Thanks!
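To illustrate the limitation being asked about, here is a minimal sketch of a frame-by-frame annotation loop. This is not CVAT's actual agent API; the function names and result format are assumptions made up for illustration. The point is that each frame is processed independently, so no detection carries a track identity linking it to detections on other frames.

```python
# Illustrative sketch only -- not CVAT's real agent interface.
# Each frame is fed to the model on its own, so the results are
# plain per-frame shapes with no track_id connecting frames.

def detect(frame):
    # stand-in for a mounted detection model: returns boxes for one frame
    return [{"label": "car", "bbox": (0, 0, 10, 10)}]

def annotate_task(frames):
    annotations = []
    for frame_index, frame in enumerate(frames):
        for shape in detect(frame):
            # each result is tied only to its own frame; nothing here
            # links "the same car" between frame 0 and frame 1
            annotations.append({"frame": frame_index, **shape})
    return annotations

result = annotate_task(["frame0", "frame1"])
```

Producing tracks would require an extra association step (or a model that sees multiple frames), which is exactly what the current per-frame flow does not provide.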

@bsekachev bsekachev added the question Further information is requested label Feb 5, 2025

SpecLad commented Feb 5, 2025

This is correct at the moment. However, we're planning to expand agent capabilities to cover tracking as well.

@PMazarovich
Contributor Author

@SpecLad, thanks for the answer. Should we close this, or will you close it once tracking is in place?


PMazarovich commented Feb 5, 2025

@SpecLad, another question: right now it is impossible to send an image plus the annotations for that image as input to the model; only the image flows into the model. Do you think this will be supported?
Thanks!


SpecLad commented Feb 5, 2025

I haven't considered it, though in principle it could be done. Could you explain your use case for a feature like this?


PMazarovich commented Feb 5, 2025

Sure.
Some models might benefit from input from users (or from other models) when running pre-annotations. For instance, an OCR model might benefit from bounding-box information giving the location of the text (or texts) to be extracted from the image.

Imagine a large image, where a car is visible. The task of extracting the car license plate from the full image is vastly simplified if the OCR model is given both the image data and information about the location of the plate (bbox) inside that image. For this reason, having the ability to send both image and extra data (such as bboxes) might be important.

In the above scenario, the bbox of the license plate would be created in CVAT via the UI, or potentially by another model that detects license plates.
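The license-plate scenario above can be sketched as follows. This is a hypothetical illustration, not CVAT's API: the `BBox` type, `crop` helper, and `run_ocr_on_region` function are invented here to show how a bbox created in the UI (or by a detector) would narrow the OCR model's input from the full frame to just the plate region.

```python
# Hypothetical sketch: a bbox pre-annotation lets an OCR model work on a
# small crop instead of the whole frame. Not CVAT's actual interface.

from dataclasses import dataclass

@dataclass
class BBox:
    x: int  # left edge, pixels
    y: int  # top edge, pixels
    w: int  # width, pixels
    h: int  # height, pixels

def crop(image, box: BBox):
    """Return the sub-image covered by `box` (image is a list of pixel rows)."""
    return [row[box.x:box.x + box.w] for row in image[box.y:box.y + box.h]]

def run_ocr_on_region(image, box: BBox, ocr_model):
    """Feed only the plate region to the OCR model instead of the full frame."""
    return ocr_model(crop(image, box))

# Example: a 4x6 toy "image" and a bbox drawn in the CVAT UI
# (or produced by a plate-detection model).
frame = [[10 * r + c for c in range(6)] for r in range(4)]
plate_box = BBox(x=2, y=1, w=3, h=2)
region = crop(frame, plate_box)
```

The design point is that the agent would need to forward existing annotations (the bbox) to the model together with the image, rather than the image alone.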
