Final Testing and Presentation #5

SharanyaSD · 2024-12-19T11:48:19Z

Screenshot Collection: Puppeteer captures a screenshot of the current browser page and sends it to the FastAPI backend.
Vision Model Prediction: The Llama Vision model processes the screenshot and predicts the (x, y) coordinates of the target element and the action to perform on it (e.g., click, scroll).
Action Execution: Puppeteer performs the predicted action at the predicted coordinates in the browser.
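Between the prediction and execution steps, the backend has to turn the model's free-text reply into the coordinates and action that Puppeteer can act on. The following is a minimal sketch in Python. It assumes the model is prompted to answer with a JSON object such as {"x": 120, "y": 340, "action": "click"}. The names `parse_prediction`, `Prediction`, and `ALLOWED_ACTIONS` are hypothetical and not taken from the project.

```python
import json
import re
from dataclasses import dataclass

# Actions named in this issue; extend if the model is prompted with more.
ALLOWED_ACTIONS = {"click", "scroll"}


@dataclass(frozen=True)
class Prediction:
    x: int
    y: int
    action: str


def parse_prediction(raw: str, width: int, height: int) -> Prediction:
    """Parse the vision model's reply into a validated Prediction.

    Assumes (hypothetically) that the model answers with a JSON object like
    {"x": 120, "y": 340, "action": "click"}. Any text around the object,
    such as a Markdown code fence, is ignored.
    """
    # Grab everything from the first "{" to the last "}" in the reply.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    try:
        data = json.loads(match.group(0))
        action = str(data.get("action", "")).lower()
        # Models sometimes return fractional pixels; round to whole pixels.
        x = int(round(float(data["x"])))
        y = int(round(float(data["y"])))
    except (KeyError, TypeError, ValueError, AttributeError) as exc:
        raise ValueError(f"malformed prediction: {exc}") from exc

    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    # Reject coordinates outside the screenshot so Puppeteer never acts off-page.
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"coordinates ({x}, {y}) outside {width}x{height} screenshot")
    return Prediction(x=x, y=y, action=action)
```

In a FastAPI route, the handler could return `dataclasses.asdict(prediction)`, which FastAPI serializes to JSON for Puppeteer to read. On the Puppeteer side, a click maps naturally to `page.mouse.click(x, y)` and a scroll to `page.mouse.wheel({ deltaY })`. Puppeteer's mouse coordinates are CSS pixels, so if the viewport's `deviceScaleFactor` is not 1, the screenshot coordinates need to be divided by it before executing the action.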

SharanyaSD changed the title ~~Final Testing and Presentation (3 Hours)~~ Final Testing and Presentation Dec 19, 2024
