MAC-AutoML/AwesomeUIAgent

AwesomeUIAgent

Models with grounding ability

| Title | Venue | Date | Code |
| --- | --- | --- | --- |
| SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | arXiv | 2024.1.17 | njucckevin/SeeClick: The model, data and code for the visual GUI Agent SeeClick (github.com) |
| ShowUI: One Vision-Language-Action Model for GUI Visual Agent | NeurIPS 2024 Open-World Agents workshop | 2024.11.26 | showlab/ShowUI: Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use. (github.com) |
| Aria-UI: Visual Grounding for GUI Instructions | arXiv | 2024.12.20 | AriaUI/Aria-UI: Aria-UI: Visual Grounding for GUI Instructions (github.com) |
| Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | arXiv | 2024.10.7 | OSU-NLP-Group/UGround: UGround: Universal GUI Visual Grounding for GUI Agents (github.com) |
| Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | arXiv | 2024.12.5 | xlang-ai/aguvis: Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction (github.com) |
| OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | arXiv | 2024.10.30 | OS-Copilot/OS-Atlas: OS-ATLAS: A Foundation Action Model For Generalist GUI Agents (github.com) |
| CogAgent: A Visual Language Model for GUI Agents | CVPR 2024 (Highlight, top 3%) | 2023.12.14; 2024.12.27 (v3) | THUDM/CogAgent: An open-sourced end-to-end VLM-based GUI Agent (github.com) |
| InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | arXiv | 2025.1.8 | Reallm-Labs/InfiGUIAgent (github.com) |
| UI-TARS: Pioneering Automated GUI Interaction with Native Agents | arXiv | 2025.1.21 | https://github.com/bytedance/UI-TARS |
| GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration | arXiv | 2025.1.23 | https://github.com/GUI-Bee/gui-bee.github.io |

Agent

| Title | Venue | Date | Code | Note |
| --- | --- | --- | --- | --- |
| Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study | ICLR 2024 Workshop on Large Language Model (LLM) Agents | 2024.3.5 | https://github.com/BAAI-Agents/Cradle | |
| PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides | arXiv | 2025.1.7 | https://github.com/icip-cas/PPTAgent | No GUI grounding |

Agent framework

| Title | Venue | Date | Code |
| --- | --- | --- | --- |
| Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language | | | https://github.com/FellouAI/eko |

Evaluation-agent

| Title | Venue | Date | Code |
| --- | --- | --- | --- |
| OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | NeurIPS 2024 | 2024.4.11 | xlang-ai/OSWorld: [NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (github.com) |
| Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale | arXiv | 2024.9.12 | microsoft/WindowsAgentArena: Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents. (github.com) |

Evaluation-grounding

| Title | Venue | Date | Code |
| --- | --- | --- | --- |
| SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | arXiv | 2024.1.17 | njucckevin/SeeClick: The model, data and code for the visual GUI Agent SeeClick (github.com) |
| ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use | | | likaixin2000/ScreenSpot-Pro-GUI-Grounding: GUI Grounding for Professional High-Resolution Computer Use (github.com) |

Evaluation-navigation

| Title | Venue | Date | Code | Note |
| --- | --- | --- | --- | --- |
| Android in the Wild: A Large-Scale Dataset for Android Device Control | NeurIPS 2023 | 2023.7.19 | google-research/android_in_the_wild/README.md at master · google-research/google-research (github.com) | |
| World of Bits: An Open-Domain Platform for Web-Based Agents | PMLR | 2017 | | |
| Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration | arXiv | 2018.2.24 | Farama-Foundation/miniwob-plusplus: MiniWoB++: a web interaction benchmark for reinforcement learning (github.com) | MiniWoB++ extends the OpenAI MiniWoB benchmark and was introduced in this paper. |
| Mind2Web: Towards a Generalist Agent for the Web | NeurIPS 2023 | 2023.6.9 | OSU-NLP-Group/Mind2Web: [NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web" (github.com) | Same authors as the UGround model |
| OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | arXiv | 2024.12.27 | https://github.com/OS-Copilot/OS-Genesis | |
| GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices | arXiv | 2024.6.13 | https://github.com/OpenGVLab/GUI-Odyssey | Cross-app GUI navigation |
