Title | Venue | Date | Code |
---|---|---|---|
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | arXiv | 2024.1.17 | https://github.com/njucckevin/SeeClick |
ShowUI: One Vision-Language-Action Model for GUI Visual Agent | NeurIPS 2024 Open-World Agents Workshop | 2024.11.26 | https://github.com/showlab/ShowUI |
Aria-UI: Visual Grounding for GUI Instructions | arXiv | 2024.12.20 | https://github.com/AriaUI/Aria-UI |
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | arXiv | 2024.10.7 | https://github.com/OSU-NLP-Group/UGround |
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction | arXiv | 2024.12.5 | https://github.com/xlang-ai/aguvis |
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents | arXiv | 2024.10.30 | https://github.com/OS-Copilot/OS-Atlas |
CogAgent: A Visual Language Model for GUI Agents | CVPR 2024 (Highlight, top 3%) | 2023.12.14; 2024.12.27 (v3) | https://github.com/THUDM/CogAgent |
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | arXiv | 2025.1.8 | https://github.com/Reallm-Labs/InfiGUIAgent |
UI-TARS: Pioneering Automated GUI Interaction with Native Agents | arXiv | 2025.1.21 | https://github.com/bytedance/UI-TARS |
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration | arXiv | 2025.1.23 | https://github.com/GUI-Bee/gui-bee.github.io |
Title | Venue | Date | Code | Note |
---|---|---|---|---|
Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study | ICLR 2024 Workshop on Large Language Model (LLM) Agents | 2024.3.5 | https://github.com/BAAI-Agents/Cradle | |
PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides | arXiv | 2025.1.7 | https://github.com/icip-cas/PPTAgent | No GUI grounding |
Title | Venue | Date | Code |
---|---|---|---|
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language | | | https://github.com/FellouAI/eko |
Title | Venue | Date | Code |
---|---|---|---|
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | NeurIPS 2024 | 2024.4.11 | https://github.com/xlang-ai/OSWorld |
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale | arXiv | 2024.9.12 | https://github.com/microsoft/WindowsAgentArena |
Title | Venue | Date | Code |
---|---|---|---|
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents | arXiv | 2024.1.17 | https://github.com/njucckevin/SeeClick |
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use | | | https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding |
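The ScreenSpot-style grounding benchmarks above score a model by click accuracy: a prediction counts as correct when the predicted click point falls inside the ground-truth bounding box of the target element. A minimal sketch of that metric (the pixel-coordinate convention and names here are illustrative assumptions, not any repo's actual schema):

```python
# Minimal sketch of the click-accuracy metric used by ScreenSpot-style
# grounding benchmarks. Assumption: predictions are (x, y) pixel points and
# ground truth is (left, top, right, bottom) pixel boxes; actual repos may
# use normalized coordinates or different field names.
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y)
BBox = Tuple[float, float, float, float]  # (left, top, right, bottom)

def point_in_bbox(point: Point, bbox: BBox) -> bool:
    """A click is a hit if it lands inside the target element's box."""
    x, y = point
    left, top, right, bottom = bbox
    return left <= x <= right and top <= y <= bottom

def grounding_accuracy(preds: List[Point], gt_boxes: List[BBox]) -> float:
    """Fraction of predicted click points that hit their target box."""
    if not preds:
        return 0.0
    hits = sum(point_in_bbox(p, b) for p, b in zip(preds, gt_boxes))
    return hits / len(preds)

if __name__ == "__main__":
    preds = [(105.0, 42.0), (300.0, 500.0)]
    gt_boxes = [(90.0, 30.0, 120.0, 55.0), (0.0, 0.0, 50.0, 50.0)]
    print(f"click accuracy = {grounding_accuracy(preds, gt_boxes):.2f}")  # 0.50
```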
Title | Venue | Date | Code | Note |
---|---|---|---|---|
Android in the Wild: A Large-Scale Dataset for Android Device Control | NeurIPS 2023 | 2023.7.19 | https://github.com/google-research/google-research/tree/master/android_in_the_wild | |
World of Bits: An Open-Domain Platform for Web-Based Agents | ICML 2017 (PMLR) | | | |
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration | ICLR 2018 | 2018.2.24 | https://github.com/Farama-Foundation/miniwob-plusplus | Introduces MiniWoB++, an extension of the OpenAI MiniWoB benchmark |
Mind2Web: Towards a Generalist Agent for the Web | NeurIPS 2023 (Spotlight) | 2023.6.9 | https://github.com/OSU-NLP-Group/Mind2Web | Same authors as the UGround model |
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis | arXiv | 2024.12.27 | https://github.com/OS-Copilot/OS-Genesis | |
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices | arXiv | 2024.6.13 | https://github.com/OpenGVLab/GUI-Odyssey | cross-app GUI navigation |