Show Lab

104 repositories

Paper2Video
Public
Automatic Video Generation from Scientific Papers
Python
•
MIT License
•136•1.1k•3•0•Updated Oct 17, 2025
Code2Video
Public
Video generation via code
education coding multi-agent video-generation
Python
•
MIT License
•93•721•0•0•Updated Oct 16, 2025
Awesome-Video-Diffusion
Public
A curated list of recent diffusion models for video generation, editing, and various other applications.
awesome video-editing video-generation diffusion-models motion-customization video-generation-evaluation
315•5.1k•0•0•Updated Oct 15, 2025
livecc
Public
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
Python
•36•286•9•1•Updated Oct 14, 2025
Show-o
Public
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
multimodal diffusion-models large-language-models
Python
•
Apache License 2.0
•73•1.7k•54•3•Updated Oct 11, 2025
Awesome-Unified-Multimodal-Models
Public
📖 A repository for organizing papers, code, and other resources related to unified multimodal models.
40•719•5•1•Updated Oct 10, 2025
DIM
Public
The official implementation of the paper "Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing"
Python
•
Other
•0•16•0•0•Updated Oct 8, 2025
PANDA
Public
[NeurIPS 2025] PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer
Apache License 2.0
•1•11•2•0•Updated Oct 2, 2025
PICO
Public
[ArXiv 2025] Personalized Vision via Visual In-Context Learning
MIT License
•1•3•0•0•Updated Sep 30, 2025
SMS
Public
[ICCV 2025] Balanced Image Stylization with Style Matching Score
style-transfer diffusion score-distillation iccv2025
Python
•
MIT License
•2•62•0•0•Updated Sep 30, 2025
TrustScorer
Public
[ACM MM 2025] Can I Trust You? Advancing GUI Task Automation with Action Trust Score
MIT License
•0•6•0•0•Updated Sep 28, 2025
Awesome-MLLM-Hallucination
Public
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
39•872•1•0•Updated Sep 27, 2025
macosworld
Public
Python
•
Other
•1•15•0•0•Updated Sep 22, 2025
Show-1
Public
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Python
•
Other
•56•1.1k•9•7•Updated Sep 13, 2025
videollm-online
Public
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Python
•
Apache License 2.0
•57•560•28•0•Updated Sep 2, 2025
Awesome-GUI-Agent
Public
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
awesome graphical-user-interface ai-assistant llm-agent gui-agents
52•935•4•1•Updated Aug 17, 2025
SAM-I2V
Public
[CVPR 2025] SAM-I2V
Jupyter Notebook
•
Apache License 2.0
•0•25•0•0•Updated Aug 8, 2025
Multi-human-Talking-Video-Dataset
Public
Multi-human Interactive Talking Dataset
Python
•
Other
•1•50•1•0•Updated Aug 6, 2025
WorldGUI
Public
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
gui-application agents large-multimodal-models gui-agent
Python
•8•96•1•0•Updated Jul 27, 2025
Impossible-Videos
Public
ICML 2025 - Impossible Videos
benchmark video video-understanding multimodal video-generation
Python
•6•77•1•0•Updated Jul 23, 2025
IDProtector
Public
The code implementation of "IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation".
Python
•0•16•5•0•Updated Jul 17, 2025
DiffSim
Public
[ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
Python
•1•19•1•0•Updated Jul 14, 2025
D-AR
Public
The official repo for "D-AR: Diffusion via Autoregressive Models"
diffusion-models autoregressive-models llms
Python
•
MIT License
•2•118•2•0•Updated Jun 21, 2025
Q2A
Public
[ECCV 2022] AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
Python
•5•21•1•0•Updated Jun 18, 2025
Awesome-Robotics-Diffusion
Public
A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
8•261•0•0•Updated Jun 13, 2025
VideoGUI
Public
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
gui video-language llm-agent
JavaScript
•3•45•2•0•Updated Jun 13, 2025
OmniConsistency
Public
The official code implementation of the paper "OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data."
Python
•25•405•9•1•Updated Jun 8, 2025
omg
Public
Open Multimodal Gathering workshop @ NUS
JavaScript
•0•0•0•0•Updated Jun 5, 2025
UniRL
Public
The code repository of UniRL
Python
•
Apache License 2.0
•3•42•1•0•Updated May 30, 2025
ShowUI
Public
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
agent vision-language-model vision-language-action computer-use gui-agent
Python
•
Apache License 2.0
•105•1.5k•11•0•Updated May 29, 2025