

    Repositories list

    • Automatic Video Generation from Scientific Papers
      Python
      1361.1k30 Updated Oct 17, 2025
    • Video generation via code
      Python
      9372100 Updated Oct 16, 2025
    • A curated list of recent diffusion models for video generation, editing, and various other applications.
      3155.1k00 Updated Oct 15, 2025
    • livecc

      Public
      LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
      Python
      3628691 Updated Oct 14, 2025
    • Show-o

      Public
      [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
      Python
      731.7k543 Updated Oct 11, 2025
    • 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
      4071951 Updated Oct 10, 2025
    • DIM

      Public
      The official implementation of the paper "Draw-In-Mind: Rebalancing Designer-Painter Roles in Unified Multimodal Models Benefits Image Editing"
      Python
      01600 Updated Oct 8, 2025
    • PANDA

      Public
      [NeurIPS 2025] PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer
      11120 Updated Oct 2, 2025
    • PICO

      Public
      [ArXiv 2025] Personalized Vision via Visual In-Context Learning
      1300 Updated Sep 30, 2025
    • SMS

      Public
      [ICCV 2025] Balanced Image Stylization with Style Matching Score
      Python
      26200 Updated Sep 30, 2025
    • ACM MM 2025 Can I Trust You? Advancing GUI Task Automation with Action Trust Score
      0600 Updated Sep 28, 2025
    • 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
      3987210 Updated Sep 27, 2025
    • Python
      11500 Updated Sep 22, 2025
    • Show-1

      Public
      [IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
      Python
      561.1k97 Updated Sep 13, 2025
    • VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
      Python
      57560280 Updated Sep 2, 2025
    • 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
      5293541 Updated Aug 17, 2025
    • SAM-I2V

      Public
      [CVPR 2025] SAM-I2V
      Jupyter Notebook
      02500 Updated Aug 8, 2025
    • Multi-human Interactive Talking Dataset
      Python
      15010 Updated Aug 6, 2025
    • WorldGUI

      Public
      Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
      Python
      89610 Updated Jul 27, 2025
    • ICML 2025 - Impossible Videos
      Python
      67710 Updated Jul 23, 2025
    • The code implementation of IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation.
      Python
      01650 Updated Jul 17, 2025
    • DiffSim

      Public
      [ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
      Python
      11910 Updated Jul 14, 2025
    • D-AR

      Public
      The official repo for "D-AR: Diffusion via Autoregressive Models"
      Python
      211820 Updated Jun 21, 2025
    • Q2A

      Public
      [ECCV 2022] AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
      Python
      52110 Updated Jun 18, 2025
    • A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
      826100 Updated Jun 13, 2025
    • VideoGUI

      Public
      [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
      JavaScript
      34520 Updated Jun 13, 2025
    • The official code implementation of the paper "OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data."
      Python
      2540591 Updated Jun 8, 2025
    • omg

      Public
      Open Multimodal Gathering workshop @ NUS
      JavaScript
      0000 Updated Jun 5, 2025
    • UniRL

      Public
      The code repository of UniRL
      Python
      34210 Updated May 30, 2025
    • ShowUI

      Public
      [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
      Python
      1051.5k110 Updated May 29, 2025